Supporting Portable, Reproducible Computational Science

Published on February 18, 2020 by Aaron Dubrow



The Dynamics of Plate Tectonics and Mantle Flow: From Local to Global Scales, Science 27 Aug 2010: Vol. 329, Issue 5995, pp. 1033-1038, DOI: 10.1126/science.1191223 [Credit: Georg Stadler, Michael Gurnis, Carsten Burstedde, Lucas C. Wilcox, Laura Alisic, Omar Ghattas]

Researchers who use supercomputers for science typically don't limit themselves to one system. They move their projects to whatever resources are available, often using many different systems simultaneously, in their lab, on their campus cluster, at advanced computing centers like TACC, and in the cloud.

It's not a lack of loyalty; it's just a fact of research life. Opportunities shift, and hungry scientists find what they need to get their research accomplished.

The systems researchers use aren't necessarily the same though. They may contain different hardware with different architectures, and different compilers or libraries.

This opportunistic computing paradigm creates a lot of extra work for computational scientists and system administrators – adapting old codes to work on new systems or installing software packages multiple times. As operating systems evolve, supporting code that was developed on deprecated environments becomes a reproducibility challenge.

Isosurfaces of velocity magnitude in a flow pattern called the Taylor--Green vortex that is used to study turbulence models. [Credit: David Kamensky, Yuri Bazilevs]

In recent years, a new solution has emerged. Generically called "containers," it involves a form of isolation in which a researcher's code is packaged together with all of its software dependencies so that it can run at many sites without recompilation. By bundling an application's many dependencies into self-contained images, containers sidestep many installation and compatibility problems.

Popularized by Docker in 2013, containers were quickly accepted in both the commercial and academic scientific computing arenas. TACC was an early adopter and began enabling containerized science in 2016, first through Docker and more recently through Singularity. Released in 2015 by a team of researchers at Lawrence Berkeley National Laboratory, Singularity is particularly well suited to high-performance computers, which contain tens of thousands of tightly connected processors.

Containerized Science

Among the users of containers on TACC systems are Thomas Hughes — professor of Aerospace Engineering and Engineering Mechanics at The University of Texas at Austin and a member of the National Academies of Science and Engineering — and David Kamensky — a former member of Hughes' team, now an assistant professor at the University of California, San Diego. The pair use containers to develop predictive models of coronary artery flow and study turbulence.

"The reason we started using containers was to run the numerical PDE [partial differential equation] software FEniCS on Stampede2," said Kamensky. "FEniCS is a complicated software with many dependencies, and it can be difficult to install."

When they needed to perform an isogeometric analysis on top of FEniCS, they converted a Docker image maintained by the FEniCS project team into a Singularity image and ended up using more than 1,000 node hours on Stampede2.
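
The conversion described above can be sketched roughly as follows, assuming the Singularity command-line tool is available on the target system. The image reference `quay.io/fenicsproject/stable:latest`, the file name `fenics.sif`, and the script name `demo.py` are illustrative assumptions, not details of the team's actual workflow. The helpers only assemble the commands; they do not run them.

```python
import shlex


def singularity_build_cmd(docker_ref, sif_path):
    """Return the argv that converts a Docker image into a Singularity image file."""
    # `singularity build <target> docker://<ref>` fetches the Docker image
    # and writes it out as a single Singularity image file.
    return ["singularity", "build", sif_path, f"docker://{docker_ref}"]


def singularity_exec_cmd(sif_path, program_argv):
    """Return the argv that runs a program inside the converted image."""
    return ["singularity", "exec", sif_path, *program_argv]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical names, chosen only for illustration.
build = singularity_build_cmd("quay.io/fenicsproject/stable:latest", "fenics.sif")
run = singularity_exec_cmd("fenics.sif", ["python3", "demo.py"])

print(as_shell(build))
print(as_shell(run))
```

On a cluster, the second command would typically be launched from inside a batch job rather than on a login node.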

Collaborating with John Evans at the University of Colorado, Boulder (CU Boulder) on a turbulence modeling study, they were able to easily switch from Stampede2 to CU Boulder's cyberinfrastructure because of containerization.

"I don't see it as practical for supercomputer centers to maintain and debug all the different software to meet every scientist's needs," Kamensky said. "With Singularity, they only need to maintain one piece of software."

Sharon Glotzer, a professor of chemical engineering at the University of Michigan and a member of both the National Academy of Sciences and the National Academy of Engineering, also uses Singularity on TACC and several other centers to study how the building blocks of matter transition from fluid to solid to better understand how to design new materials.

Lee, S., Teich, E. G., Engel, M. & Glotzer, S. C., "Entropic colloidal crystallization pathways via fluid–fluid transitions and multidimensional prenucleation motifs," Proc. Natl. Acad. Sci. 116, 14843–14851 (2019). DOI: 10.1073/pnas.1905929116

In particular, her group uses molecular simulations to study the assembly behavior of large numbers of hard particles into shapes using HOOMD-blue, a general-purpose particle simulation toolkit.

"We make use of compute resources on XSEDE (including Stampede2, Comet, and Bridges), Summit at the Oak Ridge Leadership Computing Facility, and local clusters," said Joshua Anderson, a research area specialist who builds and maintains the container images for Glotzer's group. "Singularity containers allow us to use the same software environment across all of these systems so that we can easily transition our workflows between them."

Transitioning between systems isn't always trivial. Moving workflows that use the Message Passing Interface (MPI) to harness the parallel computing power of supercomputers is still the biggest challenge researchers face in using containers on different clusters.

"Each requires its own compatible MPI and user-space driver stack inside the container," Anderson said. To address this, Anderson builds specialized images for each system based on the same container recipe. In 2019, their team used more than 5,300 node hours on Stampede2 and many more on other systems.

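One way to picture "specialized images for each system based on the same container recipe" is a shared template with a per-system MPI step. The sketch below is an assumption-laden illustration, not Anderson's actual recipe: the system names, base image, and install commands are hypothetical placeholders, and a real recipe would also need the matching user-space driver stack.

```python
from string import Template

# Shared recipe in Singularity definition-file form. Only the MPI step varies.
RECIPE = Template("""\
Bootstrap: docker
From: $base_image

%post
    apt-get update
    # MPI stack chosen to match the host system (placeholder command).
    $mpi_install
    # Application install step, identical on every system (placeholder).
    $app_install
""")

# Placeholder for the real application build steps.
APP_INSTALL = "echo 'install the simulation code here'"

# Hypothetical per-system settings; real values depend on each cluster.
SYSTEMS = {
    "cluster_a": {
        "base_image": "ubuntu:18.04",
        "mpi_install": "apt-get install -y libopenmpi-dev",
    },
    "cluster_b": {
        "base_image": "ubuntu:18.04",
        "mpi_install": "apt-get install -y libmpich-dev",
    },
}


def render_recipe(system):
    """Fill the shared template with the settings for one system."""
    settings = SYSTEMS[system]
    return RECIPE.substitute(app_install=APP_INSTALL, **settings)


for name in SYSTEMS:
    print(f"--- {name} ---")
    print(render_recipe(name))
```

Each rendered recipe would then be built into its own image for the matching system, so the application steps stay identical while the MPI layer differs.
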
Michael Gurnis, professor of Geophysics and director of the Seismological Laboratory at Caltech, uses containers on Stampede2 to develop computational models of subduction and mantle flow, run large parallel grid searches, and explore the parameters that have a first-order impact on the evolution of the Earth, all using the geodynamic code Underworld2.

Configuring Underworld2 as a Docker image allowed Gurnis's team to circumvent the installation of dependent packages, configure the environment to their needs, and easily run the code on Stampede2.

Ian Wang, a Research Associate in the HPC Performance & Architectures Group at TACC, worked with the developer of the Underworld software to containerize the tool. "I think this is the first application that runs at very large scale within Singularity containers using MPI," Wang said. "The users and the developers actually helped me identify a bug in Singularity that only appears at large scale MPI runs."

Gurnis's team has used 11,000 node hours so far in containers and expects to continue. "With the pulled image and Singularity, we can circumvent the annoying installation of relevant packages and configuration of environment. Containerization makes it easy to install and run a large code on Stampede2," said Yida Li, a graduate student in Gurnis's group.

Containers Help Community Computing Efforts

Biology and bioinformatics are two of the leading communities that have adopted containerization. The disciplines are relative newcomers to the HPC world, and the development of new codes and tools in the field has been fast and furious. This has led to some problems.

"A recent study found that, in bioinformatics, of the software published in the last 10 years, 50 percent couldn't be installed," said Greg Zynda, a bioinformatician and member of the Life Sciences Computing group at TACC. "They were built for legacy operating systems and can't be run on today's supercomputers. We're trying to solve that problem using containers."

Zynda has led an effort at TACC to make 16,000 bio-containers available on TACC's supercomputers, eliminating the need for life science researchers to package and maintain each and every piece of software in the field.

Two of TACC's largest collaborative software services projects, the DARPA Synergistic Discovery and Design (SD2E) project and CyVerse, also leverage containers extensively.

SD2E uses automation and machine learning to accelerate discovery in areas where the underlying model is not well understood. The project brings together research labs and companies from around the U.S. with complementary capabilities, and streamlines their interactions using cyberinfrastructure at TACC, including: a centralized data repository and data catalog for sharing, provenance, and discovery; the Tapis APIs to support automated, event-driven data analysis using both cloud and HPC back-end hardware; and a developer ecosystem of command-line tooling, version control, and continuous integration.

"The platform is extremely powerful and flexible, and almost every component uses Docker containers as a core building block," said John Fonner, who manages Emerging Technologies in TACC's Life Sciences Computing group. "The analysis tools, persistent services, and even the APIs themselves are almost entirely composed of containers."

CyVerse is an NSF-funded project that helps life scientists use cyberinfrastructure to handle huge datasets and complex analyses, thus enabling data-driven discovery. CyVerse includes a data storage facility; an interactive, web-based analytics platform; and cloud infrastructure for computation, analysis, and storage, much of it built and maintained at TACC.

An organizational overview of how a user might leverage iVirus, iMicrobe and CyVerse Apps to analyze a viral metagenomic data set. Apps can be deployed using Docker images, where the code is packaged with additional software dependencies and can be run on CyVerse Docker-dedicated servers. [Credit: Benjamin Bolduc, Ken Youens-Clark, Simon Roux, Bonnie L Hurwitz & Matthew B Sullivan]

"Docker containers are the primary way CyVerse researchers integrate custom apps into the platform," said Fonner. "This includes non-interactive cloud and HPC apps, as well as interactive 'VICE' apps such as Jupyter Notebooks."

Though containers are often treated as a panacea, they come with trade-offs, Zynda says. Containers are not always as optimized for performance as they could be, and once they are created, they are static and not easy to change.

"But once a container is published and can solve a problem, maybe it doesn't need to change," he said.

TACC doesn't only enable the use of containers; it also actively develops tools to make containers easier to use. One example is Rolling Gantry Crane (RGC), named after the machines that offload containers from ships in harbors. RGC integrates containers into TACC's environment module system (Lmod) to enable familiar interactions, essentially making the container system transparent.

TACC also trains researchers to use containers through frequent workshops, webinars, and integration into TACC's Institutes.

"We believe that software containers are an important part of reproducible computing," Fonner said. "Containers are supported on all our HPC clusters and in our cloud infrastructure. Internally, we use Docker heavily in our standard development practices, and we deploy images to compute resources using both Docker and Singularity interchangeably. In a short time, containers have become a central part of how we support science at TACC."


Story Highlights

"Containers" involve a form of isolation, where a researcher's code is packaged together with all the software dependencies in such a way that it can run at many sites without requiring recompilations.

TACC was an early adopter of the approach and began enabling containerized science in 2016. More than 16,000 biology-related containers are currently available on TACC's supercomputers.

The center also teaches workshops on using containers and develops tools to make using containers easier.

Scientists have used containers on TACC systems to advance research in geophysics, materials science, and turbulence.

Large community projects like CyVerse and DARPA's Synergistic Discovery and Design (SD2E) use containers on TACC systems as well.

