Latest News


Texascale Days at TACC: Labor Day Edition

Published on October 6, 2020 by Aaron Dubrow



An overview of a slice of a large-scale simulation of the universe in three views: dark matter density (left), gas temperature (center) and metallicity field (right). [Credit: Yueying Ni, Carnegie Mellon University]

There's a kind of bravado implicit in supercomputing. How large a simulation can you run? How many processors can you use simultaneously? How efficiently can you scale your code? All for the good of science and society.

The Texascale Days event during the first week of September 2020 provided an opportunity for eight research groups to use large sections of the National Science Foundation-funded Frontera supercomputer at the Texas Advanced Computing Center (TACC), from 100,000 processors to the entire system — nearly 450,000 processors — a scale at which very few simulations have ever been performed.

The event allowed scientists to push their codes to new limits, to accomplish weeks' worth of work in 24 hours, and to explore new algorithms with the capability of improving computational science across the board at scale.

Pipelines for COVID-19 Protagonists

Andre Merzky is a researcher in the Research in Advanced DIstributed Cyberinfrastructure and Applications Laboratory (RADICAL) group at Rutgers University. The group provides workflow and workload execution systems that ensure certain types of fine- and medium-grained workloads can run effectively and reliably on different HPC systems.

Volume renderings of a 2% thin meridional slice through the center of a star, showing the log of the concentration of entrained stably stratified gas (top), the azimuthal velocity magnitude (middle), and vorticity magnitude (bottom) for the Texascale simulation with luminosity enhanced by a factor of 1000. [Credit: Paul Woodward, University of Minnesota]

Like many, Merzky and his team have recently been pulled into a large scale collaborative effort to investigate drug design for COVID-19.

"Our specific task is to scan very large numbers of chemical compounds for their behavior toward identified COVID-19 receptors," he wrote. "That work is performed in multiple stages: the first stage runs a very quick and rough scan through all 'interesting' compounds, identifying those which show promising binding properties, and subsequent stages perform increasingly fine-grained and thorough analysis of the binding behavior for the ever smaller subsets of compounds."

Merzky's Texascale run focused on the first stage of that pipeline. The team recently received a new database of 120 million compounds that need to be scanned against 50 receptors. A single docking attempt takes between two seconds and five minutes on a single core. Despite being 'trivially parallel,' these runs involve a great deal of data management and coordination, a task at which RADICAL's tools excel.
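
RADICAL's production runs rely on the group's own cybertools to manage those tasks; the snippet below is only a minimal sketch of the underlying pattern, a pool of workers churning through short docking tasks and keeping the promising hits for the next stage, written with Python's standard multiprocessing module. The dock() function, compound names, and receptor names are placeholders, not the group's actual code.

```python
# Minimal sketch of a many-task docking screen: a pool of workers churns
# through (compound, receptor) pairs, each taking seconds to minutes.
# dock() and the inputs are placeholders, not RADICAL's actual tooling.
from multiprocessing import Pool
import random
import time

def dock(task):
    """Pretend docking run: score one compound against one receptor."""
    compound, receptor = task
    time.sleep(random.uniform(0.01, 0.05))      # stand-in for 2 s to 5 min of work
    return compound, receptor, random.random()  # fake binding score

if __name__ == "__main__":
    compounds = [f"compound_{i:07d}" for i in range(1000)]   # 120 million in reality
    receptors = ["receptor_A", "receptor_B"]                 # 2 of the 50 receptors
    tasks = [(c, r) for r in receptors for c in compounds]

    with Pool(processes=8) as pool:              # one worker per core in practice
        results = pool.imap_unordered(dock, tasks, chunksize=64)
        # Keep only the promising binders for the next, finer-grained stage.
        hits = [(c, r, s) for c, r, s in results if s > 0.99]

    print(f"{len(hits)} hits passed to stage two")
```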

Their usual production runs on Frontera explore about six million compounds, use 128 nodes per receptor, and finish in 24 to 60 hours.

"While we did a number of experiments to ensure that we scale to about 1,700 nodes, it's still challenging to approach the large data set," Merzky wrote. "Well, that is what we did during our Texascale run. We used just shy of 4,000 nodes to scan those 120 million compounds against two receptors."

They managed a complete scan of the database for each receptor in about seven hours. "We will now have to churn through the remaining receptors more slowly — but having one complete scan allows our biochemist to proceed with some analysis much quicker than they could have done otherwise."

Not only that: the large-scale run confirmed that their software stack scales as expected. "For the COVID collaborations, it's the first complete scan of that database, and the results are needed to gauge the algorithms of all stages, and to judge the viability of that compound database."

Spiking the COVID-19 Spike

Mahmoud Moradi, a computational chemist at the University of Arkansas, is another researcher who used his Texascale Days compute time to study COVID-19 dynamics.

Moradi has been working on COVID-19 research continuously since April, studying the spike proteins of SARS Coronavirus 1 and 2, the viruses behind the 2003 SARS epidemic and the current COVID-19 pandemic, respectively.

The massive simulation his group ran during Texascale Days used 4,000 virtual copies of the coronavirus spike protein to explore the protein's activation pathway. These copies exchange minimal but very useful information with one another using a statistical mechanics-based scheme, in which they inform each other of their positions along the activation pathway.

Each copy of the system uses one node, so the 4,000 copies occupy 4,000 nodes altogether, and the communication scheme offers a highly scalable and efficient way of eventually characterizing the activation pathway of the spike protein.
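
The setup Moradi describes resembles multiple-copy (replica-based) enhanced sampling, in which each copy periodically tells the others where it sits along the pathway. The sketch below, written with mpi4py, shows only that communication pattern, one replica per MPI rank exchanging a single progress value per step; advance() and rebias() are stand-in functions, not the group's actual method.

```python
# Communication-pattern sketch for a multiple-copy simulation: each MPI rank
# holds one replica and periodically shares its position along the activation
# pathway with all other replicas. advance() and rebias() are placeholders.
# Run with, e.g.: mpiexec -n 4 python copies.py
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def advance(x):
    """Stand-in for one block of molecular dynamics on this replica."""
    return x + random.uniform(-0.01, 0.02)

def rebias(x, all_x):
    """Stand-in for adjusting this replica's bias using everyone's positions."""
    return x + 0.1 * (sum(all_x) / len(all_x) - x)   # drift gently toward the pack

x = rank / comm.Get_size()          # initial position along the pathway
for step in range(100):
    x = advance(x)
    all_x = comm.allgather(x)       # the only data exchanged: one float per copy
    x = rebias(x, all_x)

all_x = comm.allgather(x)
if rank == 0:
    print("final positions:", [round(v, 3) for v in sorted(all_x)])
```

Because each copy shares only a single number per exchange, the communication cost stays tiny even at 4,000 copies, which is what makes the scheme so scalable.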

"The simulation was definitely very helpful in finalizing our work on coronavirus spike proteins," he wrote. "We were using a smaller number of copies and nodes before, which would mean a longer time to reach convergence," he explained.

Moradi's results show a meaningful difference between the mechanisms the SARS Coronavirus 1 and 2 spike proteins use, which may at least partially explain the dramatically different spreading patterns of the two viruses.

Space Modelers

Efforts by Gabor Toth, Simeon Bird, and Paul Woodward used Frontera to study space — for Toth, space weather; for Bird, the behavior of black holes; and for Woodward, the death of stars.

"We performed weak scaling study for our brand-new particle-in-cell code FLexible Exascale Kinetic Simulator (FLEKS)," wrote Toth, a research professor in Climate and Space Sciences and Engineering at the University of Michigan. He found that the code scales well up to at least 28,000 CPU cores with typical workload, and can run on up to 230,000 cores that is half of Frontera.

Asymmetric magnetic reconnection simulation performed with the 3D kinetic plasma simulation model FLEKS by Yuxi Chen during the Texascale Days. The electron velocity Ue,z shows turbulence-like features, which is probably related to the Lower Hybrid Drift Instability. [Credit: Toth Group, University of Michigan]

His team also performed a production simulation with 57,000 cores to study asymmetric magnetic reconnection, which is the most important physical process that controls the interaction between the Earth's magnetosphere and the solar wind. This simulation shows turbulence-like electron flows (depicted in the accompanying image).

"Without the access to thousands of the CPU nodes, it would take days to run such a large simulation," Toth said.

Yueying Ni, a graduate student at Carnegie Mellon University, Simeon Bird, a professor of physics and astronomy at the University of California, Riverside, and colleagues are using Frontera to run an extremely large simulation of the universe. It will run from the beginning of time until the era when star formation in the universe peaked, and it will contain more particles than any other simulation of its type.

"Frontera enabled us to run higher resolution simulations, which let us study physical processes at higher fidelity and make better predictions for future gravitational wave experiments such as the LISA satellite," Bird wrote. "We used Texascale Days to perform the first major segment of our big run, which will take a few months to complete."

University of Minnesota astrophysicist Paul Woodward's team was given access to the entire Frontera machine (7,900 nodes) for 36 hours during Texascale Days. They used this opportunity to perform special, highly resolved simulations of rotating massive main sequence stars. These simulations allow them to compare their results with asteroseismology observations of such stars as a means of checking that their models and simulation technology are accurate.

"The very high grid resolution that Frontera's enormous computing power makes possible enables us for the first time to seriously address questions that concern effects of the convection and rotation on the flow that one might consider to be in some sense second-order small, but are very important over the lifetime of the star," he wrote.

Pushing Turbulence to the Limit

Earth-bound, but still computationally intensive, were tornado-producing thunderstorm simulations by Leigh Orf, an atmospheric scientist with the Space Science and Engineering Center at the University of Wisconsin-Madison.

Orf used his time to benchmark CM1, the cloud model used in his research, at extreme scale. The experiment allowed him to get a much more accurate sense of how much compute time he would need to complete some of his most ambitious simulations of high-intensity storms and the process by which tornadoes are spawned.

"I was very pleased that CM1 showed excellent strong and weak scaling from 512 to 2,048 nodes, and with comparable performance up to 3,900 nodes," he recalled. "Now I have timings and scaling information that I can use for the next round of proposals."

PK Yeung, whose work focuses on the fundamentals of turbulence, used Frontera to study extreme fluctuations in local concentration gradients.

"Large-scale simulations are important to understand the behavior of extreme fluctuations in the scalar gradients, but often limited by their cost," Yeung wrote.

Using large portions of Frontera during the Texascale Days event allowed Yeung's group to make faster progress than possible otherwise. "We have developed a new approach where a long simulation at modest small-scale resolution will provide optimal initial conditions for multiple short simulations at very high resolution."
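
That approach, one long coarse run feeding several short high-resolution runs, is a general coarse-to-fine restart pattern. The toy sketch below shows only the outline, refining snapshots of a simple 1D field by interpolation before each short fine-grid run; the actual turbulence code, grids, and numerics are far more involved, and every name here is a placeholder.

```python
# Coarse-to-fine restart pattern (toy outline, not Yeung's actual code):
# evolve a long run on a coarse grid, then refine selected snapshots by
# interpolation to seed multiple short high-resolution runs.
import numpy as np

def evolve(field, steps, dt=1e-3):
    """Stand-in solver: simple diffusion of a periodic 1D field."""
    for _ in range(steps):
        field = field + dt * (np.roll(field, 1) - 2 * field + np.roll(field, -1))
    return field

def refine(field, factor):
    """Interpolate a coarse snapshot onto a grid `factor` times finer."""
    n = field.size
    x_coarse = np.linspace(0.0, 1.0, n, endpoint=False)
    x_fine = np.linspace(0.0, 1.0, n * factor, endpoint=False)
    return np.interp(x_fine, x_coarse, field, period=1.0)

rng = np.random.default_rng(0)
coarse = rng.standard_normal(128)

snapshots = []
for _ in range(4):                      # long, cheap coarse run, saving checkpoints
    coarse = evolve(coarse, steps=500)
    snapshots.append(coarse.copy())

for i, snap in enumerate(snapshots):    # short, expensive high-resolution runs
    fine = evolve(refine(snap, factor=8), steps=50)
    print(f"run {i}: fine-grid variance = {fine.var():.4f}")
```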

Testing New Community Libraries

Hari Subramoni, working with DK Panda and his team at Ohio State University, took advantage of Texascale Days to optimize and tune the MVAPICH2 MPI library for Frontera. Panda, a co-principal investigator on the system, leads the MVAPICH2 project, an MPI (or Message Passing Interface) implementation that he hopes will be the fastest way for parallel computing systems to exchange data and compute.

They carried out large-scale experiments, including full-scale system runs, for multiple features of the MVAPICH2 library. These included:

  • Scalable job start-up – They optimized the MVAPICH2 library to complete job startup in 31 seconds for 229,376 processes on 4,096 nodes with 56 processes per node.
  • Hardware multicast – They demonstrated that InfiniBand hardware multicast-based designs improve the latency of MPI_Bcast by up to a factor of two at 2,048 nodes.
  • Impact of SHARP support for multiple collectives – They accelerated the performance of MPI_Allreduce, MPI_Reduce, and MPI_Barrier at 7,861 nodes (full system scale) by factors of five, six, and nine, respectively, using Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology (a rough timing sketch follows this list).
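
Latency numbers like those above come from micro-benchmarks that time a collective over many iterations and average the result. The fragment below is a rough mpi4py illustration of that measurement for MPI_Bcast and MPI_Allreduce; it is not the team's actual benchmark suite or tuning methodology.

```python
# Rough illustration of timing MPI collectives (the real measurements used the
# MVAPICH team's own benchmarks, not this script). Run with: mpiexec -n 4 python bench.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
iters = 1000
buf = np.zeros(1024, dtype="d")          # small 8 KB message

def timed(op):
    comm.Barrier()                       # start everyone together
    t0 = MPI.Wtime()
    for _ in range(iters):
        op()
    comm.Barrier()
    return (MPI.Wtime() - t0) / iters * 1e6   # average latency in microseconds

bcast_us = timed(lambda: comm.Bcast(buf, root=0))
allred_us = timed(lambda: comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM))

if rank == 0:
    print(f"MPI_Bcast:     {bcast_us:8.2f} us")
    print(f"MPI_Allreduce: {allred_us:8.2f} us")
```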

"These optimizations and tuning will be available in the next release of the MVAPICH2 library," said Subramani. "This will help the Frontera application users extract higher performance and scalability with the MVAPICH2 library."

"I'm very excited by the results of our third Texascale Days event," said John Cazes, TACC's director of HPC and organizer of the event. "Several of the groups that computed had never run their codes at this scale before. The event showed them what works, what doesn't, what can be learned by running larger problems, and pushed the frontiers of parallelism for many codes. It's a great chance for us and our users to take their codes to extreme scales."


Story Highlights

The Texascale Days event during the first week of September 2020 provided an opportunity for eight research groups to use large sections of Frontera, from 100,000 processors to the entire system, a scale at which very few simulations have ever been performed.

Research topics ranged from astrophysics and space science to COVID-19 simulations, and from fundamental studies of turbulence to experiments with new community HPC libraries.


Contact

Faith Singer-Villalobos

Communications Manager
faith@tacc.utexas.edu | 512-232-5771

Aaron Dubrow

Science And Technology Writer
aarondubrow@tacc.utexas.edu

Jorge Salazar

Technical Writer/Editor
jorge@tacc.utexas.edu | 512-475-9411