Gordon Bell Special Prize Winning Team Reveals AI Workflow for Molecular Systems in the Era of COVID-19

Published on November 19, 2020 by Rachel Harken, ORNL / Jorge Salazar, TACC



Since a team at the University of Texas at Austin and the National Institutes of Health first mapped the SARS-CoV-2 spike protein—the main infection machinery of the virus that causes the COVID-19 disease—scientists around the world have been eager to understand more about this structure and the others that make up the virus to better predict which drugs might successfully be used against it.

Rommie Amaro, Professor of Chemistry and Biochemistry, University of California, San Diego.

Methods used to study the virus include imaging techniques such as X-ray imaging and cryogenic electron microscopy, which uses beams of electrons to image frozen samples, but these fall short of capturing the dynamic movements of the viral proteins. Computer simulations, such as those performed on the Oak Ridge Leadership Computing Facility's (OLCF's) 200-petaflop IBM AC922 Summit supercomputer, can help scientists capture the movements of these structures virtually. The Frontera supercomputer at the Texas Advanced Computing Center (TACC) and the TACC team also played an important role in the groundbreaking scientific work.

A team led by Rommie Amaro, professor and endowed chair of chemistry and biochemistry at the University of California San Diego, and Arvind Ramanathan, computational biologist at Argonne National Laboratory, has been exploring the movement of the virus's spike protein to understand how it behaves and gains access to the human cell. Now, the team has built a first-of-its-kind workflow based on artificial intelligence (AI) and has run it on the Summit supercomputer to simulate the spike in numerous environments, including within the SARS-CoV-2 viral envelope comprising 305 million atoms—the most comprehensive simulation of the virus performed to date.

The accomplishment has earned the team a winning nomination for the Association for Computing Machinery (ACM) Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research, a special version of the ACM Gordon Bell Prize, one of the most coveted awards in supercomputing. The prizes are to be presented at this year's SC20 virtual conference. Both awards acknowledge outstanding achievements in high performance computing, with the new prize focused specifically on COVID-19 research.

"Experiments give us a picture of what these things look like, but they can't tell us the whole story," Amaro said. "The only way we can do this is through simulations, and right now we are pushing the capabilities of molecular simulations to the limits of the computer architectures that we have on this earth. This is at the edge of possibilities of what people are capable of doing."

The team first optimized the Nanoscale Molecular Dynamics (NAMD) and Visual Molecular Dynamics (VMD) codes, which model the movements of atoms in time and space, on multiple smaller cluster systems: the Frontera supercomputer at TACC, the Comet system at the San Diego Supercomputer Center, and ThetaGPU at the Argonne Leadership Computing Facility (ALCF).

A team led by Rommie Amaro of UC San Diego has built a first-of-its-kind workflow based on artificial intelligence. Initial simulations developed on the NSF-funded Frontera supercomputer at TACC and scaled up on the Summit supercomputer at Oak Ridge National Laboratory simulated the spike in numerous environments, including within the SARS-CoV-2 viral envelope comprising 305 million atoms, the most comprehensive simulation of the virus performed to date.

"TACC Frontera as well as Longhorn were instrumental in our work," Amaro said. Frontera is a Dell-Intel supercomputer funded by the National Science Foundation and is currently ranked as the ninth fastest in the world by Top500. Longhorn works as a subsystem to Frontera and was built in partnership with IBM to support workloads accelerated by graphics processing units (GPUs).

Two-parallel-membrane system of the spike-ACE2 complex (8.5 million atoms). The spike protein is depicted with a gray transparent surface, whereas the ACE2 receptor is shown with a yellow transparent surface. Glycans are shown in blue. Credit: Lorenzo Casalino (UCSD) et al.

"From the very beginning, in February, the TACC team has been extraordinarily supportive of our work, in multiple ways. We used Frontera for the initial spike simulations and the viral envelope preparatory simulations, which was fantastic," Amaro explained.

She added that Longhorn was utilized for all of the weighted ensemble runs. "These were super intensive runs that ran for weeks on a hundred of Longhorn's GPUs. Truly, the level of support has been so amazing, there would be no Gordon Bell submission without TACC. Of all the supercomputing sites we used, TACC was the most essential to the work," she said.

"With the burden of the COVID-19 pandemic being experienced every day, it feels good to have the hardware and software environment in place to assist Dr. Amaro's lab to gather results as quickly as possible," said John Fonner, manager of the Life Sciences Computing - Emerging Technologies group and lead on the Longhorn system at TACC.

The optimizations prepared the team to run full-scale simulations on the OLCF's Summit. The OLCF and the ALCF are US Department of Energy (DOE) Office of Science User Facilities located at DOE's Oak Ridge and Argonne National Laboratories, respectively.

After code optimizations, the team was able to successfully scale NAMD to 24,576 of Summit's NVIDIA V100 GPUs. The team's initial runs on Summit have revealed one of the mechanisms that the virus uses to evade detection. They have also characterized the interactions between the spike protein and the ACE2 receptor, the protein in human cells that the virus takes advantage of to gain entrance into them.

"This is one of the first biological systems of the virus that we can learn from to drive scientific discovery," Amaro said. "Our methods of computing allow us to get down to actually see detailed intricacies of this virus that are useful for understanding not only how it behaves but also its vulnerabilities, from a vaccine development standpoint, and a drug targeting perspective."

TOP: The Frontera supercomputer at TACC is an NSF-funded Dell-Intel system ranked as the ninth fastest in the world by Top500 (Nov. 2020). BOTTOM: The IBM Longhorn GPU subsystem of the Frontera supercomputer.

Because one set of the calculations generated a whopping 200 terabytes of data, the team used AI to identify the intrinsic features from the simulations and break down the information to help them interpret what was happening. By layering the experimental data and the simulation data and combining it with their AI-based approach, the researchers were able to capture the virus and its mechanisms in unprecedented detail.

"We never thought we could use our machine-learning tools at this scale," Ramanathan said. "Using these AI-based approaches on Summit has helped accelerate the process of truly understanding the motion of these complex systems."


This research was supported by the Exascale Computing Project; the DOE National Virtual Biotechnology Laboratory, with funding provided by the Coronavirus CARES Act; and the COVID-19 HPC Consortium.

Related Publication: Lorenzo Casalino, Abigail Dommer, Zied Gaieb, Emilia P. Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony Bogetti, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian Chong, Carlos Simmerling, David J. Hardy, Julio D. C. Maia, James C. Phillips, Thorsten Kurth, Abraham Stern, Lei Huang, John McCalpin, Mahidhar Tatineni, Tom Gibbs, John E. Stone, Shantenu Jha, Arvind Ramanathan, and Rommie E. Amaro. "AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics." To appear in International Journal of High Performance Computing Applications, 2020.

