Research by team at Argonne National Laboratory, UC San Diego leads to novel understanding of SARS-CoV-2; research supported by ORNL's Summit and TACC's Frontera supercomputers
Since a team at the University of Texas at Austin and the National Institutes of Health first mapped the SARS-CoV-2 spike protein, the main infection machinery of the virus that causes COVID-19, scientists around the world have been eager to understand more about this structure and the others that make up the virus, to better predict which drugs might successfully be used against it.
A team led by Rommie Amaro, professor and endowed chair of chemistry and biochemistry at the University of California San Diego, and Arvind Ramanathan, computational biologist at Argonne National Laboratory, has been exploring the movement of the virus's spike protein to understand how it behaves and gains access to the human cell. Now, the team has built a first-of-its-kind workflow based on artificial intelligence (AI) and has run it on the Summit supercomputer to simulate the spike in numerous environments, including within the SARS-CoV-2 viral envelope comprising 305 million atoms—the most comprehensive simulation of the virus performed to date.
The accomplishment has earned the team a winning nomination for the Association for Computing Machinery (ACM) Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research, a special version of the ACM Gordon Bell Prize, one of the most coveted awards in supercomputing. Both prizes are to be presented at this year's SC20 virtual conference and acknowledge outstanding achievements in high performance computing, with the new prize focused specifically on COVID-19 research.
"Experiments give us a picture of what these things look like, but they can't tell us the whole story," Amaro said. "The only way we can do this is through simulations, and right now we are pushing the capabilities of molecular simulations to the limits of the computer architectures that we have on this earth. This is at the edge of possibilities of what people are capable of doing."
The team first optimized two codes: Nanoscale Molecular Dynamics (NAMD), which models the movements of atoms in time and space, and Visual Molecular Dynamics (VMD), which is used to visualize and analyze the results. This work was done on several smaller systems: the Frontera supercomputer at the Texas Advanced Computing Center (TACC), the Comet system at the San Diego Supercomputer Center, and ThetaGPU at the Argonne Leadership Computing Facility (ALCF).
Amaro added that TACC's Longhorn system was used for all of the weighted ensemble runs, an enhanced sampling technique for simulating rare molecular events. "These were super intensive runs that ran for weeks on a hundred of Longhorn's GPUs. Truly, the level of support has been so amazing, there would be no Gordon Bell submission without TACC. Of all the supercomputing sites we used, TACC was the most essential to the work," she said.
"With the burden of the COVID-19 pandemic being experienced every day, it feels good to have the hardware and software environment in place to assist Dr. Amaro's lab to gather results as quickly as possible," said John Fonner, manager of the Life Sciences Computing - Emerging Technologies group and lead on the Longhorn system at TACC.
The optimizations prepared the team to run full-scale simulations on Summit at the Oak Ridge Leadership Computing Facility (OLCF). The OLCF and the ALCF are US Department of Energy (DOE) Office of Science User Facilities located at DOE's Oak Ridge and Argonne National Laboratories, respectively.
After these optimizations, the team successfully scaled NAMD to 24,576 of Summit's NVIDIA V100 GPUs. The team's initial runs on Summit have already revealed one of the mechanisms the virus uses to evade detection. They have also characterized the interactions between the spike protein and the ACE2 receptor, the protein on human cells that the virus exploits to gain entry into them.
"This is one of the first biological systems of the virus that we can learn from to drive scientific discovery," Amaro said. "Our methods of computing allow us to get down to actually see detailed intricacies of this virus that are useful for understanding not only how it behaves but also its vulnerabilities, from a vaccine development standpoint, and a drug targeting perspective."
"We never thought we could use our machine-learning tools at this scale," Ramanathan said. "Using these AI-based approaches on Summit has helped accelerate the process of truly understanding the motion of these complex systems."
This research was supported by the Exascale Computing Project; the DOE National Virtual Biotechnology Laboratory, with funding provided by the Coronavirus CARES Act; and the COVID-19 HPC Consortium.
Related Publication: Lorenzo Casalino, Abigail Dommer, Zied Gaieb, Emilia P. Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony Bogetti, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian Chong, Carlos Simmerling, David J. Hardy, Julio D. C. Maia, James C. Phillips, Thorsten Kurth, Abraham Stern, Lei Huang, John McCalpin, Mahidhar Tatineni, Tom Gibbs, John E. Stone, Shantenu Jha, Arvind Ramanathan, and Rommie E. Amaro. "AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics." To appear in International Journal of High Performance Computing Applications, 2020.