Developing Scientific Computing Communities
Researchers present experiences from Enzo, Cactus, and iPlant API development efforts
This image shows a volume rendering of the gravitational radiation generated during the merger of a binary black hole system. The red spheres represent the horizons of the black holes. The colour map shows positive wave amplitudes in yellow/red, negative amplitudes in green/blue. [Image courtesy of W. Benger, LSU.]
Decades of scholarship and billions of dollars have gone into the development of community software codes that are crucial not only to science, but to our everyday lives and future.
The General Circulation Model, used by the Intergovernmental Panel on Climate Change to model our future environment, and the Weather Research and Forecasting model, which helps predict extreme weather, are two key examples. Others, like CHARMM and NAMD, are used by researchers and pharmaceutical companies to find drug leads and to better understand disease.
Nearly every field of science has a community code (or several) that satisfies a large percentage of the discipline's scientific needs. Great minds, and thousands of hours of PhD and postdoc labor, have gone into the creation of these codes. However, as new technologies emerge that are capable of delivering millions of times the power of previous systems, it is often necessary to rethink and rewrite these existing community codes, which is no small feat.
What to do about community codes has been an open question in the scientific computing community for many years. The problem is described in the final report of the National Science Foundation's Task Force on Software for Science and Engineering published in March 2011.
"All software must evolve to keep up with changes in systems, usage, and to include new algorithms and techniques," the authors wrote. "The scientific community has an interest in ensuring that the software it needs will continue to be available, efficient, and employ state-of-the-art technology."
Several sessions addressed this issue at the TeraGrid '11 conference in Salt Lake City, pointing to successful examples of community, and community code, development. These talks represented technologies or methods that interact with HPC hardware and software at very different levels of the architecture; nonetheless, they represent possible paths for other scientific computing communities to follow.
Open Source Astrophysics
Brian O'Shea, assistant professor of physics and astronomy at Michigan State University, began his talk with a question: How do you transform a closed scientific computing code into a community code that can address the needs and harness the skills of a wide variety of researchers?
His talk described the evolution of the astrophysics code, Enzo, from a black box system that only a few understood or could access, to a free-for-all in which divergent strains of the code proliferated, to the current state of controlled chaos whereby several dozen developers experiment with and provide input into the code development, spurring rapid advances.
The new development workflow is "transparent to the users and easy to use," O'Shea said. "The result is that we have a very enthusiastic and involved user community. And it's sustainable."
Enzo is used by a relatively small number of scientists, yet they are among the most adept and proficient users of HPC resources. Approximately 60 Enzo users consumed 60 million computing hours on the TeraGrid in 2010, according to O'Shea, leading to many astrophysical discoveries, including a better understanding of cosmic reionization.
An API to Feed the World
World governments and private industry are investing trillions of dollars in the collection of data relating to plants in the hopes of continuing to feed the growing population on Earth. To date, however, these data collections have been scattered and difficult to connect.
To address this issue, the National Science Foundation funded a five-year, $50 million effort called "iPlant" to develop new tools, networks, and cyberinfrastructure that can connect plant biologists and bring their data together to spur insights and innovations.
Software developer Rion Dooley from the Texas Advanced Computing Center described the creation of a common application programming interface (API) for iPlant that allows researchers with little programming experience to add common functionality to their plant biology projects.
iPlant is a community of researchers, educators, and students working to enrich all plant sciences through the development of cyberinfrastructure that is an essential component of modern biology.
APIs are a particular set of rules and specifications that software programs use to communicate with each other. They serve as an interface between different software programs and facilitate their interaction, similar to the way user interfaces facilitate interaction between humans and computers.
Modeled after popular social and industry APIs like Yelp or PayPal, the tools are intuitive, easy to use, and scalable on the very large high-performance computing systems of the TeraGrid, now officially called the Extreme Science and Engineering Discovery Environment (XSEDE). Among the most important API capabilities in iPlant are tools that allow any user to translate and integrate data in different file formats, allowing for far greater collaboration.
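To make the idea of format translation concrete, here is a minimal sketch in Python of how a single entry point can hide many file-format converters from the user. It is not the iPlant API: the names `CONVERTERS`, `converter`, and `translate`, and the choice of CSV and JSON as formats, are assumptions made purely for illustration.

```python
import csv
import io
import json

# Hypothetical registry: maps a (source, target) format pair to a converter.
CONVERTERS = {}


def converter(source, target):
    """Decorator that registers a function as the source -> target converter."""
    def register(func):
        CONVERTERS[(source, target)] = func
        return func
    return register


@converter("csv", "json")
def csv_to_json(text):
    # Each CSV row becomes one JSON object keyed by the header row.
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, indent=2)


@converter("json", "csv")
def json_to_csv(text):
    # Expects a JSON list of flat objects that all share the same keys.
    rows = json.loads(text)
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def translate(text, source, target):
    """Single entry point: the caller names formats, not converter functions."""
    if source == target:
        return text
    try:
        func = CONVERTERS[(source, target)]
    except KeyError:
        raise ValueError(f"no converter from {source!r} to {target!r}") from None
    return func(text)
```

A caller only needs `translate(data, "csv", "json")`; supporting another format means registering one more function, without changing any code that already calls `translate`.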
"The API serves as a Rosetta stone for our users," Dooley said. "It gives them a way to collaborate with any other user without having to be fluent in every piece of software used in the plant bio community. And that's really the goal: to keep scientists focused on science rather than semantics."
Modular Software for Community Growth
A third example of community code development was featured in a full-day tutorial at the conference centered on the Cactus computational framework, an open source problem-solving environment for scientists and engineers. Its modular structure enables parallel computation across different architectures and collaborative code development between different groups.
Cactus originated in the academic research community, where it was developed and used over many years by a large international collaboration of physicists and computational scientists. Applications, developed on standard workstations or laptops, can seamlessly run on clusters or supercomputers.
The Cactus user community has created and maintained toolkits for several research fields. The Einstein Toolkit (described at length by Ed Seidel in his keynote talk) is a powerful example of Cactus' capabilities. The Toolkit consists of an open set of more than 100 Cactus "thorns," or application modules, useful for computational relativity, along with associated tools for simulation management and visualization. In the last several years, the code has grown tremendously by virtue of this development model.
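The appeal of a modular design can be illustrated with a small Python sketch in which independent modules plug routines into a shared core. This is not Cactus code and does not reflect its actual interfaces: the `Framework` class, its stage names, and the two example modules are hypothetical, chosen only to show how separately written pieces can be combined without knowing about each other.

```python
from collections import defaultdict


class Framework:
    """Hypothetical core: it schedules routines but contains no science."""

    STAGES = ("initial", "evolve", "analysis")

    def __init__(self):
        self._routines = defaultdict(list)

    def register(self, stage, routine):
        """Let a module attach a routine to one of the named stages."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self._routines[stage].append(routine)

    def run(self, steps):
        """Run initial routines once, evolve routines `steps` times, then analysis."""
        state = {}
        for routine in self._routines["initial"]:
            routine(state)
        for _ in range(steps):
            for routine in self._routines["evolve"]:
                routine(state)
        for routine in self._routines["analysis"]:
            routine(state)
        return state


def decay_module(framework):
    """Hypothetical module from one group: sets up and evolves a quantity u."""
    def set_initial(state):
        state["u"] = 1.0

    def step(state):
        state["u"] *= 0.5  # halve u on every evolution step

    framework.register("initial", set_initial)
    framework.register("evolve", step)


def norm_module(framework):
    """Hypothetical module from another group: only analyzes the final state."""
    def report(state):
        state["abs_u"] = abs(state["u"])

    framework.register("analysis", report)
```

Activating both modules on one `Framework` instance and calling `run(3)` evolves `u` and then analyzes it, even though neither module refers to the other. That separation is what lets different groups develop and share pieces independently.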
"Our aim is to provide the core computational tools that can enable new science, broaden our community, facilitate interdisciplinary research and take advantage of emerging petascale computers and advanced cyberinfrastructure," Allen said.
Whether through the controlled chaos of the Enzo evolution, the add-on extensibility of the iPlant API, or the parallel framework offered by Cactus, successful models of community code creation and evolution are critical to the continued growth of the scientific computing community.
September 7, 2011
The Texas Advanced Computing Center (TACC) at The University of Texas at Austin is one of the leading centers of computational excellence in the United States. The center's mission is to enable discoveries that advance science and society through the application of advanced computing technologies. To fulfill this mission, TACC identifies, evaluates, deploys, and supports powerful computing, visualization, and storage systems and software. TACC's staff experts help researchers and educators use these technologies effectively, and conduct research and development to make these technologies more powerful, more reliable, and easier to use. TACC staff also help encourage, educate, and train the next generation of researchers, empowering them to make discoveries that change the world.
Aaron Dubrow
Science and Technology Writer
aarondubrow@tacc.utexas.edu

