Modeling without Models

TACC, DARPA work together to accelerate design and discovery where predictive models don't yet exist

Many of the industrial products we build today, such as airplanes, microchips, and steel beams, rest on mathematical models of the underlying physics that predict how they will behave.

Engineering starts with those robust mathematical models, which allow inventors to predict the properties of a process or a product. For example, knowing how a material conducts electricity or bends lets engineers use computer-aided design to improve these systems predictably. The combination of theory, experiment, and simulation has produced dramatic gains in design efficiency over the years.

However, in some scientific domains good predictive models are still unavailable. Biology and chemistry are two of them. Researchers would seize the opportunity to engineer more effective drugs or solar technologies if only they had mathematical formulas describing how the human body or the quantum world works.

TACC and the Defense Advanced Research Projects Agency (DARPA) are working to solve this problem.

A little over a year ago, as part of the Synergistic Discovery and Design (SD2) program, TACC and DARPA began developing data-driven ways to accelerate design and discovery in research areas where predictive models don't yet exist.

The program is building computational tools that use experimental data to develop hypotheses and new designs in areas such as synthetic biology, neuro-computation, and polymer chemistry. The team is using artificial intelligence to consider a universe of hypotheses; converge on the ones that fit the data; design experiments to test those hypotheses; then evaluate outcomes and feed that back into the training process.
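The cycle described above (propose many hypotheses, keep those that fit the data, design an experiment to tell them apart, then feed the result back) can be illustrated with a deliberately tiny Python sketch. It is a hypothetical toy, not SD2's software: the "hypotheses" are simple linear models, the "experiment" just evaluates a hidden model, and the names `make_hypotheses`, `choose_experiment`, `run_experiment`, and `discover` are invented for illustration. Real programs apply machine learning to far richer hypothesis spaces and noisy laboratory data.

```python
# Toy closed-loop discovery sketch (hypothetical; not SD2's actual software).
# Each "hypothesis" is a candidate model y = slope * x + offset.
from itertools import product
from statistics import pvariance


def make_hypotheses():
    """Enumerate a small 'universe' of candidate models as (slope, offset) pairs."""
    return [(s, o) for s, o in product(range(-3, 4), range(-3, 4))]


def predict(h, x):
    slope, offset = h
    return slope * x + offset


def choose_experiment(hypotheses, candidates):
    """Design the next test: the input where surviving hypotheses disagree most."""
    return max(candidates, key=lambda x: pvariance([predict(h, x) for h in hypotheses]))


def run_experiment(x, true_model=(2, -1)):
    """Stand-in for a lab measurement: evaluates a hidden 'true' model (assumed)."""
    return predict(true_model, x)


def discover(candidates=range(0, 5), max_rounds=10):
    hypotheses = make_hypotheses()
    history = []
    for _ in range(max_rounds):
        if len(hypotheses) <= 1:
            break
        x = choose_experiment(hypotheses, candidates)
        y = run_experiment(x)
        history.append((x, y))
        # Feed the outcome back: keep only hypotheses consistent with it.
        hypotheses = [h for h in hypotheses if predict(h, x) == y]
    return hypotheses, history
```

Calling `discover()` narrows the 49 starting candidates to the single hidden model `(2, -1)` after two designed experiments, at `x = 4` and then `x = 0`. Choosing the input where the surviving hypotheses disagree most is one simple way to make each experiment as informative as possible.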

"Humans learn this way," said Matt Vaughn, director of Life Sciences Computing at TACC. "But machines can consider a much wider number of variables and examples."

SD2 runs on TACC's powerful computing and data ecosystem and takes advantage of software and services for collaboration, data sharing, and reproducible science, tailored to the evolving needs of the four-year program.

"This approach has served TACC and its partners well," Vaughn said.

One of the SD2 partners is Bree Cummins, a mathematics professor at Montana State University, who regularly uses TACC's environment for code collaboration, data sharing, and data analysis.

"TACC's ecosystem is backed by a helpful and responsive staff that provides timely support and feature enhancements," Cummins said. "These resources have enabled me to participate in successful group projects with researchers scattered across the country."

David Baker, another SD2 participant and a professor of biochemistry at the University of Washington, works to create new proteins.

"TACC is making an important contribution to the creation of a whole new world of designed proteins to address current day challenges," he said.

TACC supports other large research communities through projects like CyVerse, which serves life scientists, and DesignSafe, which empowers the natural hazards engineering community. The center also supports hundreds of long-term data and archival curation projects that make use of TACC resources.

"Researchers shouldn't feel constrained by their environment," Vaughn said. "Whether they need graphical interfaces, APIs, or batch data analytics, we provide it and make sure they can use it."

By the end of the program, the work could yield real, tangible outcomes: complex biological circuits that can sense and respond to target molecules; therapeutic proteins that are more resilient to their surrounding environment; and more efficient, flexible solar cells.

"TACC is excited to provide DARPA with a diverse computational infrastructure, including HPC, GPU, and cloud technologies, coupled with powerful data-sharing mechanisms that democratize progress across domains of science by making it much easier for people to reproduce one another's work and share compelling results," Vaughn said.

"Ultimately, these capabilities accelerate the pace of turning research into useful applications for society."