Top NSF petascale supercomputer and expert staff accelerate discoveries for nation's scientists

Published on May 27, 2014 by Aaron Dubrow

The image above is a map of links between the genes of the mustard plant Arabidopsis thaliana. Picture DNA on Facebook. "It's not unlike a social network," according to biologist Seung Yon Rhee. Credits: Insuk Lee, Michael Ahn, Edward Marcotte, Seung Yon Rhee, Carnegie Institution for Science

"Bottom-Up" Proteomics

Tandem protein mass spectrometry is one of the most widely used high-throughput, data-intensive methods in proteomics, the large-scale study of proteins (particularly their structures and functions). Researchers in the Marcotte Lab at The University of Texas at Austin are using Stampede to develop and test a number of computational algorithms that allow them to more accurately and efficiently interpret proteomics mass spectrometry data.
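At its core, interpreting shotgun proteomics data means matching each observed fragment spectrum against the theoretical spectra of candidate peptides and keeping the best-scoring match. The sketch below illustrates that general idea with a simple binned cosine-similarity score; the function names, the scoring scheme, and the toy peak data are all invented for illustration and are not the Marcotte Lab's actual algorithms.

```python
# Minimal, hypothetical sketch of peptide-spectrum matching:
# bin each spectrum's (m/z, intensity) peaks, then score an observed
# spectrum against candidate peptides' theoretical spectra by cosine
# similarity, keeping the best match.

from math import sqrt

def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        binned[b] = binned.get(b, 0.0) + intensity
    return binned

def cosine_score(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(observed, candidates):
    """Return (score, peptide) for the highest-scoring candidate."""
    obs = bin_spectrum(observed)
    scored = [(cosine_score(obs, bin_spectrum(spec)), pep)
              for pep, spec in candidates.items()]
    return max(scored)

# Toy example with made-up fragment peaks.
observed = [(147.1, 50.0), (276.2, 80.0), (405.3, 30.0)]
candidates = {
    "PEPTIDEA": [(147.1, 1.0), (276.2, 1.0), (405.3, 1.0)],
    "PEPTIDEB": [(120.0, 1.0), (250.0, 1.0), (390.0, 1.0)],
}
score, peptide = best_match(observed, candidates)
```

Real search engines are far more sophisticated (they model fragment ion series, charge states, and statistical significance), but the shape of the problem is the same, and scoring millions of spectra against large candidate databases is what makes the workload data-intensive.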

The researchers are midway through a project that focuses on computational analyses of the largest animal proteomics dataset ever collected (equivalent to roughly half of all shotgun proteomics data currently in the public domain). The samples span protein extracts from a wide variety of tissues and cell types sampled across the animal tree of life. The analyses consume considerable computational resources and require Stampede's large-memory "fat" nodes. They help the group reconstruct the 'wiring diagrams' of cells by revealing how the proteins encoded by a genome are organized into functional pathways, systems, and networks. Such models let scientists better define the functions of genes and link genes to traits and diseases.
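One common way to turn such proteomics profiles into a network is "guilt by association": proteins repeatedly detected in the same samples are linked, and a protein of unknown function can then inherit hypotheses from its network neighbors. The sketch below shows the principle using Jaccard similarity on presence/absence profiles; the data, threshold, and function names are invented for illustration and do not represent the group's actual methods.

```python
# Hypothetical sketch of building a functional-association network from
# protein detection profiles: proteins whose presence/absence patterns
# across tissue samples overlap strongly get linked by an edge.

from itertools import combinations

# Toy profiles: the set of samples each protein was detected in.
profiles = {
    "protA": {"liver", "brain", "muscle"},
    "protB": {"liver", "brain", "muscle"},   # co-occurs with protA
    "protC": {"skin"},                       # detected elsewhere
}

def jaccard(s1, s2):
    """Overlap between two sample sets, from 0 (disjoint) to 1 (identical)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def build_network(profiles, threshold=0.5):
    """Link every protein pair whose detection profiles overlap strongly."""
    edges = []
    for p1, p2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[p1], profiles[p2])
        if score >= threshold:
            edges.append((p1, p2, score))
    return edges

edges = build_network(profiles)
```

At the scale described in the article, every pairwise comparison across thousands of proteins and hundreds of samples multiplies quickly, which is why such analyses benefit from large-memory compute nodes.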

"Researchers would usually analyze these sorts of datasets one at a time," Edward Marcotte said. "TACC let us scale this to thousands."

Marcotte's work was featured in the New York Times in August 2012.
