Investigating the Dark Matter of Life

Supercomputer-enabled metagenomic research explores ecosystems in oceans and microbes in the human esophagus

A map of the Indian Ocean indicating where researchers from the J. Craig Venter Institute (JCVI)'s Global Ocean Sampling Expedition collected samples for metagenomic analysis.

Between August and October 2005, the Sorcerer II sailed the tropical Indian Ocean collecting samples of seawater from 17 sites in the first survey of life along the Indian Ocean transect. The voyage was part of the Global Ocean Sampling Expedition, a continuing effort by the J. Craig Venter Institute (JCVI) to dive into the microbial diversity of the oceans and provide a baseline of the organisms that live there.

A phylogenetic tree built from amino acid sequences using FastTree for the gene encoding the large subunit terminase.

Back at JCVI's Rockville, Maryland laboratory, researchers from the voyage extracted DNA from the microbial cells and viral particles in the samples and sequenced it using a combination of technologies. What emerged were several billion pieces of DNA representing an ecosystem that scientists know very little about.

This approach to biology is called metagenomics and it represents the next frontier of genetic and microbial ecology research. According to a 2011 study, 85 percent of the world's organisms are unnamed and unknown. This "dark matter of life" — organisms that resist culturing and traditional sequencing methods — is all around us. Though difficult to uncover, these organisms play a critical role in our lives.

"Microbes run the world. It's that simple," wrote the authors of the 2007 National Academies release The New Science of Metagenomics. "Although we can't usually see them, microbes are essential for every part of human life—indeed all life on Earth. Every process in the biosphere is touched by the seemingly endless capacity of microbes to transform the world around them."

Until now, almost all of our knowledge of microbes came from the laboratory, where microbes are raised in unnatural circumstances without ecological context. "The science of metagenomics, only a few years old, will make it possible to investigate microbes in their natural environments, the complex communities in which they normally live," the report continued. "It will bring about a transformation in biology, medicine, ecology, and biotechnology that may be as profound as that initiated by the invention of the microscope."

Functional characterization of Indian Ocean viral sequences from the viral and larger fractions of metagenomic data in the context of KEGG pathways.

The researchers then set about reconstructing the larger biome they had sampled in the ocean, a task much like solving a jigsaw puzzle with many pieces missing and no picture on the box.

Reporting in the October 2012 edition of PLOS One, researchers from JCVI described the method by which they analyzed the samples to determine the bacterial and viral diversity of the Indian Ocean and the relationships among organisms. It was the first study to holistically explore the dynamics of aquatic viruses across multiple size classes and provided unprecedented insight into virus diversity, metabolic potential, and virus-host interactions in the region.

One of the authors of the PLOS One paper, Andrey Tovchigrechko, a research scientist at JCVI, developed the software that organizes and draws insights from the fragments of DNA extracted from the oceans. By running that software on the Ranger supercomputer at the Texas Advanced Computing Center (TACC), the researchers gleaned information about this oceanic ecosystem that would be impossible to gather without massive computational power, including biodiversity data that will be useful in tracking the impact of climate change on ocean life.

"With metagenomics, you take a sample of the environment and slice all the DNA that's present into small pieces," explained Tovchigrechko. "You don't have to culture. That's a big advantage because nobody knows how to culture most of the organisms that are out there. The downside is that you get a mix of organisms and you have to sort it out. That's what most of the bioinformatics activities related to metagenomics deal with."


"Metagenomics is a data-intensive research area," said fellow JCVI researcher and associate professor, Shibu Yooseph. "We take this culture-independent approach to studying microbial communities, and the result is that most of the complexity in making inferences has been transferred to the computational end of things. TACC's resources are incredibly useful in addressing the throughput computing requirements in this field."

Trailblazing science needs innovative computer programmers to create the tools that can turn an ocean of genomic data into useful information. For the metagenomic study of the Indian Ocean, Tovchigrechko and his colleagues used MGTAXA, one of the tools Tovchigrechko created, to ascertain the relationship between viruses and bacteria.

Many viruses insert themselves for some time into the host genome and lie dormant, and that class of viruses tends to adopt the DNA composition of a host. "The tool was implemented with a novel method to predict what the bacterial hosts might be, based on certain viral sequences," Tovchigrechko said.
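The intuition that a dormant virus drifts toward its host's DNA composition can be illustrated with a small sketch. The article does not describe MGTAXA's actual model, so the code below swaps in a much simpler stand-in: it compares dinucleotide (2-mer) frequency profiles using cosine similarity and picks the candidate host whose profile is closest. The function names, the choice of k, and the toy sequences are all assumptions made for illustration and are not taken from MGTAXA or from the study's data.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Return normalized k-mer frequencies for a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(kmer, 0.0) for kmer, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def predict_host(viral_seq, host_genomes, k=2):
    """Score each candidate host by compositional similarity to the viral sequence."""
    viral = kmer_profile(viral_seq, k)
    scores = {
        name: cosine_similarity(viral, kmer_profile(genome, k))
        for name, genome in host_genomes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up sequences (hypothetical, not real organisms).
hosts = {
    "host_AT_rich": "ATATTAAATTTATAATATTTAAATATAT",
    "host_GC_rich": "GCGGCCGCGGGCCCGCGCGGCCGGCGCG",
}
viral_fragment = "GGCCGCGCGGGCGCCG"

best, scores = predict_host(viral_fragment, hosts)
print(best)  # host_GC_rich
```

Real sequences would call for longer k-mers, far longer genomes, and a statistical model rather than a single similarity score, but the sketch shows why composition alone can hint at a host.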

MGTAXA helped the researchers develop more precise hypotheses about the relative abundance of various microbes and identify types of bacteria that would have been difficult to isolate otherwise.

MGTAXA and the other software tools developed at JCVI apply equally well to the microbial communities present in and on our bodies, the so-called microbiome.

In a study funded by the National Institutes of Health and jointly led by Zhiheng Pei (NYU) and Karen Nelson (JCVI), Tovchigrechko, Yooseph and others are applying a metagenomic approach to the human esophagus and the microbial imbalances there that may play a role in certain kinds of gastric acid reflux and esophageal adenocarcinoma, a form of cancer.

Methodology for Water Sampling: Scientists take a 200 to 400 liter seawater sample approximately every 200 miles as the vessel sails.

Their recent study analyzed the esophageal environment of more than 50 subjects, representing both healthy individuals and those at various stages of the disease. Though not enough is known about the dynamics of disease and the microbiome, evidence suggests a relationship between the two.

Studies like these, driven by creative algorithms and powered by supercomputers, provide evidence, promote the creation of new treatment options, and help develop the methods and workflows required for further analysis.

"A lot of this is our initial effort to understand the microbiome," Yooseph said, "but ultimately, we want to be at a stage where we can identify biomarkers, either organisms or gene products, that could be the triggers for some of these diseases, or use those markers as diagnostics to identify the disease status."

From single genes to the human genome, and from discrete organisms to ecological niches, genetics continues to evolve as a field, providing new and useful information about the world around us.

"There's so much uncharacterized diversity in the microbial realm that we're still trying to understand using metagenomics," Yooseph said. "From human health to bioenergy alternatives, understanding these microbial systems would have an immediate impact on society."

Aaron Dubrow, Science and Technology Writer
July 18, 2013


The Texas Advanced Computing Center (TACC) at The University of Texas at Austin is one of the leading centers of computational excellence in the United States. The center's mission is to enable discoveries that advance science and society through the application of advanced computing technologies. To fulfill this mission, TACC identifies, evaluates, deploys, and supports powerful computing, visualization, and storage systems and software. TACC's staff experts help researchers and educators use these technologies effectively, and conduct research and development to make these technologies more powerful, more reliable, and easier to use. TACC staff also help encourage, educate, and train the next generation of researchers, empowering them to make discoveries that change the world.

  • Microbes play a critical role in our environment, but much about them remains unknown. The science of metagenomics makes it possible to investigate microbes in their natural environments and in the complex communities in which they normally live, but requires massive computing power.
  • Researchers from the J. Craig Venter Institute used the Ranger supercomputer at TACC to determine the bacterial and viral diversity of the Indian Ocean as part of the Global Ocean Sampling Expedition. They reported their results in PLOS One in October 2012.
  • They are now applying a metagenomic approach to the human esophagus and the microbial imbalances there that may play a role in certain kinds of gastric acid reflux and esophageal cancer.
