Wrangler Supercomputer Speeds through Big Data

Data-intensive supercomputer brings new users to high performance computing for science

Published on March 10, 2016 by Jorge Salazar

Evolution of Monogamy

UT Austin biologist Rebecca Young traces the genes behind monogamous behavior using Wrangler supercomputer

One of the mysteries of monogamy to scientists is whether different species share regulatory genes that can be traced back to a common ancestor.

Researchers at the Hofmann Lab of UT Austin are using the Wrangler data-intensive supercomputer to find orthologs — genes common to different species. They'll search for them in each of the major lineages of vertebrates — mammals, birds, reptiles, amphibians and fishes.

Rebecca Young

Rebecca Young, Department of Integrative Biology and Center for Computational Biology and Bioinformatics, UT Austin.

"What we want to know is, even though they've evolved independently, whether it's possible that some of the same genes are important in regulating this behavior, in particular expression of these genes in the brain while monogamous males are reproductively active," said Rebecca Young. Young is a research associate in the Department of Integrative Biology and at the Center for Computational Biology and Bioinformatics at UT Austin.

One of the difficulties of the research is that resources are limited for genomic analysis beyond the model organisms such as lab rats and fruit flies. "For those species, there are online-available databases that group genes together into orthologous groups, or these groups of gene families that are comparable across species," Young said. "When you're using nontraditional species like we are, you need to be able to do that on your own."

She and other scientists do just that with a software package called OrthoMCL. It lets scientists find orthologs, the shared genes that could be candidates for ones that regulate monogamous behavior.

The data that go into the OrthoMCL code running on Wrangler are protein-coding sequences of RNA from the brain tissue of males of different vertebrate species. So far, Young said, the monogamy project has analyzed two species of voles, two species of mice, two species of songbirds, two frogs, and two cichlid fishes.

"When you're sequencing the genes that are expressed in a tissue using transcriptomic approaches like what we use, you're getting gene counts for most of the genes in the genome," Young said.

Young said it was an astronomically large amount of data to analyze. Across the 10 species, "we're starting on the minimum 200,000 genes that we're going to compare [sequence similarity], and compare in all pairwise fashion. These databases have to be quite huge to manage all of this data in a way that is usable by components of the [OrthoMCL] software," she said.
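
To give a rough sense of that scale, here is a minimal Python sketch that counts the comparisons implied by an all-pairwise search. It assumes each unordered pair of sequences is compared exactly once; the actual number of comparisons OrthoMCL performs may differ, and the function name is hypothetical, chosen only for illustration.

```python
from math import comb


def pairwise_comparisons(n_sequences: int) -> int:
    """Count unordered pairs among n_sequences items, i.e. n * (n - 1) / 2."""
    return comb(n_sequences, 2)


# 200,000 is the minimum gene count quoted by Young for the 10 species.
n_genes = 200_000
print(f"{pairwise_comparisons(n_genes):,}")  # prints 19,999,900,000
```

Under that assumption, 200,000 genes imply roughly 20 billion pairwise comparisons, which helps explain why the intermediate databases grow so large.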

Monogamous vs. Non-Monogamous species

This research identifies the genes active in the brains of 10 species and asks whether these genes are the same (called orthologs) across species. OrthoMCL identifies which genes are the same (orthologous) in different species. The Wrangler data-intensive supercomputer helps manage the cumbersome databases the code produces. Credit: Rebecca Young.

Bird in nature

Some behavioral traits like monogamy might have a genetic basis, which can be traced up the evolutionary ladder.
Credit: fra298

Supercomputers like Stampede are tailored more for arithmetic 'number crunching' than for handling the massive amount of data transfer between storage and memory that the OrthoMCL code generates.

"That's when some folks over at TACC suggested that we try and implement this on Wrangler, because this is what Wrangler is set up for," Young said. "It's set up to have this relational database, where individual computational steps can go back and talk to this database and pull out the information that it needs."
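
The pattern Young describes, in which a computational step goes back to a relational database and pulls out only the rows it needs, can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table name, column names, gene identifiers, and scores below are invented toy values, not OrthoMCL's actual schema or data, and SQLite stands in for whatever database engine a real run uses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding toy similarity scores (invented values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE similarity (gene_a TEXT, gene_b TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO similarity VALUES (?, ?, ?)",
        [
            ("vole_g1", "mouse_g7", 0.92),
            ("vole_g1", "frog_g3", 0.41),
            ("vole_g2", "mouse_g9", 0.88),
        ],
    )
    # An index lets each lookup avoid scanning the whole table.
    conn.execute("CREATE INDEX idx_gene_a ON similarity (gene_a)")
    return conn


def best_match(conn: sqlite3.Connection, gene: str):
    """Return the (gene_b, score) row with the highest score for gene, or None."""
    return conn.execute(
        "SELECT gene_b, score FROM similarity "
        "WHERE gene_a = ? ORDER BY score DESC LIMIT 1",
        (gene,),
    ).fetchone()


conn = build_demo_db()
print(best_match(conn, "vole_g1"))
```

Each call to the hypothetical `best_match` helper touches only the rows for one gene, which is the kind of storage-heavy access that a data-intensive system is meant to serve quickly.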

So far, the results with Wrangler have been encouraging. Young compared them with a prior search using online resources, which yielded only 350 genes across 10 species. "When I run OrthoMCL on Wrangler, I'm able to get almost 2,000 genes that are comparable across the species," Young said. "This is an enormous improvement from what is already available."

Young added, "We're looking to OrthoMCL to allow us to make an increasing number of comparisons across species when we're looking at these very divergent, these very ancient species separated by 450 million years of evolution."
