Garden Variety Mutants
Scientists use TACC's Ranger supercomputer to analyze unusual geranium genomes
Behold the geranium: mainstay of the home garden. These colorful bundles of blooms are actually quite unique, evolving many times faster than their plant peers, according to Robert Jansen, professor of biology at The University of Texas at Austin.
"The degree of change in this group is off the charts," said Jansen. "It's a situation where you have a natural set of mutants."
Map of the plastid genome of Geranium palmatum.
Geraniums (the Geraniaceae family) are unusual for two reasons. First, the chloroplast genome and the genes within it are highly rearranged in comparison to other plants. Second, the rates of change for certain gene sequences, especially some functional groups of genes, are highly elevated in both the chloroplast and mitochondrial genomes. Geraniums are one of only two plant groups known to have such mutable genomes, making this garden ornamental a model for scientific study.
Most mutations are caused by coding errors that occur during cell division, when the DNA unravels and is copied. More often than not, however, when an error occurs it is repaired quickly. Heritable mutations are fairly rare, but in the Geraniaceae family, they are common. Why is this the case?
Recently, with a grant from the National Science Foundation through the Plant Genome Research Program, some of the leading geranium scholars began applying next-generation sequencing methods to better understand why the geranium has evolved to be so radically different from other plants.
"There seems to have been repeated bursts of change," Jansen said. "It may be an on-going process, but it certainly has happened at different times in different lineages within the group, so we're taking a comparative approach."
In the coming months, the scientists will sequence genomes from dozens of geranium species as well as some closely related rosids, whose evolutionary rates are normal. They will compare the genes involved in recombination and DNA repair in geraniums with those of their close relatives to identify key differences that may be causing unchecked mutations.
The geranium's rapid evolution leads to another mystery. Like most plants, Geraniaceae have genomes in three separate compartments: nucleus, mitochondrion and chloroplast. The nuclear genome is large and complex, with as many as 30,000 genes and an intricate repeating structure. It does the bulk of the work in the cell. The mitochondrial and chloroplast genomes, on the other hand, are much smaller—on the order of tens of genes—with specialized functions.
Linear maps for the four Geraniaceae plastomes relative to a representative rosid reference genome. Genes were ordered from 1 to 113 in the reference and blocks of genes were colored. The numbers above rearranged blocks of genes display gene order changes for each genome.
It turns out that the genomes do not act independently in plants; they cooperate. In fact, several proteins made by plant cells consist of multiple sub-units, each produced by a different genome within the cell. For an organism to survive, the separate genomes need to develop in tandem, or coevolve. If the gene that produced part A of a protein changed so that it could not bind to part B, the protein could become non-functional and the organism might die. Yet the geranium, with its unique genetic makeup, has thrived for millions of years.
To find out how this coevolution occurs, Jansen teamed with Jeff Palmer at Indiana University and Jeff Mower at the University of Nebraska to sequence all three genomes of dozens of species of geranium. The project is in year two of a five-year study. The researchers are currently gathering sequence data and assembling and analyzing it with the assistance of the Ranger supercomputer at the Texas Advanced Computing Center (TACC). They are hopeful that the results will help explain how multiple genomes coevolve and why geraniums mutate so quickly.
New Tools Bring New Challenges
The technologies that researchers use to sequence and analyze genetic data are only a few years old and the scale of the information involved is massive. Before Jansen and collaborators could start interpreting the genomic data, they needed to determine the most efficient way to gather it.
"We first went through the literature to see what everybody thought we should do and there was absolutely no consensus," Jansen said. "Many of the aspects of the sequencing and analysis hadn't even been compared."
Basic questions needed to be answered: Which sequencing platform works best for this type of problem? Which algorithm is fastest and most accurate for assembling sequences? And how much information is needed to find significant factors in the evolution of the genome?
A recent analysis by Jansen and his colleagues explored these questions and advanced the researchers' quest for the optimal experimental setup. They found that using the Illumina HiSeq 2000 platform (a next-generation sequencer) in tandem with Trinity (a leading assembly tool) gave the most accurate and efficient results. They also determined that roughly 40% of the sequence data was enough to reach a plateau of useful information for assembling a complete transcriptome.
Members of the Jansen Group (from left to right): Mao-Lun Weng (PhD student); Jin Zhang (PhD student); Chris Blazier (PhD student); Tracey Ruhlman (Research Associate); Bob Jansen (Professor).
They established this percentage by taking increasingly large subsets of a huge dataset (about 14 billion sequence reads), from 5% up to 100%, assembling each subset, and using a reference genome to see how many more genes each one recovered and how the coverage of each gene improved.
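The subsampling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's actual pipeline: `count_genes_recovered` is a hypothetical stand-in for "assemble the reads and compare against a reference," the toy reads are simply labeled with the gene they came from, and the 1% tolerance used to call a plateau is an arbitrary choice.

```python
import random

def count_genes_recovered(reads):
    """Hypothetical stand-in for assembly plus comparison to a reference.
    Each toy read is a (gene_name, read_id) pair, and a gene counts as
    recovered once at least one read from it is present."""
    return len({gene for gene, _ in reads})

def gene_counts_by_fraction(reads, fractions, seed=0):
    """Shuffle the reads once, then evaluate growing prefixes of the
    shuffled list so that every larger subset contains the smaller ones."""
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    curve = []
    for fraction in fractions:
        k = max(1, round(len(shuffled) * fraction))
        curve.append((fraction, count_genes_recovered(shuffled[:k])))
    return curve

def plateau_fraction(curve, tolerance=0.01):
    """Return the smallest fraction whose gene count is within
    `tolerance` of the count obtained with the largest subset."""
    full_count = curve[-1][1]
    for fraction, count in curve:
        if count >= (1 - tolerance) * full_count:
            return fraction
    return curve[-1][0]

# Toy data (an assumption): 200 genes with very uneven read depth.
reads = [(f"gene{i}", j) for i in range(200) for j in range(1 + 2000 // (i + 1))]
fractions = [step / 20 for step in range(1, 21)]  # 5%, 10%, ..., 100%

curve = gene_counts_by_fraction(reads, fractions)
print("plateau reached at fraction:", plateau_fraction(curve))
```

With real data, each increment would be a full assembly run, which is why knowing where the curve flattens matters for cost.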
"We had no idea how much data we needed and the more data you have to gather the more expensive it is," Jansen said.
Supercomputers like TACC's Ranger speed up sequence analyses by breaking the process down into small chunks and distributing them to thousands of computer processors working together. In the case of Jansen's project, Ranger also acted as a test-bed for method development, allowing the researchers to compare multiple experimental approaches to find the best one.
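The chunk-and-distribute pattern can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: k-mer counting is just a convenient example task, not something the article attributes to the project, and a local thread pool stands in for the thousands of processors a system like Ranger would provide through its own job scheduler.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def count_kmers(reads, k):
    """Count every length-k substring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def parallel_kmer_counts(reads, k, chunk_size=2, workers=4):
    """Split the reads into chunks, count each chunk on a worker,
    then merge the partial counts into one total."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: count_kmers(chunk, k),
                            chunked(reads, chunk_size))
        for partial in partials:
            total.update(partial)  # Counter.update adds counts together
    return total

toy_reads = ["ACGTAC", "GTACGT", "TTACGA"]  # made-up reads
print(parallel_kmer_counts(toy_reads, k=3).most_common(3))
```

The merged result matches what a single worker would compute on the whole dataset; the gain comes from the chunks being processed independently.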
"For each species that we're looking at, we get all of these DNA or RNA sequences and we have to assemble these short reads into a complete genome, or into complete transcriptomes. This takes lots of memory and space," Jansen said. "The bottom line in our case—we could not do it without TACC."
Identifying Genetic Differences
Beyond the specific evolutionary history of the geranium, the researchers hope their investigation will uncover basic facts about evolution. They speculate that the elevated rates of change in this group might have something to do with genes that are involved in DNA repair and recombination.
"Experimental evidence demonstrates that if you mutate the recombination genes, you can generate instability in the genome," Jansen said. "We're hoping to uncover some evidence that this phenomenon is related to those classes of genes."
Understanding how plant genomes evolve, interact with each other, and coordinate functions may seem obscure, but a general model of the division of labor within plant cells and their shared genomic functions could eventually lead to practical applications.
"We use evolution for lots of purposes agriculturally. We select for certain features in crop plants to have bigger ears of corn or bigger tomatoes," Jansen said. "If you don't understand the genes that are involved in that and how they work, it's hit or miss with regard to whether you're doing the right thing."
Aaron Dubrow, Science and Technology Writer
October 3, 2012
The Texas Advanced Computing Center (TACC) at The University of Texas at Austin is one of the leading centers of computational excellence in the United States. The center's mission is to enable discoveries that advance science and society through the application of advanced computing technologies. To fulfill this mission, TACC identifies, evaluates, deploys, and supports powerful computing, visualization, and storage systems and software. TACC's staff experts help researchers and educators use these technologies effectively, and conduct research and development to make these technologies more powerful, more reliable, and easier to use. TACC staff also help encourage, educate, and train the next generation of researchers, empowering them to make discoveries that change the world.
- Geraniums are one of only two plant groups known to have such highly mutable genomes.
- The geranium's chloroplast genome and the genes within it are highly rearranged in comparison to other plants, and the rates of change for certain gene sequences are highly elevated.
- Several leading geranium scholars are using Ranger to better understand why the geranium has evolved to be so radically different from other plants and what this can tell us about genomic function in general.