
TACC to Deploy 20 Petabyte Global File System to Support Data Driven Science

AUSTIN, Texas — The Texas Advanced Computing Center (TACC) at The University of Texas at Austin today announced it is expanding its ecosystem of hardware resources to further support data driven science. In September, the center will deploy a DataDirect Networks (DDN) high-performance, scalable global file system (GFS) that will be accessible to all of TACC's computing and visualization systems and easily expandable in the coming years.

Data driven science is emerging alongside modeling and simulation as another important computational methodology that uses high-end computing and storage systems. In this mode, vast amounts of digital data, collected by digital instruments such as gene sequencers, electron microscopes, satellite-based imagers and distributed sensor networks, can be mined for scientific insights.

"The rate of data growth during the last decade has been exponential," said Tommy Minyard, TACC's director of Advanced Computing Systems. "This new global file system will offer a very large storage pool for thousands of researchers to store persistent data and make it easier for them to perform their scientific research. They will be able to access data from any production system at TACC and use it for later analysis, further processing, or to continue to run their applications."

"For nearly 15 years, DDN has been delivering leading-edge storage solutions for compute-intensive environments that have helped universities around the world redefine the boundaries of science and research," said Jean-Luc Chatelain, chief technology officer at DDN. "Today, we are proud to build on our long-standing partnership with TACC, a leader in academic research. With DDN scale-out file system technologies, TACC will be able to achieve even higher levels of performance and scale in one of the first petascale systems, helping thousands of researchers realize unparalleled advances in support of data driven science."

The new GFS will provide more than 20 petabytes of storage capacity for scientific data, and will allow scientists to access their data rapidly through its aggregate bandwidth of greater than 100 gigabytes per second. By providing a massive storage capability with high-performance access from all of TACC's production systems, the system will further diversify the resources available to researchers and eliminate their need to manage data across multiple systems. This will balance the storage available to all TACC systems and make migration between systems and upgrades to new ones more transparent.

The GFS is made possible by private funding from the O'Donnell Foundation. In 2012, the Dallas-based foundation committed $10 million to TACC to advance data driven science. This system and other data infrastructure to be announced in the coming months will sustain and broaden the university's leadership in advanced computing and computational science.

"This storage system will be the foundation of our infrastructure, allowing researchers using all TACC systems to store and access massive amounts of data effectively," said Jay Boisseau, director of TACC. "It is the first of multiple new technologies we're deploying to enable world-class, data driven science, with more to come later this year."

The GFS will be an ideal complement to the existing suite of TACC resources for data intensive computing, including the Corral repository for data collections, the Ranch tape archive library, and Stampede, the newest petascale supercomputer to join the national open science community. These systems will provide for computational needs such as data analysis, shared data management, and long-term archiving of big data. The GFS will also provide for online storage of large datasets used on Stampede, which could later be migrated to Corral or Ranch for further data management or archiving.

Data driven methods are changing nearly every field of science, from biology to materials design to astronomy, allowing researchers to ask different questions and to use existing data in new ways. When completed, these new systems will benefit research conducted in Texas and nationwide.

"Managing big data is definitely a challenging problem and requires different hardware to meet the varying science demands," Minyard said. "At TACC, we provide an ecosystem of hardware resources and services so that we can satisfy the wide range of research needs."

###

Date Posted: 2013-06-14       Faith Singer-Villalobos
