Seeing is Believing

Extreme Digital visualization and data analysis resources help researchers derive insights from massive data sets

Longhorn, 256-Node Dell Visualization Cluster.

Isaac Asimov, the American science fiction and popular science writer, famously said, "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny...'"

In a world swimming in information, how does a scientist have such a revelation? How do they find a needle of insight in a growing digital haystack?

Scientific visualization is one important tool scientists use to make discoveries. The process of visualization converts data — from sensors, DNA sequencers, social networks, and massive HPC simulations and models — into images that can be perceived by the eye and explored and interpreted by the human mind.

The figure depicts a cross-section of the artery taken right through the vulnerable plaque with a large lipid core and a thin fibrous cap that is formed near the coronary artery bifurcation region.

This aspect of discovery has always been valuable, but as our ability to perform high-resolution 3D scans of the body or to map the universe improves, turning that data into useful information is increasingly critical.

In November 2008, the National Science Foundation (NSF) requested proposals for "TeraGrid Phase III: eXtreme Digital Resources for Science and Engineering (XD)." The resulting grants funded the first of a new class of visualization systems: two state-of-the-art computing systems at the Texas Advanced Computing Center (TACC) and the National Institute for Computational Sciences (NICS) that together increased the visualization capability of the open science community by a factor of 25.

The NSF solicitation was motivated by an awareness that scientific instruments were producing copious amounts of data that could not be analyzed or visualized by any previous system. 

"We were seeing science at a completely different scale," said Kelly Gaither, principal investigator for the XD Vis award and Director of Visualization at TACC. "These systems address the data deluge that we saw coming down the pipe as a result of the bigger HPC systems."

TACC's Longhorn was deployed in January 2010 and has been supporting visualization, data analysis, and general computing for a year and a half. A Dell cluster with both NVIDIA GPUs and Intel quad-core CPUs on each node, Longhorn provides unprecedented capabilities, foremost among them the ability to remotely visualize massive data sets in real time.

This means a research group in Topeka, Kansas, can compute and visualize their data set on the Longhorn system in Austin, Texas, from the quiet of their offices. The researchers can move, spin, zoom, and, in some cases, animate the subject with the touch of a button.

Gaither thinks this new capability — a hands-on approach to virtual experiments — improves scientists' relationship to their data and has the potential to transform research.

"Oftentimes, researchers don't know what they're looking for. They use visualization to do debugging or to do exploratory analysis of their simulation data. In those cases, visualization is really the only way to see," Gaither said.  "It's generally recognized in the vis community that interactivity is a crucial component of being able to do that analysis."

Longhorn is billed as the "largest hardware-accelerated interactive visualization cluster in the world" and has supported these real-time interactions for users as remote as Saudi Arabia. Longhorn can also handle extremely large data sets, including 8K-cubed visualizations designed to study the instabilities in a burning helium flame.

Nautilus

Nautilus, an SGI Altix UV system, is likewise a remote visualization system configured with both CPU and GPU computing nodes, but with a significantly different architecture. The system contains a single, very large shared memory space that all 1024 CPU processors can directly access. This large shared memory space is useful for data analysis, a growing area of research in many domains. The system also contains 8 GPUs for hardware-accelerated graphics.

"Graph and societal network analysis. Correlation and document clustering. There are all sorts of analyses that are not amenable to a cluster type of architecture," explained Sean Ahern, principal investigator on Nautilus and visualization task leader at Oak Ridge National Laboratory (ORNL). "So we said, the only other thing we can deploy that meets all of these needs is an SMP [symmetric multiprocessing]."

The SMP architecture allows many processors to access a single memory system, multiplying the amount of memory available for on-processor computation. Nautilus is configured with four terabytes of global shared memory, which allows much larger and faster data analysis than previous systems.
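
To make the shared-memory idea concrete, here is a minimal Python sketch. It is not code from Nautilus or from any project described in this article; the names `partial_sum` and `shared_memory_sum` and the sample data are hypothetical. Several workers read slices of one in-memory buffer through a `memoryview`, so no worker needs its own copy of the data, which is the property an SMP system provides at the scale of terabytes. Python threads stand in for processors here and, under the standard interpreter, will not run the sums in parallel, so the sketch illustrates the addressing model rather than the speedup.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def partial_sum(view, start, stop):
    """Sum one slice of the shared buffer.

    Slicing a memoryview creates a new view onto the same memory,
    not a copy of the underlying data.
    """
    return sum(view[start:stop])


def shared_memory_sum(data, workers=4):
    """Sum `data` by letting every worker read the same buffer directly."""
    view = memoryview(data)  # one buffer, visible to all workers
    n = len(view)
    # Ceiling division, with a floor of 1 so an empty buffer is handled.
    chunk = max(1, (n + workers - 1) // workers)
    bounds = [(i, min(i + chunk, n)) for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: partial_sum(view, b[0], b[1]), bounds)
        return sum(parts)


# Hypothetical data set: 100,000 double-precision values held once in memory.
values = array("d", range(100_000))
total = shared_memory_sum(values)
print(total)
```

On a cluster, by contrast, each node would first have to receive its own portion of the data over the network before it could start work, which is why analyses that touch the whole data set at once map poorly onto that design.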

"We've been able to accelerate the science that researchers are already doing, taking it from weeks to hours," Ahern said. "And we have other projects where the size of the memory means researchers can pull in entire data sets where they were never able to do so before."

Rather than serving purely as visualization systems, the kind that dominated in the past, these machines were built to be multipurpose, allowing interactive and batch visualization, GPGPU (general-purpose GPU-based) computing, traditional HPC computing, and new kinds of data analysis.

This composite nature allows the systems to provide improved visualization resources for the academic community while staying fully utilized, which maximizes the return on the public investment.

Like all resources in the XSEDE infrastructure, Longhorn and Nautilus run 24 hours a day, 7 days a week, 365 days a year, with full staff support. The resources are available to U.S. researchers through an XSEDE allocation from the NSF.

Over the course of the last year and a half, 1560 scientists have used Longhorn and Nautilus, applying their unique speed and capabilities to wide-ranging science problems, while also exploring what role GPU processing can play in science generally.

The results emerging from the systems are encouraging.

For example, notable successes on Longhorn include a collaboration with the National Archives and Records Administration to develop a new visualization framework for digital archivists; visualizations of the Gulf oil spill that helped the National Oceanic and Atmospheric Administration (NOAA) and the Coast Guard locate and contain oil slicks; record-setting molecular dynamics simulations of surfactants, which are used in detergents, manufacturing, and nanotechnology; and visualizations of the earthquake in Japan.

"With our analysis code, I get as much as 16,000x speedup on Longhorn, which has given much insight into the physics of the protein-water interface, and allows us to understand at a more fundamental level how nature designs proteins to catalyze reactions under non-extreme conditions," said David LeBard, a postdoctoral fellow in the Institute for Computational Molecular Science at Temple University and self-proclaimed "Longhorn zealot."

Simulations by LeBard and his collaborator Dmitry V. Matyushov appeared in the Journal of Physical Chemistry B and were featured on the cover of Physical Chemistry Chemical Physics in December 2010.

Investigation of Flux Rope Formation via Flow Turbulence. From top to bottom: line integral convolution (LIC) of magnetic field, ion velocity, and electron velocity, all colored by their out-of-plane component. Animations show the relationship of vortices in ion and electron flow as magnetic flux ropes form.

Nautilus has seen similar successes. Researchers on the system have performed unprecedented species modeling in the Great Smoky Mountains National Park, a biodiversity hot spot; gained new insights into the role turbulence plays in fusion; and explored how human society has evolved over the last half-century using historical sources covering the planet over this period.

"Nautilus has been a critical enabling resource for the GlobalNet project in several ways," said Kalev Leetaru, Senior Research Scientist for Content Analysis at the National Center for Supercomputing Applications (NCSA). "Most visibly, the ability to instantly leverage terabytes of memory in a single system image has allowed the project for the first time to move beyond small 1 to 5 percent samples to explore the dataset as a whole, leading to numerous fundamental new discoveries simply not possible without the ability to analyze the entire dataset at once."

Together, the two systems have supported 759 projects, totaling 11.4 million computing hours (the equivalent of 1250 years on a single desktop system) in the last year and a half.

Visualization and data analysis are clearly moving into the mainstream, and with the Extreme Digital visualization grants, the NSF has given a big boost to the national science community. Gaither and Ahern see this as the beginning of a new paradigm.

"Seeing the visualization and interacting with the data is probably one of the great enablers that will propel science for the next generation and beyond," Gaither said. "I think in some respects, you won't even see this intermediate thing called a data set. You will interact with the simulations itself, or, if you'd prefer, with the science."

Ahern went further.

"Data without analysis is nothing," Ahern said. "If you've run a giant simulation, you've only done half the work. The real science comes from processing that data into something that people can understand. The job of science is done in the phase of analysis, and that's purely where we live."

August 15, 2011


The Texas Advanced Computing Center (TACC) at The University of Texas at Austin is one of the leading centers of computational excellence in the United States. The center's mission is to enable discoveries that advance science and society through the application of advanced computing technologies. To fulfill this mission, TACC identifies, evaluates, deploys, and supports powerful computing, visualization, and storage systems and software. TACC's staff experts help researchers and educators use these technologies effectively, and conduct research and development to make these technologies more powerful, more reliable, and easier to use. TACC staff also help encourage, educate, and train the next generation of researchers, empowering them to make discoveries that change the world.

Aaron Dubrow
Science and Technology Writer
aarondubrow@tacc.utexas.edu