Every scientific discipline today benefits from rapid data acquisition systems, but the resulting datasets are so large (often measured in terabytes) that they are expensive to store and manipulate. How can we get from data to knowledge? The Data Information Systems (DIS) group develops solutions for storing, managing, and extracting information from these increasingly large datasets. We identify and extend emerging data and information technologies that can give computational scientists fast, reliable access to large amounts of data through simple, standardized operations. We also manage database applications for TACC's IT infrastructure. Many new methods exist for cataloguing datasets, sharing and querying datasets and catalogues, and managing datasets across heterogeneous storage systems that may be in different physical locations. We aim to optimize the usage and throughput of the underlying storage infrastructure. Evolving data management issues include restructuring datasets through relational database technologies and building data grid infrastructures. The goal is to make it easy to understand what data exists and where it resides, to search the data for answers to specific questions, and to discover interesting new data elements and patterns.