
Data Management & Collections Group (DMC)

Overview

Established in 2008, the DMC group provides data collection services to faculty and researchers and works to advance discovery through data-driven research. The group builds and maintains large data-management and storage resources and consults with collection creators on all aspects of the data lifecycle, from creation to long-term preservation and access. The DMC group actively seeks research and grant-proposal collaborations with researchers and institutions that hold collections of interest.

DMC group consulting services and up to 5 TB of TACC storage are available free of charge to researchers at The University of Texas at Austin. Researchers with collections larger than 5 TB, and those at other institutions, are encouraged to contact us to discuss their needs.

Team Members and Interests

  • Chris Jordan - Large-scale data architecture, preservation and access policies
  • David Walling - Software development, data management and analysis systems
  • Tomislav Urban - Database and GIS applications development
  • Maria Esteva - Life-cycle digital collections management and preservation, metadata development

Services

As part of our mission, the DMC group collaborates with researchers in the following areas:

  • Data backup
  • Developing retention policies
  • Consulting on rights, licensing, and privacy/confidentiality issues related to data access
  • Record-keeping consulting
  • Metadata consulting
  • Geographic Information System (GIS) development
  • Database development
  • Long-term preservation planning
  • Design of data pipelines
  • Provenance documentation
  • Code and software preservation if needed for a particular data set
  • Training in data management and preservation

Resources

TACC maintains and makes available to researchers the following data resources:

  • Corral
  • Ranch
  • iRODS Data Management and Replication
    • The iRODS data management and replication service lets researchers store, retrieve, and manage data across multiple resources, including Corral and Ranch. Geographically remote archive resources are also available to collections with high data-reliability needs. In all, up to 12 petabytes of storage can be accessed through the iRODS system at TACC. The iRODS software has extensive metadata creation and search features and can be accessed with dedicated client tools or over the WebDAV protocol. Web access from any browser can also be enabled for data collections that require public access.
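Because the service can be reached over WebDAV, a collection can in principle be listed with a standard PROPFIND request. The following is a minimal Python sketch of that idea, not a description of TACC's actual endpoint: the URL, the sample response, and the function names are all hypothetical placeholders, and authentication is left out entirely. It only builds the request object and parses a WebDAV multistatus reply of the kind defined by the WebDAV standard.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # XML namespace used by WebDAV

# Ask the server for the display name and size of each resource.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:prop>'
    "<D:displayname/><D:getcontentlength/>"
    "</D:prop></D:propfind>"
)


def build_propfind_request(url):
    """Build (but do not send) a PROPFIND request for a collection URL."""
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY.encode("utf-8"),
        headers={"Depth": "1", "Content-Type": "application/xml"},
        method="PROPFIND",
    )


def parse_multistatus(xml_text):
    """Return a list of (href, size_in_bytes_or_None) from a multistatus reply."""
    root = ET.fromstring(xml_text)
    entries = []
    for response in root.findall(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        size = response.findtext(f".//{{{DAV_NS}}}getcontentlength")
        entries.append((href, int(size) if size else None))
    return entries


# Hypothetical reply for an invented collection, for illustration only.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/collections/demo/</D:href>
    <D:propstat>
      <D:prop><D:displayname>demo</D:displayname></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/collections/demo/scan-001.tif</D:href>
    <D:propstat>
      <D:prop>
        <D:displayname>scan-001.tif</D:displayname>
        <D:getcontentlength>2048</D:getcontentlength>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

# Placeholder URL; a real collection address would come from the service operators.
request = build_propfind_request("https://webdav.example.org/collections/demo/")
listing = parse_multistatus(SAMPLE_RESPONSE)
```

In practice the request would be sent with `urllib.request.urlopen` (plus whatever credentials the service requires) and the body of the reply passed to `parse_multistatus`. Directories typically report no content length, which is why the size may be `None`.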

Partners

The following list highlights several of the DMC group's research partners and illustrates the types of services we provide:

Institute of Classical Archaeology (ICA)
Assessed the collection. Documented the site-to-archive data workflow and developed a record-keeping structure that classifies data objects as they are generated and captures their relationships to other data objects created throughout the research process. Built a metadata mapping between the record-keeping structure and standard metadata schemas. Developed extensions to iRODS that automatically extract metadata from the record-keeping system. The metadata is preserved as an XML document alongside the data objects when they are ingested into the long-term archive on Corral. This makes it possible to search for individual data objects and their relationships.
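The mapping step described above can be illustrated with a short Python sketch. It is not the ICA implementation: the record fields, the sample values, and the choice of Dublin Core as the "standard metadata schema" are all assumptions made for the example. It shows one way a record-keeping entry, including its relationships to other objects, could be turned into an XML document suitable for storing next to the data object.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (assumed target schema for this sketch).
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from record-keeping fields to Dublin Core elements.
FIELD_TO_DC = {
    "object_id": "identifier",
    "object_type": "type",
    "created_on": "date",
    "creator": "creator",
    "derived_from": "relation",  # links to related data objects
}


def record_to_dc_xml(record):
    """Return a Dublin Core XML string for one record-keeping entry."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for field, dc_name in FIELD_TO_DC.items():
        value = record.get(field)
        if value is None:
            continue  # skip fields the record does not carry
        # A field may hold one value or a list of values.
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = str(item)
    return ET.tostring(root, encoding="unicode")


# Invented example record; identifiers and values are placeholders.
example_record = {
    "object_id": "photo-0001",
    "object_type": "photograph",
    "created_on": "2009-06-15",
    "creator": "field team",
    "derived_from": ["context-12", "drawing-0040"],
}

example_xml = record_to_dc_xml(example_record)
```

The resulting string could be written to a file beside the data object before ingestion, so that each object and its `relation` links remain searchable in the archive.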

Plant Resources Center
The Plant Resources Center (PRC) of The University of Texas at Austin comprises The University of Texas (TEX) and Lundell (LL) herbaria. On-line access to the ca. 1,000,000 specimens in the PRC is a critical component of its scientific mission and utility. At present, data for over 400,000 specimens (Texas, Mexico, and type) are on-line, as well as camera-generated digital images of ca. 6500 types. However, the data are divided among four separate databases with different structures and separate on-line interfaces. TACC is integrating all specimen data using a single platform and software, which in turn will allow the development of more powerful on-line user interfaces, more efficient database workflow, and improved database structure for collaborative projects with other herbaria. High-quality scans of all 7500 types, resulting from a Latin American Plants Initiative (LAPI) project supported by the Mellon Foundation, will be integrated. TACC will provide permanent archiving of all data and image files.

Texas Natural Sciences Center
The DMC group maintains multiple MySQL databases for use with the Specify collections-management software, and efforts are in progress to provi…

The following websites are all hosted on the Corral data applications facility:
