Data management includes a variety of tasks, such as data transfer, data integrity checks, metadata extraction, and data preservation. At many organizations, despite the rapid growth in the size and complexity of datasets, these tasks are still conducted on desktop computers and single-node servers. The hardware and software limitations of these resources make it difficult to carry out routine data management activities efficiently on large datasets. It is therefore imperative to leverage High Performance Computing (HPC) or High Throughput Computing (HTC) resources, along with massive storage resources, for timely processing and management of large datasets. Such resources are available to data curators and data managers at no direct cost through a National CyberInfrastructure (NCI) such as the Extreme Science and Engineering Discovery Environment (XSEDE), but the learning curve associated with using remote supercomputing resources poses a significant barrier to adoption. This learning curve, and the other barriers that data curators and data managers face in using the NCI for Big Data management activities, motivated this hands-on workshop.

This workshop was previously offered at the 2014 IEEE International Conference on Big Data, where it was well received. Further details on last year's workshop are available at the following link:

Target Audience: This workshop will be relevant to data curators, data managers, and archivists from domains such as archaeology, microbiology, earth sciences, space research, humanities, and next-generation sequencing. It will also be relevant to librarians who are in charge of long-term preservation of and access to data. Attendees are not required to have prior knowledge of HPC/HTC and will be provided with accounts for accessing Texas Advanced Computing Center (TACC) resources during the workshop.

