Workshop on Cyberinfrastructure and Machine Learning for Digital Libraries and Archives

In conjunction with Joint Conference on Digital Libraries 2018

June 3-6, 2018
Fort Worth, Texas

Workshop Date: June 3, 2018

Latest Update

Abstract submission has been extended to May 7.
Submission link:

If you are interested in presenting at the workshop, please consider submitting an abstract at your earliest convenience. This will give the organizing committee time to provide feedback to authors before the final version is due on May 31. If you have any questions or concerns, please send your inquiry to Maria Esteva and/or Weijia Xu.

About The Workshop

Academic libraries and archives have made significant progress in accommodating data in their operations by implementing data management consulting services and repositories for final and relatively small datasets. However, providing scalable data management support and services remains challenging, especially for large volumes of data or at large research institutions. There is an urgent need for further research and implementation of automated methods to describe, represent, preserve, and facilitate prompt and efficient access to and reuse of large-scale scholarly data. This workshop introduces a triptych model to address these challenges.

The model above illustrates the interactions between digital libraries and archives, cyberinfrastructure, and machine learning methods and tools.

As collecting institutions aggregate more and larger digital content, the processes of curating, preserving, and making that content accessible require automation and scalability. Cyberinfrastructure refers to shared online research environments, backed by advanced computing resources, hosted in data centers, and supported by experts. Coupled with cyberinfrastructure, machine learning methods and tools can provide digital libraries and archives with powerful resources to enhance their ability to represent, keep, and provide persistent access to collections, thus facilitating reuse. In turn, cyberinfrastructure and the projects that make use of it benefit from adopters within libraries and archives, who provide grounding in best practices and standards for data curation, discoverability, and integrity. To explore these topics, we invite researchers and practitioners from the cyberinfrastructure, digital libraries, archives, and machine learning fields, as well as domain experts, to share ideas, introduce theory and research methods, and present examples of best practices. The workshop will include keynote speakers, peer-reviewed papers, and a panel discussion.

Call for Participation

We are soliciting presentations in the following research areas:

  • Best practices for using open science cyberinfrastructure for digital libraries and archives.
  • Models and methods to improve large-scale data access and reuse, including issues of data understandability and representation of complex datasets (e.g., those derived from large-scale simulations, experiments, and observational research projects).
  • Machine learning methods using linked data models and ontology applications for digital libraries and archives.
  • Challenges and solutions in curating datasets beyond textual content (e.g., video, volumetric images, genomics/bio data, architectural drawings, point clouds, GIS, satellite imagery, etc.).
  • Automated methods for managing and preserving scientific data collections of diverse formats.
  • Machine learning methods for improving collections accessibility and reuse.
  • Large-scale metadata generation and management for integration and interoperability of scholarly data.
  • Systems design and implementation, including data analysis/ for digital collections services in cloud computing environments.
  • Theory and models for integration of analysis and curation tasks for evolving scientific data collections using cyberinfrastructure.

Important Dates

Abstract Submission Deadline: May 7, 2018
Final Version (for presentation): May 31, 2018

Workshop Registration

Workshop attendees should register for the workshop through the JCDL'18 conference registration system:

ACM/IEEE Members: $105 (by May 3) / $155
Non-members: $155 (by May 3) / $175
Students: $25 (by May 3) / $75

Submission Instructions

We accept extended abstracts of a minimum of 2 pages. Abstracts should be submitted as PDFs in the standard ACM conference format available here:

After the workshop, presenters will have the opportunity to revise their submissions based on the feedback they receive and to publish them in one of the following:

As a reference, the latest version of the bulletin can be accessed here:

Submission link:

Workshop Program Committee

Maria Esteva, Texas Advanced Computing Center

Weijia Xu, Texas Advanced Computing Center

Jessica Trelogan, University of Texas Libraries

Ashley Adair, University of Texas Libraries

Richard Marciano, University of Maryland

Mark Hedges, King's College London

Dan Wu, Wuhan University


Workshop Objectives
  • Bring together data curators, librarians and archivists, researchers, and computational scientists to share practical experiences at the intersection of digital libraries, machine learning, and cyberinfrastructure.
  • Promote the usage of cyberinfrastructure within the digital library and archives community.
  • Advocate adoption of best data curation practices within the computational research and cyberinfrastructure communities.
  • Discuss future opportunities and forge a dedicated community.