TACC, UT Austin, University of Hawaii start $3.9 million NSF awards for Tapis platform development
Scientists looking to reduce the complexity of their research and add a new computational tool to their tool belt can explore the Tapis Project. The Tapis software platform aims to help researchers more easily leverage powerful supercomputers and integrate and manage data from different and distant sources.
The National Science Foundation (NSF) awarded a $2.9 million grant to the Texas Advanced Computing Center (TACC) and The University of Texas at Austin (UT Austin), in addition to a $1 million award to the University of Hawaii (UH). The NSF awards started in September 2019 and support continued development of Tapis. The name is short for TACC-APIs and plays off the word tapestry, suggesting the weaving together of services and capabilities. An application programming interface (API) is an interface to a software system that has been built or engineered for another program to use.
"The easiest way to describe Tapis is that it's a web-based application that provides all the tools a modern scientist needs to do data-intensive, computationally-intensive research," said Co-PI Gwen A. Jacobs, Director of Cyberinfrastructure, University of Hawai'i System. "One of the things that's different about Tapis is that it weaves together all the important tools that the researcher needs. That's the real power of Tapis."
Tapis will serve a diverse group of users with varying expertise in using computational tools for their research. On one end of the spectrum will be 'power users' with extensive experience in advanced computing resources and programming. Tapis will help them automate and streamline their large workflows or pipelines of software applications.
On the opposite end of the spectrum are scientists just beginning to tap into the possibilities of applying advanced computing to their research. "What we're trying to do for them with Tapis," said Stubbs, "is have the easiest road to entry on running computational programs on the supercomputers."
And then there's the group in the middle, typically large software development projects focused on specific research domains, such as immunology, astronomy, or bioinformatics.
"The goal with Tapis is to enable researchers to access these computational resources in a more user-friendly way," said Stubbs.
The NSF-funded computational resources are broadly described as cyberinfrastructure, the online ecosystem shared by researchers, backed up by advanced computing resources, hosted in data centers, and supported by experts. "Web developer teams and other developers on those cyberinfrastructure projects can leverage Tapis to build their cyberinfrastructure project more quickly."
"Event-driven computing," explained Jacobs, "means that the workflow isn't running all the time. That's a great feature for scientists who have to acquire their data sporadically, where they're getting data from sources such as sensors and data uploads. This means that they don't have to run all the code manually. Once the workflow is set up, it can be hands-free computing, in a way, hands-free analysis."
Tapis will integrate the Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS) project, part of the NSF-funded EarthCube, to achieve event-driven computing.
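The event-driven pattern Jacobs describes can be illustrated with a minimal sketch. It is not the Tapis or CHORDS interface: the `EventBus` class, the `sensor_reading` event name, and the temperature conversion are all hypothetical stand-ins, chosen only to show analysis code that stays idle until data arrives.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory event bus (not the Tapis or CHORDS API)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Register a function to call whenever this event type is published.
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Run every handler registered for the event; nothing runs otherwise.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


results: List[float] = []


def analyze_reading(payload: dict) -> None:
    """Illustrative analysis step: convert a Celsius reading to Fahrenheit."""
    results.append(payload["temp_c"] * 9 / 5 + 32)


bus = EventBus()
bus.subscribe("sensor_reading", analyze_reading)

# Data arrives sporadically; each arrival triggers the analysis automatically.
bus.publish("sensor_reading", {"sensor": "buoy-1", "temp_c": 25.0})
bus.publish("sensor_reading", {"sensor": "buoy-2", "temp_c": 30.0})
print(results)
```

Once the handler is subscribed, no one has to start the analysis by hand, which is the "hands-free" quality described in the quote.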
Applied to science, APIs allow different systems to talk to each other, in a sense. "The idea with Tapis," said Stubbs, "is to have a machine-readable and consumable interface to computational resources, like supercomputers, but also high performance storage systems, like our Corral storage system, or our global file system, Stockyard, and other filesystems across the country. We want to have an interface that is easily accessed and manipulated in other programs."
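What "machine-readable and consumable" means in practice can be sketched as follows. The JSON document, its field names, and the system names are assumptions made up for illustration and do not reflect the actual Tapis response format; the point is only that a structured response can be filtered by another program without a person reading it.

```python
import json
from typing import List

# Hypothetical JSON response listing resources; the schema is an assumption.
RESPONSE = """
{"systems": [
  {"name": "example-compute", "type": "compute"},
  {"name": "example-storage", "type": "storage"}
]}
"""


def storage_system_names(raw: str) -> List[str]:
    """Parse the response and return the names of storage systems only."""
    data = json.loads(raw)
    return [s["name"] for s in data["systems"] if s["type"] == "storage"]


print(storage_system_names(RESPONSE))
```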
Another feature Tapis will offer is a new security kernel, which acts like a gate that controls access to system resources. The Tapis security kernel will be decentralized, allowing scientists to more easily stand up their own applications and retain local control over confidential data.
"The new security kernel allows us to offer all the managed security, authentication, and authorizations that have been done in the past," said Co-PI Sean Cleveland, a cyberinfrastructure research scientist at the University of Hawaii. "But it will also allow data centers and institutions to deploy their own security kernel, so they can use their own user credentials and manage their own security in their own way, as well as deploy individual components of the framework at their institution, and be able to leverage some of the centralized work. It's a new, hybrid system of using the science-as-a-service, platform-as-a-service, but if you want more control and customization, you can deploy smaller pieces on site and still be able to leverage some of the larger, managed components for different needs."
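The idea of a decentralized gate can be pictured with a toy model. This is a sketch under stated assumptions, not the design of the Tapis security kernel: the `SiteSecurityKernel` class, the `check_access` function, and the site, user, and resource names are all hypothetical. Each institution keeps its own record of who may use what, and an access check is answered by that institution's own kernel.

```python
from typing import Dict, Set, Tuple


class SiteSecurityKernel:
    """Hypothetical per-institution gate that holds only that site's grants."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._grants: Set[Tuple[str, str]] = set()

    def grant(self, user: str, resource: str) -> None:
        # The site records locally that this user may use this resource.
        self._grants.add((user, resource))

    def is_allowed(self, user: str, resource: str) -> bool:
        return (user, resource) in self._grants


# Each site deploys and manages its own kernel (names are illustrative).
kernels: Dict[str, SiteSecurityKernel] = {
    "site-a": SiteSecurityKernel("site-a"),
    "site-b": SiteSecurityKernel("site-b"),
}
kernels["site-a"].grant("alice", "reef-dataset")


def check_access(site: str, user: str, resource: str) -> bool:
    """Route the check to the kernel of the site that owns the resource."""
    kernel = kernels.get(site)
    return kernel is not None and kernel.is_allowed(user, resource)


print(check_access("site-a", "alice", "reef-dataset"))
print(check_access("site-b", "alice", "reef-dataset"))
```

In this toy model a grant made at one site carries no weight at another, which mirrors the local control over credentials described in the quote.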
Tapis will give users the ability to simplify the process of creating applications, a powerful tool for scientists. "If you can program a workflow and have that workflow run in a platform like Tapis, that makes the process easier because all of the components can talk to each other more easily," said Jacobs. "That means that the investigator has to construct that workflow once. Then they save that workflow as an application within the Tapis infrastructure and reuse it."
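The construct-once, reuse-many-times idea Jacobs describes can be sketched in a few lines. The registry, the function names, and the two data-cleaning steps below are hypothetical and are not how Tapis stores applications; they only illustrate saving a workflow under a name and rerunning it on new data.

```python
from typing import Callable, Dict, List

Step = Callable[[List[float]], List[float]]

# Hypothetical registry of saved workflows, keyed by name.
_saved: Dict[str, List[Step]] = {}


def save_workflow(name: str, steps: List[Step]) -> None:
    """Store an ordered list of steps under a reusable name."""
    _saved[name] = list(steps)


def run_workflow(name: str, data: List[float]) -> List[float]:
    """Feed the data through each saved step in order."""
    for step in _saved[name]:
        data = step(data)
    return data


def drop_negative(values: List[float]) -> List[float]:
    return [v for v in values if v >= 0]


def normalize(values: List[float]) -> List[float]:
    # Scale so the largest value becomes 1.0; leave empty or all-zero data as is.
    if not values or max(values) == 0:
        return values
    peak = max(values)
    return [v / peak for v in values]


# The workflow is constructed once...
save_workflow("clean-and-scale", [drop_negative, normalize])

# ...and then reused on different data sets.
first = run_workflow("clean-and-scale", [2.0, -1.0, 4.0])
second = run_workflow("clean-and-scale", [10.0, 5.0, -3.0])
print(first, second)
```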
"This really is a complete collaboration between TACC and the University of Hawaii," explained Stubbs.
TACC brings extensive expertise in high performance computing and in building distributed software systems. The components of Tapis themselves can run on commodity, or off-the-shelf, servers, although some components at TACC will run on the NSF-funded Jetstream cloud.
Team members at UH are contributing to the development, design, and architecture of the Tapis system. What's more, they bring access to an abundance of important domain research unique to Hawaii in areas such as climate, ocean, coral reefs, human microbiome, and population studies around health disparities.
"Having the Tapis project for us here in Hawaii is a huge awareness boost for applying advanced cyberinfrastructure to data intensive science," said Jacobs. "Without a project like this, many of our investigators might not be aware of these resources."
One of the major milestones the investigators are working toward is an end-of-year workshop for early adopters in the summer of 2020. "The idea is to have the workshop where we invite the researchers to come, bring their data sets, to give presentations on their science and use case, but also for the Tapis team to present on the capabilities of the system by the end of year one," said Stubbs.
The Tapis project is funded as part of Cyberinfrastructure for Sustained Scientific Innovation (CSSI), a crosscutting NSF program led by the Office of Advanced Cyberinfrastructure (OAC). "CSSI supports the development of innovative cyberinfrastructure that enables communities of researchers to continue and accelerate advances in all fundamental science and engineering domains supported by NSF," said Dr. Stefan Robila, the Program Director in OAC who manages the award. "By building on prior work and leveraging existing leadership computational resources such as those available at TACC, Tapis contributes to continuous strengthening of the national cyberinfrastructure, while at the same time lowering the barriers in accessing it."