XALT: Tracking Job-level Activity on Supercomputers

Purpose

XALT is a tool that allows supercomputer support staff to collect and understand job-level information about the libraries and executables that end-users access during their jobs. The tool can also work with a system's module software to provide additional information about module usage. XALT is a collaboration between PI Mark Fahey (University of Chicago, formerly National Institute for Computational Sciences) and co-PI Robert McLay (TACC).

Overview

XALT collects data by intercepting the linker (ld) and the parallel job launcher (aprun, ibrun, mpirun, etc.). At link time, XALT learns which static and dynamic libraries an executable needs. At run time, XALT determines the details of each parallel job: the name of the executable along with its dependencies, the nature of the computation (e.g., total nodes, MPI tasks, and duration), and the environment in which the job ran. XALT writes a record of each link-time and run-time event to a database.
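To make the run-time side concrete, here is a minimal sketch of the general wrapper idea: run the real command, time it, and write one record describing the job. This is not XALT's actual implementation. The function names, the record fields, and the JSON-lines file standing in for the database are all assumptions made for illustration; SLURM_NNODES and SLURM_NTASKS are the variables Slurm sets inside a job, and other schedulers expose this information differently.

```python
import json
import os
import subprocess
import sys
import time


def build_run_record(command, env, start_time, end_time, exit_code):
    """Build one run-time record. The field names are hypothetical."""
    return {
        "executable": os.path.basename(command[0]),
        "command": list(command),
        # Slurm sets these inside a job; fall back to 1 when they are absent.
        "num_nodes": int(env.get("SLURM_NNODES", 1)),
        "num_tasks": int(env.get("SLURM_NTASKS", 1)),
        "start_time": start_time,
        "duration_s": round(end_time - start_time, 3),
        "exit_code": exit_code,
        "user": env.get("USER", "unknown"),
    }


def run_and_record(command, log_path, env=None):
    """Run the wrapped command, then append one JSON record to log_path.

    The JSON-lines file is a stand-in for a real database.
    """
    env = dict(os.environ if env is None else env)
    start = time.time()
    exit_code = subprocess.call(command, env=env)
    end = time.time()
    record = build_run_record(command, env, start, end, exit_code)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

A wrapper script installed ahead of the real launcher on the PATH could call run_and_record with the real launcher's full path and the user's arguments, then exit with the recorded exit code so the user sees no change in behavior.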

Data

The dataset produced by XALT contains information on the number of nodes, libraries, field of science, and executables used by each user running a given computational job on Stampede. Personally identifying information is sanitized before publication, but all jobs can still be related to a particular user through an anonymous user ID. Current proposed uses of the dataset include tracking library usage over time and how executables ... This dataset will continue to grow past the initial date of publication. TACC started releasing data in September of 2015.
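As an illustration of tracking library usage over time, the sketch below counts how many jobs used each library in each month. The record layout, the field names, and the sample rows are invented for this example and do not reflect the published dataset's actual schema or contents.

```python
from collections import Counter, defaultdict

# Invented sample rows; the real dataset's fields and values will differ.
SAMPLE_JOBS = [
    {"user_id": "anon_001", "date": "2015-09-14", "libraries": ["libfftw3", "libhdf5"]},
    {"user_id": "anon_002", "date": "2015-09-20", "libraries": ["libhdf5"]},
    {"user_id": "anon_001", "date": "2015-10-02", "libraries": ["libfftw3", "libfftw3"]},
    {"user_id": "anon_003", "date": "2015-10-11", "libraries": ["libnetcdf", "libhdf5"]},
]


def library_usage_by_month(jobs):
    """Return {month: {library: number of jobs that used it}}."""
    usage = defaultdict(Counter)
    for job in jobs:
        month = job["date"][:7]  # "YYYY-MM"
        # Count each library once per job, even if it is listed twice.
        for library in set(job["libraries"]):
            usage[month][library] += 1
    return {month: dict(counts) for month, counts in sorted(usage.items())}


if __name__ == "__main__":
    for month, counts in library_usage_by_month(SAMPLE_JOBS).items():
        print(month, counts)
```

Grouping on the anonymous user ID instead of the month would give per-user usage in the same way.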

Data collected as part of the XALT project may be found at the following links:

Commercial Support

Ellexus is providing commercial support for XALT. Please see www.ellexus.com/products/xalt-tracking-job-level-activity-supercomputers for more details.

Funding Source

National Science Foundation award 1339690 (Collaborative Research: SI2-SSE: XALT: Understanding the Software Needs of High End Computer Users)

Download

Download XALT from GitHub or SourceForge.

Robert McLay

Manager, HPC Software Tools and Research Associate
mclay@tacc.utexas.edu | 512-232-8104