XALT: Tracking Job-level Activity on Supercomputers

Purpose

XALT is a tool that allows supercomputer support staff to collect and understand job-level information about the libraries and executables that end-users access during their jobs. The tool can also work with a system's module software to provide additional information about module usage.

Overview

XALT collects data by intercepting the linker (ld) and the parallel job launcher (aprun, ibrun, mpirun, etc.). At link time, XALT learns which static and dynamic libraries an executable needs. At run time, XALT determines the details of each parallel job: the name of the executable along with its dependencies, the nature of the computation (total nodes, MPI tasks, duration, etc.), and the environment in which the job ran. XALT writes a record of each link-time and run-time event to a database.
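The run-time half of this interception idea can be illustrated with a small sketch. This is not XALT's code or its record schema: the function name `run_and_record`, the record fields, and the choice of environment variables are all hypothetical, picked only to show a wrapper that delegates to the real command and captures the kind of details described above. A real deployment would write the record to a database instead of printing it.

```python
import json
import os
import subprocess
import sys
import time


def run_and_record(launcher_argv, num_nodes, num_tasks,
                   env_keys=("PATH", "LD_LIBRARY_PATH")):
    """Run a command on the user's behalf and describe the run.

    Hypothetical helper: the field names below are illustrative only
    and do not reflect XALT's actual record format.
    """
    start_wall = time.time()        # wall-clock start, for the record
    start_mono = time.monotonic()   # monotonic clock, for the duration
    completed = subprocess.run(launcher_argv)  # delegate to the real command
    duration = time.monotonic() - start_mono

    record = {
        "executable": os.path.basename(launcher_argv[0]),
        "argv": list(launcher_argv),
        "num_nodes": num_nodes,
        "num_tasks": num_tasks,
        "start_time": start_wall,
        "run_time_sec": duration,
        "exit_code": completed.returncode,
        # Capture a small, assumed subset of the environment.
        "environment": {key: os.environ.get(key, "") for key in env_keys},
    }
    return completed.returncode, record


if __name__ == "__main__":
    # Stand-in for a parallel job: run the Python interpreter doing nothing.
    code, rec = run_and_record([sys.executable, "-c", "pass"],
                               num_nodes=1, num_tasks=1)
    print(json.dumps(rec, indent=2))
```

The wrapper returns the wrapped command's exit code unchanged, so a caller sees the same result it would have seen without the interception.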

XALT is a collaboration between PI Mark Fahey (University of Chicago, formerly National Institute for Computational Sciences) and co-PI Robert McLay (TACC).

Impact

XALT is used at TACC to track how our system is used. It is also used at several sites around the world.

Funding Source

NSF Award #1339690: Collaborative Research: SI2-SSE: XALT: Understanding the Software Needs of High End Computer Users

DOI

10.5281/zenodo.49772

Cite As

"User Environment Tracking and Problem Detection with XALT," K. Agrawal, M. R. Fahey, R. McLay, and D. James, In Proceedings of the First International Workshop on HPC User Support Tools, HUST '14, Nov. 2014. dx.doi.org/10.1109/HUST.2014.6.

Paper References

"Tales from the Trenches: Can User Support Tools Make a Difference?" D. James, R. McLay, S. Liu, R. T. Evans, W. L. Barth, A. Lamas-Linares, R. Budiardja, and M. Fahey, In Proceedings of the Second International Workshop on HPC User Support Tools, HUST '15, Nov. 2015. doi.acm.org/10.1145/2834996.2834998.

"Community Use of XALT in Its First Year in Production," R. Budiardja, M. Fahey, R. McLay, P. M. Don, B. Hadri, and D. James, In Proceedings of the Second International Workshop on HPC User Support Tools, HUST '15, Nov. 2015. doi.acm.org/10.1145/2834996.2835000.

Robert McLay

Manager, HPC Software Tools, Research Associate, High Performance Computing
mclay@tacc.utexas.edu | 512-232-8104