TACC Stats

Git repository: https://github.com/TACC/tacc_stats


TACC Stats is an infrastructure for the low-overhead collection of system-wide performance data that integrates information from a variety of sources. It provides a web-based interface for exploring jobs and system-level reports on this data, as well as automated analysis that flags jobs needing human attention.

Histograms generated from WRF runs on Stampede. Subplots show run times, job size in cores, average cycles per instruction (CPI), and average floating point computation rate.


The TACC Stats monitor runs periodically during each job to collect a wide range of system statistics and hardware performance counter data, including CPU usage, socket-level memory usage, swapping and paging statistics, system load and process statistics, system and block device counters, interprocess communication, filesystem usage (NFS, Lustre, Panasas), interconnect fabric traffic, and CPU and uncore counters (e.g., counters from the Memory Controller, the Cache and NUMA Coherence Agents, and the Power Control Unit).
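Hardware performance counters such as core cycles and retired instructions are cumulative, so a rate like the average CPI shown in the histograms above is derived from differences between successive readings. The Python sketch below illustrates that idea; it is not TACC Stats code, and the 48-bit counter width and the `(cycles, instructions)` sample layout are assumptions made for illustration.

```python
from __future__ import annotations

# Assumed counter width for illustration; the real width depends on the CPU model.
COUNTER_BITS = 48


def counter_delta(before: int, after: int, bits: int = COUNTER_BITS) -> int:
    """Difference between two raw counter readings, tolerating one wraparound."""
    return (after - before) % (1 << bits)


def average_cpi(samples: list[tuple[int, int]]) -> float:
    """Average cycles per instruction from periodic (cycles, instructions) readings.

    Deltas are taken between consecutive samples, so a counter may wrap
    once per sampling interval without corrupting the total.
    """
    pairs = list(zip(samples, samples[1:]))
    cycles = sum(counter_delta(a[0], b[0]) for a, b in pairs)
    instructions = sum(counter_delta(a[1], b[1]) for a, b in pairs)
    return cycles / instructions if instructions else float("nan")
```

For example, readings of `(1000, 500)`, `(3000, 1500)`, and `(5000, 2500)` give 4000 cycles over 2000 instructions, an average CPI of 2.0. Summing per-interval deltas, rather than subtracting the first reading from the last, is what keeps the result correct when a counter wraps during a long job.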

Nightly analyses are available to flag underperforming and misconfigured jobs for later attention by HPC consultants. Jobs are flagged when they leave nodes idle, use the wrong network, experience a drastic drop in performance, or show evidence of low efficiency.
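As a rough illustration of how per-node summaries could feed such flags, the sketch below reports nodes whose CPUs were essentially idle and flags a high average CPI among the active nodes. The `NodeSummary` type, both threshold values, and the use of CPI as the efficiency test are hypothetical; the actual TACC Stats analyses may rely on different metrics and criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeSummary:
    """Hypothetical per-node summary for one job."""
    hostname: str
    cpu_busy: float  # fraction of non-idle CPU time over the job, 0.0 to 1.0
    cpi: float       # average cycles per instruction on this node


# Hypothetical thresholds chosen only for this example.
IDLE_BUSY_THRESHOLD = 0.01
HIGH_CPI_THRESHOLD = 1.5


def flag_job(nodes: list[NodeSummary]) -> list[str]:
    """Return human-readable flags for a job, or an empty list if none apply."""
    flags = []
    # Nodes that did almost no work suggest the job left allocated nodes idle.
    idle = [n.hostname for n in nodes if n.cpu_busy < IDLE_BUSY_THRESHOLD]
    if idle:
        flags.append(f"idle nodes: {', '.join(idle)}")
    # Judge efficiency only on nodes that actually ran something.
    active = [n for n in nodes if n.cpu_busy >= IDLE_BUSY_THRESHOLD]
    if active:
        mean_cpi = sum(n.cpi for n in active) / len(active)
        if mean_cpi > HIGH_CPI_THRESHOLD:
            flags.append(f"high CPI: {mean_cpi:.2f}")
    return flags
```

Excluding idle nodes from the CPI average keeps one unused node from masking, or distorting, the efficiency picture on the nodes that did run the application.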

The associated web interface lets users browse all jobs run on a cluster, identify flagged jobs, and plot basic job characteristics.

Funding Source(s)

National Science Foundation award ACI-1203560: "Collaborative Research: Integrated HPC Systems Usage and Performance of Resources Monitoring and Modeling (SUPReMM)"

Evans, T.; Barth, W.L.; Browne, J.C.; DeLeon, R.L.; Furlani, T.R.; Gallo, S.M.; Jones, M.D.; Patra, A.K., "Comprehensive Resource Use Monitoring for HPC Systems with TACC Stats," 2014 First International Workshop on HPC User Support Tools (HUST), pp. 13-21, 21 Nov. 2014. doi: 10.1109/HUST.2014.7

Bill Barth

Director of High Performance Computing
bbarth@tacc.utexas.edu | 512-232-7069

Todd Evans

Research Associate
rtevans@tacc.utexas.edu | 512-475-9411

John McCalpin

Research Scientist
mccalpin@tacc.utexas.edu | 512-232-3754