The Launcher

Large-scale HTC on HPC Systems

Purpose

The Launcher is a framework for running large collections of serial or multi-threaded applications, a workload style known as High Throughput Computing (HTC), as a single multi-node parallel job on batch-scheduled High Performance Computing (HPC) systems.
Many HPC systems discourage the submission of serial or single-node jobs in favor of larger parallel runs. Many also discourage or prohibit queuing hundreds of jobs at a time, to keep the systems available for large-scale parallel jobs. And although many batch schedulers provide a throughput feature called job arrays, these are sometimes restricted in size or disabled entirely.
The Launcher works around these restrictions by letting users create their own miniature high-throughput machine inside a single multi-node parallel job submitted to a large-scale HPC resource. Once started, the Launcher independently manages the execution of the user's batch of application instances on the provided nodes, with extremely low overhead and almost no learning curve.
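The idea of working through a list of independent commands with a fixed number of concurrent workers can be illustrated on a single machine with standard tools. This is a conceptual sketch only and is not how the Launcher itself is implemented; the file name commands.txt, the temporary directory, and the worker count of 4 are assumptions made for illustration.

```shell
#!/bin/bash
# Conceptual illustration only; this is NOT the Launcher's own scheduling code.
# commands.txt, the temporary directory, and the worker count are hypothetical.
workdir=$(mktemp -d)
cmdfile="$workdir/commands.txt"

# Four independent commands, one per line.
for i in 1 2 3 4; do
    echo "echo task $i > $workdir/out_$i.txt" >> "$cmdfile"
done

# Run each line as its own shell command, at most 4 at a time.
xargs -P 4 -I{} sh -c '{}' < "$cmdfile"

# List the command file and the four output files.
ls "$workdir"
```

Because no command depends on another, the order in which the lines finish does not matter, which is the property that makes a workload suitable for this style of execution.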

Is the Launcher Right for You?

The Launcher may be able to help you improve your throughput if:
Your program is serial or only uses threads
Execution instances of your application are independent of one another (i.e., the results of one run are not fed into another)
You need to run tens, hundreds, or thousands of application instances

What Can the Launcher Do for My Workload?

The Launcher can be run at practically any scale, from a single node to 4,096 nodes or more, and has utilized more than 65,536 compute cores at a time. We have shown the Launcher to be effective at running very large workloads of more than 640,000 individual application instances, requiring only minutes to do what would have taken an average lab workstation weeks or months to complete.

Using the Launcher at TACC

The Launcher is available on all HPC resources at TACC, including Stampede and Lonestar.

Quick Start Steps:
  1. The Launcher is accessible through the module system by loading the launcher module:

    $ module load launcher
  2. Copy the appropriate submission script (launcher.sge or launcher.slurm) to your working directory.
  3. Create a file that lists the application instances you want to run, one command per line. Make sure the file contains no blank lines, including at the end of the file.
  4. Open the submission script (launcher.sge or launcher.slurm) in your working directory and set the CONTROL_FILE variable to the name of the file containing the commands you want to run (see step 3).
  5. Set the number of parallel processes you want to run. On Stampede, this is done by altering the #SBATCH -N (number of nodes) and #SBATCH -n (total number of tasks) options.
  6. Submit your Launcher job to the scheduler. On Stampede the command is:

    $ sbatch launcher.slurm
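
Step 3 can be sketched as follows. The program name ./my_app, the input and output file names, and the file name jobfile are hypothetical placeholders; only the one-command-per-line format and the no-blank-lines requirement come from the steps above.

```shell
#!/bin/bash
# Hypothetical names: ./my_app, input_N.dat, output_N.log, and jobfile
# are placeholders chosen for illustration.
jobfile="jobfile"

# Start from an empty file, then write one independent command per line.
: > "$jobfile"
for i in $(seq 1 8); do
    echo "./my_app input_${i}.dat > output_${i}.log" >> "$jobfile"
done

# Guard against blank lines, which step 3 says must not appear.
if grep -q '^[[:space:]]*$' "$jobfile"; then
    echo "error: $jobfile contains blank lines" >&2
    exit 1
fi

# Report how many commands were written.
echo "$(grep -c '' "$jobfile") commands written to $jobfile"
```

With a file like this in place, step 4 amounts to setting CONTROL_FILE to jobfile in the copied submission script.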

Using the Launcher on Your System

The Launcher can be run on a cluster, workgroup of computers, or on a single system. Download the source code from GitHub: https://github.com/TACC/launcher

Publications

Wilson, Lucas A. and Fonner, John M., "Launcher: A Shell-based Framework for Rapid Development of Parallel Parametric Studies," to appear in Proceedings of the Extreme Science and Engineering Discovery Environment: Engaging Communities (XSEDE14), Atlanta, GA, USA, July 2014.

Lucas A. Wilson

Research Associate
lwilson@tacc.utexas.edu | 512-232-7351