Batch Systems

To help users migrate from one batch system to another, basic resource specifications and batch options are compared and listed side-by-side for the LoadLeveler, LSF, and OpenPBS batch utilities.

Table 1 compares the resource syntax of each batch system for the most commonly employed user specifications. Examples of these specifications include the total number of nodes and tasks per node, the wall clock time, and the peak memory usage per task. Additional specifications include the delineation of a specific "class" or "queue" to designate a relative priority in advancing through the queue structure and an email request for notifying the user at the beginning or end of a given job execution.

Table 2 provides a list of important environment variables under each batch system, and Table 3 compares the relevant resource management commands to submit, monitor, and cancel queued jobs. As a final comparison between the batch systems, example submission scripts are provided for each of the three batch systems to request comparable resources and run a given parallel executable named mpihello.

Table 1. Resource specifications

Utility                       LoadLeveler (LL)                  PBS                          LSF
----------------------------  --------------------------------  ---------------------------  ---------------------------
Resource sentinel             # @                               #PBS                         #BSUB
Nodes/processors              node = <#>                        -l nodes=<#>:ppn=<#>         -n <#>
                              tasks_per_node = <#>              (ppn = processors per node)
Wall clock limit              wall_clock_limit = [dd:]hh:mm:ss  -l walltime=hh:mm:ss         -W hh:mm
Queue                         class = <queue>                   -q <queue>                   -q <queue>
Email notification            notification = always|error|      -me                          -B (mail when job begins)
                              start|never|complete                                           -N (mail report when job ends)
Email address                 notify_user=<email>               -M <email_address>           -u <email_address>
Initial directory             initialdir=<directory>            (default = $HOME)            (default = job submission
                                                                                             directory)
Job name                      job_name=<name>                   -N <name>                    -J <name>
STDERR & STDOUT to same file  output = <file>                   -j oe                        (use -o without -e)
                              error = $(output)
Project to charge             account_no=<project>                                           -P <project>
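
The processor and wall clock rows of Table 1 differ in form as well as in spelling: PBS and LoadLeveler ask for nodes and tasks per node, while LSF asks for one total, and LSF's -W takes hh:mm rather than hh:mm:ss. The sketch below shows that conversion. It is written in Bourne shell rather than csh, and the function names pbs_total_tasks and walltime_to_lsf are hypothetical helpers, not part of any batch system. It assumes a request of the exact form nodes=<#>:ppn=<#> and a limit of the exact form hh:mm:ss.

```shell
#!/bin/sh
# Hypothetical helpers for translating a PBS request into LSF terms.

# pbs_total_tasks "nodes=<#>:ppn=<#>"
# Prints nodes * ppn, the value LSF expects after -n.
pbs_total_tasks() {
    spec=$1
    nodes=${spec#nodes=}      # drop the leading "nodes="
    nodes=${nodes%%:*}        # keep the text before the first ":"
    ppn=${spec##*ppn=}        # keep the text after "ppn="
    echo $(( $nodes * $ppn ))
}

# walltime_to_lsf "hh:mm:ss"
# Prints hh:mm by dropping the seconds field (no rounding is done).
walltime_to_lsf() {
    echo "${1%:*}"
}
```

For the request used in the example scripts, pbs_total_tasks "nodes=8:ppn=2" prints 16 and walltime_to_lsf "6:00:00" prints 6:00.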

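Table 2. Environment variables

Variable                  LoadLeveler             PBS                     LSF
------------------------  ----------------------  ----------------------  ----------
Processor list            $LOADL_PROCESSOR_LIST   cat -n $PBS_NODEFILE    $LSB_HOSTS
Directory of submission   $LOADL_STEP_INITDIR     $PBS_O_WORKDIR          $LS_SUBCWD
Job ID                    $LOADL_STEP_ID          $PBS_JOBID              $LSB_JOBID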
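
Because each batch system sets its own job ID variable, a script can use those variables to work out where it is running. The following is a minimal sketch in Bourne shell (not csh). The function name batch_job_info is hypothetical, and the sketch assumes that at most one of PBS_JOBID, LSB_JOBID, and LOADL_STEP_ID is set inside a job.

```shell
#!/bin/sh
# Hypothetical helper: report the batch system, job ID, and
# submission directory, based on which job ID variable is set.
batch_job_info() {
    if [ -n "${PBS_JOBID:-}" ]; then
        echo "PBS $PBS_JOBID ${PBS_O_WORKDIR:-}"
    elif [ -n "${LSB_JOBID:-}" ]; then
        echo "LSF $LSB_JOBID ${LS_SUBCWD:-}"
    elif [ -n "${LOADL_STEP_ID:-}" ]; then
        echo "LoadLeveler $LOADL_STEP_ID ${LOADL_STEP_INITDIR:-}"
    else
        # None of the variables is set: probably an interactive shell.
        echo "none"
    fi
}
```

Called outside of any batch job, batch_job_info prints "none".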

Table 3. Resource management commands

Action          LoadLeveler   PBS         LSF
--------------  ------------  ----------  ----------
Submission      llsubmit      qsub        bsub
Deletion        llcancel      qdel        bkill
Status          llq           qstat       bjobs
Queue list      llclass       qstat -Q    bqueues -l
GUI monitor     xloadl        xpbsmon
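
The submission row of Table 3 can be wrapped in a small dispatcher, sketched below in Bourne shell (not csh). The function name submit_command is hypothetical. The sketch only prints the command line rather than running it. It also assumes the common LSF convention of feeding the script to bsub on standard input so that the #BSUB lines are read; check the local LSF documentation before relying on that.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that submits a job
# script under the named batch system. Nothing is executed.
submit_command() {
    case "$1" in
        LoadLeveler) echo "llsubmit $2" ;;
        PBS)         echo "qsub $2" ;;
        LSF)         echo "bsub < $2" ;;   # script is read from stdin
        *)           echo "unknown batch system: $1" >&2
                     return 1 ;;
    esac
}
```

For example, submit_command PBS job.pbs prints "qsub job.pbs", and an unrecognized system name returns a nonzero status.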
Example Batch Scripts

Below are job scripts for each batch system. All scripts specify the same resources and run the same parallel executable:

PBS example job script:

       #!/bin/csh
       #PBS -l nodes=8:ppn=2
       #PBS -l walltime=6:00:00
       #PBS -q normal
       #PBS -N hello
       #PBS -j oe
       #PBS -me -M somebody@tacc.utexas.edu

       echo "Master Host: $PBS_O_HOST"
       echo "Nodes:"; cat -n $PBS_NODEFILE; echo ""
       echo "-----------------------------------------------"

       cd $PBS_O_WORKDIR
       mpirun -np 16 ./mpihello

The environment variables PBS_O_HOST, PBS_NODEFILE, and PBS_O_WORKDIR contain the host on which qsub was run (printed here as the master host), the name of the file that lists the assigned compute nodes, and the directory of submission, respectively. mpirun launches the parallel application on 16 processors (the -np argument).
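
The processor count passed to -np can be derived from the node file instead of being typed by hand. The sketch below, in Bourne shell rather than csh, assumes that the file named by PBS_NODEFILE holds one line per allocated processor (so 8 nodes with ppn=2 give 16 lines); that layout should be confirmed on the local installation. The function name count_procs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the non-empty lines of a node file.
# Assumes one line per allocated processor.
count_procs() {
    grep -c . "$1"
}

# Possible use inside a PBS job script (not run here):
#   mpirun -np "$(count_procs "$PBS_NODEFILE")" ./mpihello
```

Deriving the count this way keeps the mpirun line correct when the nodes or ppn values in the resource request change.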

LSF example job script:

       #!/bin/csh
       #BSUB -n 16
       #BSUB -W 6:00
       #BSUB -q normal
       #BSUB -J hello
       #BSUB -o out.o%J
       #BSUB -u somebody@tacc.utexas.edu

       echo "Master Host: `hostname` "
       echo "Node   List: $LSB_HOSTS "

       cd $LS_SUBCWD
       pam -g 1 gmmpirun_wrapper ./mpihello

The "%J" expression is replaced with the job ID by LSF, so each job writes to its own output file. The environment variables LSB_HOSTS and LS_SUBCWD contain the list of assigned compute nodes and the directory of submission, respectively. pam (Parallel Application Manager) launches the parallel application on 16 processors (-n 16 specifies the number of processors). Note that the -g 1 gmmpirun_wrapper options are required on the Linux clusters to use the Myrinet interconnect.
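
The contents of LSB_HOSTS can be summarized with a few lines of shell. This sketch, in Bourne shell rather than csh, assumes that LSB_HOSTS is a space-separated list in which a host name appears once for each processor assigned on that host; that format should be checked against the local LSF documentation. The function names count_slots and count_hosts are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a space-separated host list.

# count_slots "host list": number of entries (one per processor).
count_slots() {
    # Unquoted on purpose: word splitting separates the host names.
    set -- $1
    echo $#
}

# count_hosts "host list": number of distinct host names.
count_hosts() {
    # One name per line, duplicates removed, non-empty lines counted.
    printf '%s\n' $1 | sort -u | grep -c .
}

# Possible use inside an LSF job script (not run here):
#   echo "Slots: $(count_slots "$LSB_HOSTS")"
```

With the list "n1 n1 n2 n2", count_slots prints 4 and count_hosts prints 2.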

LoadLeveler example job script:

       #!/usr/bin/csh
       #
       # @ environment = COPY_ALL;MP_EUILIB=us;MP_INTRDELAY=100;
                         XLSMPOPTS=parthds=1;SPINLOOPTIME=10000;
                         YIELDLOOPTIME=10000;MP_CPU_USE=multiple;
                         MP_SHARED_MEMORY=yes

       # @             node = 4
       # @   tasks_per_node = 4
       # @        resources = ConsumableCpus(1) ConsumableMemory(1800MB)

       # @ wall_clock_limit = 06:00:00
       # @            class = normal

       # @         job_name = hello
       # @           output = $(job_name).o$(jobid)
       # @            error = $(job_name).o$(jobid)
       # @     notification = never

       # @      network.MPI = csss,shared,US
       # @         job_type = parallel

       # @      notify_user = somebody@tacc.utexas.edu
       # @ queue

       echo "Master Host: `hostname`"
       echo "NODELIST: $LOADL_PROCESSOR_LIST"
       echo "----------------------------------"

       cd $LOADL_STEP_INITDIR
       poe ./mpihello

The "environment" keyword provides a list of semicolon-separated environment variable settings (variable_name=variable_value). Note: The environment specification must be on a SINGLE LINE (the expression above was wrapped for clean display). Setting COPY_ALL (without a value) signals LoadLeveler to copy all of your interactive environment variables to the batch environment. MP_EUILIB=us and MP_SHARED_MEMORY=yes select the user space communication software and shared memory buffers for MPI, respectively. The network.MPI values (csss,shared,US) specify the SP2 dual-plane adapters, shared adapter usage, and the "us" software stack, respectively.
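
Since the environment specification must sit on one line, it can be easier to keep the settings as a list and join them when the job script is generated. The following is a minimal sketch in Bourne shell (not csh); the function name ll_environment_line is hypothetical and is not a LoadLeveler facility.

```shell
#!/bin/sh
# Hypothetical helper: join the settings given as arguments into a
# single "# @ environment = ..." line, separated by semicolons.
ll_environment_line() {
    line="# @ environment = "
    sep=""
    for setting in "$@"; do
        line="$line$sep$setting"
        sep=";"               # separator for every setting after the first
    done
    printf '%s\n' "$line"
}

# Possible use when generating a job script (not run here):
#   ll_environment_line COPY_ALL MP_EUILIB=us MP_SHARED_MEMORY=yes
```

The example call would print "# @ environment = COPY_ALL;MP_EUILIB=us;MP_SHARED_MEMORY=yes" on one line.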

The $LOADL_PROCESSOR_LIST and $LOADL_STEP_INITDIR variables contain the list of processors and the directory of submission, respectively. poe launches the parallel application on 16 processors (the product of node and tasks_per_node, here 4 x 4). For code compiled with the "MP" compilers (mpxlf90, mpcc, etc.), poe is not necessary. Using hpmcount in place of poe provides hardware counter information for the parallel execution.