Introduction
Conceptual Paradigm Shift

Using the MyCluster middleware system can be viewed as a conceptual change in the way jobs are managed clusters. Essentially, MyCluster will replace the local job schedulers with and enable the use of a single scheduler instead of mutliple platforms. MyCluster currently supports Condor and SGE grid scheduling systems. Jobscripts must be created in condor and submitted to the MyCluster system, regardless of which scheduling system is enabled on the public cluster. Running sessions will appear as though there is a single Condor or SGE cluster with nodes that will be available for immediate use.

MyCluster will interface with the local scheduling system to obtain nodes for the job ensemble until the entire workflow is finished. Local system accounting will be handled by the local scheduler as usual.

The MyCluster Commands

Mycluster is comprised of the commands vo-login and pool_starter. Each command has the same arguments and will similiarly create a Condor pool from the specified information on the command line. Pool_starter is the local version of MyCluster that is used locally on the TACC lonestar machine. Vo-login is the TeraGrid version of pool_starter that is used to make cross-site runs with multiple TeraGrid resources.

Name

   vo-login = virtual login command for creating a MyCluster/Condor session 
Synopsis
   vo-login [-d] [-h] [-n <jobs:size>] [-m] [-H <conf_file>] [-W <mins>] [-T] [-M <host>] [-J]
   vosub.pl [-d] [-h] [-n <jobs:size>] [-m] [-H <conf_file>] [-W <mins>] [-T] [-M <host>] [-J]

Description The vo-login and vosub.pl (non-GNU screened version) commands add compute nodes from clusters, belonging to a virtual organization, to either a pre-existing Condor pool or into a private Condor pool created for a virtual login session on a per experiment basis.

The options are as follows
-d
Prints additional debug messages in standard error.
-h
Prints a usage message and exits.




-n
<jobs:size>
Specifies the number of condor starter jobs and the size of each job that are submitted at each
site; e.g. -n 2:16 specifies 2 starter jobs of size 16 processors each. Default: -n 1:1
-m
Specifies that no starter jobs are to be submitted to the local client-only machine; this is the :machine on which the vo-login command is invoked.
-H <conf_file>
Specifies the configuration file listing the cluster gateway nodes participating in this session. :The configuration file may also contain site specific configuration settings for the ses- sion. :Default: $HOME/vo.conf

The vo-login configuration file is an ASCII file listing the clusters contributing to a virtual organization. Comments in the file are introduced with "#", and the _GRID_THROTTLE and _GRID_JOB_SIZE configuration variables are used to specify site specific Condor starter job submission requirements; e.g. tg- login.tacc.utexas.edu%_GRID_THROTTLE=2:_GRID_JOB_SIZE=32 specifies 2 Condor starter jobs of size 32 processors each submitted to the cluster tg-login.lonestar.tacc.teragrid.org. Note that if the _GRID_MAX_THROTTLE specification is set at the participating sites, the starter submission requirements are treated as inital "hints" by the system. MyCluster will attempt to move pending Condor starter jobs between sites, to minimize their queue wait times.

-W <mins>
Specifies the wall time limit for the condor starter jobs when they are submitted to the remote :sites. The unit of the wall clock limit is in minutes.
-M <host>
Specifies alternative Condor central manager, if condor starters are to callback to an existing :Condor pool.
-T
Do not set the TimeToLive option in the Condor starter. The TimeToLive option allows the Condor :starter to advertise the TimeToLive classad, indicating the time left in the wall clock limit set :for that job.
-J
Specifies that the Condor starter is to join to an existing pool specified with the -M option.
-u
Instructs the Condor starters to use UDP to update the Condor collector. By default, the system uses :TCP updates.

Using Condor

Condor is an extemely powerful scheduling software that creates a High-Throughput Computing (HTC) environment across workstations and clusters of dedicated nodes. Condor's main attraction is the ease of execution of large ensembles and workflow management with simple commands.

For a detailed explanation of condor which includes scripting, monitoring commands, and submittion, please see the TACC Condor guide.

http://www.tacc.utexas.edu/services/userguides/rodeo/


Running MyCluster on Lonestar
Enabling MyCluster on TACC accounts

TeraGrid Users

For users that have TeraGrid accounts simply edit the .soft file in your /home directory to include the addition for gridshell as:

lonestar$ cat /home/eturner/.soft
@teragrid
+gridshell

Once you edit the file you must run resoft to commit the changes.

lonestar$ resoft

Local TACC Users

Users at TACC that are NOT TeraGrid users should add the MyCluster module to their runtime environment.

lonestar$ module add mycluster
Simple Example of MyCluster on Lonestar

This is a simple example of running MyCluster locally on Lonestar. Below is a condor script run-script that will run a simple executable over the created condor pool.

lonestar$ cat run-script 
executable = /work/eturner/mycluster-testing/testscript
 #arguments = input.$(PROCESS)
arguments = 
universe = vanilla
Requirements = (TARGET.TimeToLive > 200) 
Error = output/errfile.$(CLUSTER).$(PROCESS)
Output = output/outfile.$(CLUSTER).$(PROCESS)
queue 20

The executable is the binary, or script, that you want to execute on your pool of nodes. The arguments is any argument you want to pass to the script, which may be an input file, or initial conditions. Pay special attention to the requirements statment, the number given in this line indicates that your application needs at least 200 seconds of runtime to execute on a condor node. The queue statments tells condor how many times you want this program spawned on the pool.

In this example I want to run this script /work/eturner/mycluster-testing/testscript twenty times on the pool:

#!/bin/sh

uname -a
hostname
sleep 5m
Creating the Virtual Cluster

When you are ready to start the pool, which actually generates the local LSF jobs to the system, run the vo-login command.

 vo-login -n 1:8 -s -W 60

In this example I am going to start the pool with 1 job with 8 processors, -s for silence, and with a walltime of 60 minutes. Note: this command takes time in miunutes, the condor script we setup had time in seconds for each task. Generally speaking, your pool walltime should be greater than your task times, since most likely you can run multiple tasks for the life of the pool. If you have too many tasks, and the pool finishes it's walltime, pool_starter should submit a new pool to the lsf local queue to finish your tasks.

If there is thousands of jobs to be run, there may be some testing involved to see which configuration gets the maximum number of processors on the local system. Shorter processor jobs will spawn quicker on the system, but may also lower the total procs you are given. Probably the best number we have found is doing -n 8:32, or running 8 jobs with 32 processors each. If all of the jobs run at the same time, you will get 256 processors at once, with the maximum any one user can obtain on the system in the current defined LSF queues.


Once I submit the vo-login command I can use condor_status to see if I have any condor nodes that have started. I can still run the local job commands to see if my job is running or pending as well

bash-3.00$ condor_status
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
16492@compute LINUX       INTEL  Unclaimed  Idle       0.110  2026  0+00:00:04
16494@compute LINUX       INTEL  Unclaimed  Idle       0.110  2026  0+00:00:04
12830@compute LINUX       INTEL  Unclaimed  Idle       0.120  2026  0+00:00:05
12831@compute LINUX       INTEL  Unclaimed  Idle       0.120  2026  0+00:00:05
13723@compute LINUX       INTEL  Unclaimed  Idle       0.160  2026  0+00:00:04
13724@compute LINUX       INTEL  Unclaimed  Idle       0.160  2026  0+00:00:04
19466@compute LINUX       INTEL  Unclaimed  Idle       0.120  2026  0+00:00:05
19467@compute LINUX       INTEL  Unclaimed  Idle       0.120  2026  0+00:00:05
                    Machines Owner Claimed Unclaimed Matched Preempting
        INTEL/LINUX        8     0       0         8       0          0
              Total        8     0       0         8       0          0
bash-3.00$ lsuser
JOBID   QUEUE    USER             NAME         PROCS    SUBMITTED
753989  normal  eturner /home/eturner/.gtcsh_submit//tsubmit.TAG_58190  8       Wed Jan 25 10:49:58 2006
HOST    R15s    R1m     R15m    PAGES           MEM     SWAP    TEMP
c0-10     0.0     0.2     1.6    49.3P/s        1924M   2047M   24704M
c11-4     0.0     0.0     1.6    32.1P/s        1951M   2047M   23744M
c7-0      0.0     0.1     1.6    32.7P/s        1948M   2047M   24752M
c7-27     0.0     0.1     1.6    33.1P/s        1944M   2047M   23744M
AVRG:     0.0     0.1     1.6    36.8P/s        1942M   2047M   24236M
TOTALS:                                          297M      0M   2064M


bash-3.00$


I can now submit my job to the condor pool via condor_submit as:

bash-3.00$ condor_submit run-script 
Submitting job(s)....................
20 job(s) submitted to cluster 1.


When my jobs start running I can checkup on their status:

bash-3.00$ ls output/
errfile.1.0   errfile.1.13  errfile.1.18  errfile.1.5  outfile.1.0   outfile.1.13  outfile.1.18 
outfile.1.5
errfile.1.1   errfile.1.14  errfile.1.19  errfile.1.6  outfile.1.1   outfile.1.14  outfile.1.19  
outfile.1.6
errfile.1.10  errfile.1.15  errfile.1.2   errfile.1.7  outfile.1.10  outfile.1.15  outfile.1.2   
outfile.1.7
errfile.1.11  errfile.1.16  errfile.1.3   errfile.1.8  outfile.1.11  outfile.1.16  outfile.1.3   
outfile.1.8
errfile.1.12  errfile.1.17  errfile.1.4   errfile.1.9  outfile.1.12  outfile.1.17  outfile.1.4   
outfile.1.9
bash-3.00$ cat output/outfile.1.0  
Linux c7-0 2.6.11.12-papismp #5 SMP Mon Oct 17 17:39:45 CDT 2005 i686 i686 i386 GNU/Linux
c7-0

Running MyCluster over TeraGrid Sites

The table below shows the MyCluster installation locations on TeraGrid resources. On some sites, it may be necessary to add or change environment variables for proper execution.


Site Architecture Installation Notes
IU IA-64 cluster _GRID_PROJECT_NAME must be defined in user's login profile.
NCSA IA-64 cluster
PSC Alpha cluster Able to submit across TeraGrid resources from the lemieux login node only.
PSC Alpha cluster Able to submit across TeraGrid resources from the lemieux login node only.
Purdue IA-32 cluster _GRID_PROJECT_NAME must be defined in user's login profile.
SDSC IA-64
TACC IA-32 cluster
UC/ANL IA-32 IA-64 cluster GRID_PROJECT_NAME and _GRID_NODE_RESOURCE1 must be defined in user’s login profile.


Example Cross-Site Runs on TeraGrid

Here is a sample Condor script that will be run for this example

# File: mysub.file
# Running 100 instances of "a.out -i <rank>" 
#
# Consult http://www.cs.wisc.edu/condor for general info on Condor and
# http://www.cs.wisc.edu/condor/manual/v6.6/condor_submit.html 
# for info about condor_submit
# IMPORTANT: TimeToLive is the expected execution time of your application
# in secs, and this _must_ be set with the job Requirements.
Executable = a.out
Arguments = -i $(PROCESS)
Universe = vanilla
Requirements = (TimeToLive > 300) && (FileSystemDomain != "dummy") 
&& (Arch != "dummy") && (OpSys != "dummy")
Should_transfer_files = true
When_to_transfer_output = on_exit
output = log/out.$(CLUSTER).$(PROCESS)
error = log/err.$(CLUSTER).$(PROCESS)
queue 100 

An example MyCluster/Condor run on the TeraGrid is specified by the command line invocation as shown:

   %> vo-login -H ~/vo.conf -n 5:32 -W 280

This submits 5 Condor starter jobs, with 32 processors and a wall-clock limit of 280 minutes each, on each of the sites specified in the vo.conf configuration file. The content of the configuration file is as follows:

  # file contains remote sites in the cross-site run
   tg-login.sdsc.teragrid.org
   tg-login.lonestar.tacc.teragrid.org

The local site (where the vo-login command is invoked) does not need to be listed in the file.

In the case where you might not want jobs to be submitted to you local client workstation when you invoke a cross-site GridShell/Condor run, the -m option should be invoked as shown below:

   %> vo-login -H ~/vo.conf -n 5:32 -W 280 -m

The vo.conf configuration file can be used to specify a different number of Condor starters to be submitted for each site in your virtual organization. This can be specified in the configuration file with the configuration variables _GRID_JOB_SIZE and _GRID_THROTTLE. An example configuration file with individual settings for each site is as follows:

  tg-login.ncsa.teragrid.org%_GRID_THROTTLE=3:_GRID_JOB_SIZE=8  
   tg-login.lonestar.tacc.teragrid.org%_GRID_THROTTLE=5:_GRID_JOB_SIZE=32

The configuration above starts 3 Condor starter jobs of 8 processors each at NCSA, and 5 Condor starter jobs of 32 processors each at TACC.

To add TeraGrid compute resources to your local Condor pool, where the master host is tejas.utexas.edu, you can issue the following command:

  %> vo-login -H ~/vo.conf -M tejas.utexas.edu -u -J

Note that the local Condor pool needs to have the HOSTALLOW_READ and HOSTALLOW_WRITE configuration set appropriately to allow TeraGrid resources to join this pool.


Adding Remote Condor Pools to Mycluster

Users with existing departmental Condor pools can continue submitting jobs to their departmental pool, and have MyCluster add TeraGrid and TACC resources to their local pool during peak job submission periods. The user thus continues to use the default Condor commands in his departmental pool to manage and control his job submissions, whilst MyCluster manages the submission (and re-submission) of Condor starter daemons through the different resource managers on the TeraGrid clusters.

Adding Local TACC resources to UT Departmental Pools

Users that are local UT users that do not have TeraGrid accounts can add lonestar and other TACC resources to their departmental condor pools in nearly the same fashion as TeraGrid users with some small differences.

Adding TeraGrid Condor Resources to Departmental Pools

There are multiple ways of configuring Mycluster to do this, but we will highlight one example of how this may be achieved. In our scenario, we assume the user wants his/her jobs to run on any local or TeraGrid contributed CPU resource. There are three steps that need to be performed in order to do this:

Step 1

Configure your local Condor pool to give TeraGrid permission to add resources to it. This can be achieved by adding the following configuration line into your local Condor installation:

   HOSTALLOW_READ = 129.114.*, *.teragrid.org
   HOSTALLOW_WRITE = 129.114.*, *.teragrid.org

Step 2

Configure the local MyCluster template configuration file on each cluster to start jobs only with a specified job classAd, e.g. Tg_Resource. In order to do this, create a personal template directory at each cluster, and copy over the condor starter configuration file (condor_config.glidein.template) to this directory, remembering to set _GRID_TEMPLATE_DIR in your login environment as well. Then modify the START configuration with the following:

   START = (TARGET.Project =?= "Tg_Resource")

Step 3

Submit your jobs advertising a TeraGrid project classAd. You can do this by adding the following line in your condor submit file:

   +Project = "Tg_Resource"

This ensures that only your jobs will be run on TeraGrid resources contributed to your local Condor pool.


Supporting Documentation
Optional Settings

You can also configure additional options in MyCluster by setting environment variables in your login script (.cshrc or .bashrc). These additional options include setting the project account to which the Condor starter jobs will be charged, setting a limit to the maximum number of starter jobs at each site to enable "pending" job load-balancing, or enabling periodic checker scripts to detect faulty site conditions and disabling job dispatch to these sites. The following examples are in TCSH syntax (you may convert accordingly in a BASH environment).

Project Accounting

If you wish to have your Condor starter jobs submitted and charged to a non-default account, you can add the optional line in your login script:

   setenv _GRID_PROJECT_NAME <LSF/PBS project account>

Your jobs will be now charged to the specified account when the starter jobs are submitted by GridShell on your behalf.

Load-Balancing "Pending" Jobs across Clusters

MyCluster will migrate "pending" Condor starter jobs between cluster sites, if you specify a limit to the maximum number of Condor starter jobs in your login script:

   setenv _GRID_MAX_THROTTLE 10

This will cause pending jobs to be migrated to sites which have shorter queue wait times, up to the number of jobs set in this configuration variable. If this is not set, only the specified number of Condor starter jobs for the site will be submitted.

The default interval at which MyCluster will migrate "pending" Condor starter jobs is 600 seconds. You can however reset this by specifying the _GRID_GETJOB_INTERVAL environment variable:

   setenv _GRID_GETJOB_INTERVAL 300 # every 300 seconds

Site Checker Scripts

You can also specify the location of a checker script for MyCluster to run periodically on the login node with the _GRID_USER_CHECKER environment variable. An example entry in your login script could be:

   setenv _GRID_USER_CHECKER /home/ewalker/checker.sh

If the script, checker.sh, exits with a value other then "0", MyCluster will not submit any more jobs to the site during your login session. This is useful if you wish to prevent jobs from running at a site when commonly occurring error scenarios, such as a full scratch space, missing libraries, etc., are detected.

The default interval at which this user check script will be invoked is 600 seconds. However, you can reset this by specifying the _GRID_USERCHECK_INTERVAL:

  setenv _GRID_USERCHECK_INTERVAL 60 # every 60 seconds
 


Using Screen to Manage a Large Simulation of Mycluster

For large experiments, it may be helpful to use the built-in GNU screen support for pool_starter and vo-login which is installed for Lonestar.

Start the pool_starter as:

screen pool_starter -n 8:32 -W 60 -s

Now your session can be detached, and reattached via detach and screen -r. Multiple sceen sessions can be started and thus many logical pools started, which may be highly effective for ensamble work and workflow management. You must be logged in with the same user account to attach a screen, however. You cannot attach to a screen for another user. Look in the screen manual for extra help for using screen.

This is also effective usage incase there is a network issue between you and the lonestar system. As long as the lonestar headnode is alive, you can reattach your screen to manage the workflow.


Office of the Vice President for Research
© Copyright 2002 The University of Texas at Austin