NOTICE: On February 4, 2013 the Ranger compute cluster and the Spur visualization cluster will be decommissioned. The last day of production for both systems is February 3, 2013. After February 3, 2013 users will NOT be able to run jobs on either system. The Ranger and Spur login nodes will, however, remain available throughout the month of February so that users can access their $WORK and $SCRATCH directories and migrate data to Stampede.


Spur User Guide

System Overview
System Access
Computing Environment
Transferring your files to Spur
Application Development
Running your applications
     Running Batch Jobs on Spur
     Running Interactive Applications on Spur
Tools
     Using CUDA on Spur

System Overview

Overview

Spur (spur.tacc.utexas.edu), a Sun Visualization Cluster, contains 128 compute cores, 1 TB of aggregate memory, and 32 GPUs. Spur shares the InfiniBand interconnect and Lustre parallel file system of Ranger, TACC's Sun Constellation Linux Cluster. Spur therefore acts not only as a powerful stand-alone visualization system but also enables researchers to perform visualization tasks on Ranger-produced data without migrating it to another file system, and to integrate simulation and rendering tasks on a single network fabric.

System Configuration

Spur is an eight-node distributed-memory visualization cluster (plus a login node). Each node contains 16 cores, at least 128 GB of RAM, and 4 NVIDIA Quadro FX 5600 GPUs. Spur shares an InfiniBand interconnect, Lustre parallel file system, and SGE job scheduler with Ranger. Please note that Spur and Ranger have separate SU allocations: compute time on Spur must be requested separately through the allocation process.

Spur & Ranger Topology

The individual nodes are configured as follows:

  • spur: login node (no graphics resources)
  • visbig: Sun Fire X4600M2 server with
    • 8 dual-core AMD Opteron processors
    • 256 GB RAM
    • 2 NVIDIA QuadroPlex 1000 Model IV (2 FX 5600 each)
  • vis1: Sun Fire X4400 server with
    • 4 quad-core AMD Opteron processors
    • 128 GB RAM
    • 2 NVIDIA QuadroPlex 1000 Model IV (2 FX 5600 each)
  • vis2-7: Sun Fire X4400 servers, each with
    • 4 quad-core AMD Opteron processors
    • 128 GB RAM
    • 1 NVIDIA QuadroPlex 2100 S4 (4 FX 5600)

File Systems

Spur shares all Ranger file systems, with the same capabilities and restrictions. Please see the Ranger User Guide for more details.

System Access

Note the following user guide conventions:
  • Commands issued on spur's login node will be preceded with a "spur$" shell prompt.
  • Compute-node command line examples are preceded with a "vis2$" prompt.
  • Commands issued from your own local machine are indicated with a "mymachine$" shell prompt.

Spur presents an interface similar to Ranger's, differing only in the login hostname: spur.tacc.utexas.edu. Consult the Ranger User Guide for complete details on available access methods and logging in. However, Spur is intended to be used interactively by remote users; this mode is implemented with VNC, which gives remote users an interactive desktop running on one node of a set of allocated Spur compute nodes. To set this up, the user submits a batch job that:

  • Allocates one or more Spur compute nodes;
  • Starts vncserver on one of the nodes;
  • Creates an ssh tunnel on the Spur login node that provides IP access to the compute node's vncserver via a unique port on the Spur login node.

Once this job is running, the user need only create a secure SSH tunnel between their remote system and the Spur login node, and then connect a vncviewer to it. Once the user has a desktop on the Spur compute node, serial and parallel applications can be run on that desktop using all the resources allocated to the initial batch job.

Note that all visualization and data analysis (VDA) jobs must be run on Spur compute nodes. No VDA applications should be run on the Spur login node (spur.tacc.utexas.edu). VDA applications running on the login node may be terminated without notice, and repeated violations may result in your account being suspended. Please submit a consulting ticket at https://portal.tacc.utexas.edu/consulting with questions regarding this policy.

Establishing Interactive Access Via VNC

To launch an interactive, remotely accessible desktop on a Spur compute node:

  1. ssh to spur:

    mymachine$ ssh <username>@spur.tacc.utexas.edu

  2. If this is your first time connecting to spur, you must run vncpasswd to create a password for your VNC servers. This should NOT be your login password! This mechanism only deters unauthorized connections; it is not fully secure, as only the first eight characters of the password are saved. All VNC connections are tunnelled through ssh for extra security, as described below.
  3. Launch a vnc desktop via SGE:

    spur$ qsub /share/doc/sge/job.vnc

    for instance, to specify a particular account and desktop size, use:

    spur$ qsub -A <TG-MyAcct> /share/doc/sge/job.vnc -geometry 1440x900

    This script can be copied to your home directory and modified, particularly if you would like to add your account information or change the default runtime of your job (currently limited to 24 hours). You can also change the job runtime with the qsub command-line option "-l h_rt=<hours:minutes:seconds>". To request a specific node, use the command-line option "-l h=<node>"; for example, to request visbig, use "-l h=ivisbig". Note that you must put a leading 'i' before the node name. A combined example appears at the end of this step.

    The default window manager is twm, a spartan window manager which reduces connection overhead. Gnome is available, if your connection speed is sufficient to support it. To use gnome, open the file ~/.vnc/xstartup and replace twm with gnome-session.
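
    For example (with illustrative values), a four-hour desktop on node vis3 could be requested by combining these options on the qsub line:

    spur$ qsub -A <TG-MyAcct> -l h_rt=04:00:00 -l h=ivis3 /share/doc/sge/job.vnc -geometry 1440x900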

  4. Once the job launches, connection info will be written to vncserver.out in your home directory. The very first time you run the VNC script, the vncserver.out file will not yet exist, so create it with the touch command; you can then watch for your connection information to be written out using tail -f:

    spur$ touch ~/vncserver.out
    spur$ tail -f ~/vncserver.out

  5. The connection info will include a VNC port on spur for your session. However, for security, TACC requires that you tunnel your VNC session through ssh. You can set up a tunnel on a unix command line or with a GUI-based ssh client. You will need to select an arbitrary VNC port number on your local machine for the tunnel.

    From your local machine (not the spur login node), on the unix command line, forward the port specified in your vncserver.out file to the matching port on spur with the command:

    mymachine$ ssh -f -N -L 59xx:spur.tacc.utexas.edu:59xx <username>@spur.tacc.utexas.edu

    The '-f' option puts the ssh command into the background after connecting; the '-N' option tells ssh not to execute a remote command (only forward ports); and the '-L' option specifies the port forwarding. For example, to tunnel from port 5951 on the local machine to port 5951 on spur, use:

    mymachine$ ssh -f -N -L 5951:spur.tacc.utexas.edu:5951 <username>@spur.tacc.utexas.edu

    In a GUI-based ssh client, find the menu where tunnels can be specified, and specify the local and remote ports as required, then launch the ssh connection to spur.

  6. Once the ssh tunnel has been established, use a VNC client to connect to the local port you created, which will then be tunnelled to your VNC server on spur. Connect to localhost:59xx, where 59xx is the local port you used for your tunnel. In the example above, we would connect the VNC client to localhost:5951. (Some VNC clients also accept the display form localhost:51.)

    If you do not have a VNC client, free clients are available for Windows, Linux, and Mac.

  7. After connecting your VNC client to your VNC server on Spur, you may use visualization applications directly on the remote desktop without launching other SGE jobs. Applications that use hardware-assisted OpenGL library calls must be launched via a wrapper that provides access to the hardware (e.g. vglrun or tacc_xrun) as described below.

  8. When you are finished with your VNC session, kill the session by typing exit in the black xterm window titled:

    *** Exit this window to kill your VNC server ***

    Note that merely closing your VNC client will NOT kill your VNC server job on Spur, and you will continue to be billed for time usage until the job ends. If you close your VNC client, you can reconnect to your VNC server at any time until the server job ends.

Computing Environment

Most of the computing environment found when the user logs on to Spur is identical to that found on Ranger.

Modules

Spur provides all modules available on Ranger and loads the same subset by default. See the Ranger User Guide for full details. In addition, Spur provides a set of visualization-specific modules. Access to these modules is provided via the vis module, which should be loaded by default; if the list of loaded modules shown by running module list does not include vis, then run module load vis.
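
For example, to check which modules are loaded and then load one of the visualization modules listed below (paraview is used here purely as an illustration):

spur$ module list
spur$ module load vis        # only needed if vis is not already listed
spur$ module load paraview   # any of the modules listed below may be loaded this way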

Visualization modules available on Spur include:

  • amira: Access to Amira visualization application
  • chromium: Parallel rendering environment
  • ffmpeg: Toolkit for processing audio and video data
  • glui: GLUT-based C++ toolkit for GUI building
  • qt: Application and toolkit for building GUIs
  • silo: Access to the Silo visualization application and associated tools and libraries
  • vapor: Access to NCAR's Vapor visualization application
  • vtk: Toolkit for scientific and information visualization
  • blender: Animation and video-stream editing application
  • ensight: Access to CEI's Ensight visualization application
  • glew: GL Extension Wrangler library
  • glut: GL Utilities toolkit. Includes great examples
  • idl: Access to IDL visualization application
  • mesa: Access to software implementation of OpenGL
  • mplayer: Access to movie player
  • paraview: Access to Paraview visualization application
  • sdl: Simple application development toolkit
  • teragrid: Tools supporting Teragrid environment, including Globus
  • visit: Access to Visit visualization application

Transferring your files to Spur

Since Spur shares filesystems with Ranger, there is no need to move data from Ranger to Spur; Spur can directly import data files written by Ranger. From other systems, the process of moving data onto Spur is identical to moving data to Ranger; see the Ranger User Guide for more information.

Application Development

In general, application development on Spur is identical to that on Ranger, including the availability and usage of compilers, the parallel development libraries (e.g. MPI and OpenMP), tuning and debugging. Please see the Ranger User Guide for detailed information.

Additional visualization-oriented libraries available on Spur are made accessible through the modules system and are listed above. Library and include-file search path environment variables are modified when modules are loaded. For detailed information on the effect of loading a module, use:

spur$ module help modulename

Running your applications

Running Batch Jobs on Spur

As with Ranger, jobs are run on Spur using the SGE job scheduler. Please see the Ranger User Guide for detailed information on the use of SGE. Jobs are submitted to Spur using the vis queue. Using this queue, users can allocate one or two of Spur's compute nodes with the qsub command-line arguments -pe nway m, where n is 1, 2, 4, 8, 12, 14, 15, or 16 and indicates the number of processes to run on each node, and m is either 16, to allocate a single node, or 32, to allocate two. The qsub command-line argument -l h=<node> can be used to allocate a specific node (e.g. -l h=ivisbig, with the leading 'i', if the large-memory node is required).
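
The following is a minimal sketch of a vis-queue batch script, not an official TACC template; the job name, account, runtime, wayness, and application name (my_vis_app) are illustrative placeholders:

#!/bin/bash
#$ -V                     # inherit the submission environment
#$ -cwd                   # start the job in the submission directory
#$ -N myvis               # job name (placeholder)
#$ -j y                   # join stdout and stderr
#$ -o myvis.o$JOB_ID      # output file name
#$ -q vis                 # submit to Spur's vis queue
#$ -pe 4way 16            # 4 processes on a single 16-core node
#$ -l h_rt=01:00:00       # wall-clock limit of one hour
#$ -A TG-MyAcct           # replace with your own allocation/account

module load vis           # make the visualization modules available
ibrun ./my_vis_app        # placeholder MPI visualization application

Submit the script from the Spur login node with qsub <scriptname>.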

Running Interactive Applications on Spur

As discussed above, Spur is designed for interactive use through a remotely accessible VNC desktop. Several specialized tools facilitate using high-performance graphics applications on Spur.

  • ibrun: enables parallel MPI jobs to be started from the VNC desktop. ibrun uses information from the user's environment to start MPI jobs across the user's set of Spur compute nodes. This information is determined by the initial SGE job submission, and includes the location of the hostfile created by SGE (found in the $PE_HOSTFILE environment variable).

    To run an MPI-parallel job from the VNC desktop, run: ibrun application application-args

    For more information on ibrun, run ibrun --help on either the Spur login node or from a window on a Spur VNC desktop.

  • vglrun: VNC does not support OpenGL applications. vglrun is a wrapper for OpenGL applications that redirects rendering instructions to graphics hardware and then copies the results to destination windows on the desktop.

    To run an application using vglrun:

    vis2$ vglrun application application-args

    For more information about vglrun, see VirtualGL.

  • tacc_vglrun: Some parallel visualization back-end tasks create visible windows on the display indicated by their $DISPLAY environment variable. When run under vglrun (e.g. ibrun vglrun application application-args), all participating tasks receive a pointer to the VNC desktop and therefore open windows on the VNC desktop, with a major impact on performance and usability. Instead, tacc_vglrun ensures that only the root process of the parallel application receives a $DISPLAY environment variable that points to the VNC desktop, while the remainder receive pointers to invisible desktops running on the local hardware graphics cards. Note that the available graphics cards are assigned to tasks in round-robin order.

    To run an application using tacc_vglrun:

    vis2$ tacc_vglrun application application-args

  • tacc_xrun: Sometimes, only the assembled rendering results should be shown on the VNC desktop, not the individual windows of the parallel processes. When this is the case, use tacc_xrun to direct all the tasks to use invisible desktops running on the local hardware graphics cards. Again, the available graphics cards are assigned to tasks in a round-robin order.

    To run an application using tacc_xrun:

    vis2$ tacc_xrun application application-args

Tools

In addition to the CUDA compiler & libraries, TACC supports several widely used visualization tools on Spur: VisIt, ParaView, IDL & Amira.

Using CUDA on Spur

NVIDIA's CUDA compiler and libraries are accessed by loading the CUDA module:

spur$ module load cuda

This puts nvcc in your $PATH and the CUDA libraries in your $LD_LIBRARY_PATH. Applications should be compiled on the Spur login nodes, but these must be run by submitting an SGE job to the compute nodes, both in accordance with TACC user policies and because the login nodes have no GPUs. The CUDA module should be loaded within your job script to ensure access to the proper libraries when your program runs.

Spur's GPUs are compute capability 1.0 devices. When compiling your code, make sure to specify this level of capability with:

nvcc -arch=compute_10 -code=sm_10
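
For example, assuming a hypothetical source file mykernel.cu, a complete compile line on the login node might look like:

spur$ module load cuda
spur$ nvcc -arch=compute_10 -code=sm_10 -o mykernel mykernel.cu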

For further information on the CUDA compiler, please consult the documentation at: $TACC_CUDA_DIR/doc/nvcc.pdf.

For more information about using CUDA, please consult the documentation at: $TACC_CUDA_DIR/doc/CUDA_C_Programming_Guide.pdf.

For the complete CUDA API, please consult the documentation at: $TACC_CUDA_DIR/doc/CUDA_Toolkit_Reference_Manual.pdf.

Using the CUDA SDK on Spur

The NVIDIA CUDA SDK can be accessed by loading the CUDA SDK module:

spur$ module load cuda_SDK

This defines the environment variable $TACC_CUDASDK_DIR which can be used to access the libraries and executables in the CUDA SDK.

Using multiple GPUs in CUDA

CUDA contains functions to query the number of devices connected to each host, and to select among devices. CUDA commands are sent to the current device, which is GPU 0 by default. To query the number of available devices, use the function:

int devices;
cudaGetDeviceCount( &devices );

To set a particular device, use the function:

int device = 0;
cudaSetDevice( device );

Remember that any calls after cudaSetDevice() typically pertain only to the device that was set. Please see the CUDA C Programming Guide and Toolkit Reference Manual at $TACC_CUDA_DIR/doc/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf for more details. For a multi-GPU CUDA example, please see the code at: $TACC_CUDASDK_DIR/C/src/simpleMultiGPU/.

Debugging CUDA kernels

The NVIDIA CUDA debugger, cuda-gdb, is included in the CUDA module. Applications must be debugged from within a job, using either the idev module or a VNC session; please see the relevant sections for more information on idev and launching a VNC session. For more information on the CUDA debugger, see: $TACC_CUDA_DIR/doc/cuda-gdb.pdf.
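
For example, from within a VNC desktop or idev session on a compute node, a debug run of a hypothetical executable mycuda_app might look like:

vis2$ module load cuda
vis2$ cuda-gdb ./mycuda_app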

Using OpenCL on Spur

Spur provides the NVIDIA implementation of the OpenCL 1.0 standard, which is included in the NVIDIA CUDA SDK. To access it, first load both the CUDA and the CUDA SDK modules:

spur$ module load cuda cuda_SDK

OpenCL is contained within the $TACC_CUDASDK_DIR/OpenCL directory. When compiling, you should use the following include directory on the compile line:

spur$ g++ -I${TACC_CUDASDK_DIR}/OpenCL/common/inc

If you use the NVIDIA OpenCL utilities, also add the following directory and libraries on your link line:

spur$ g++ -L${TACC_CUDASDK_DIR}/OpenCL/common/lib -loclUtil_x86_64
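
Putting these together, a complete compile-and-link line for a hypothetical source file my_ocl_app.cpp might look like the following; note that linking against the OpenCL runtime itself (-lOpenCL) is also typically required:

spur$ g++ -I${TACC_CUDASDK_DIR}/OpenCL/common/inc -o my_ocl_app my_ocl_app.cpp \
      -L${TACC_CUDASDK_DIR}/OpenCL/common/lib -loclUtil_x86_64 -lOpenCL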

For more information on OpenCL, please see the OpenCL specification at: $TACC_CUDASDK_DIR/OpenCL/doc/Khronos_OpenCL_Specification.pdf.

Using multiple GPUs in OpenCL

OpenCL contains functions to query the number of GPU devices connected to each host, and to select among devices. OpenCL commands are sent to the specified device. To query the number of available devices, use the following code:

cl_platform_id platform;
cl_device_id* devices;
cl_uint device_count;

oclGetPlatformID(&platform);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &device_count);
devices = (cl_device_id*)malloc(device_count * sizeof(cl_device_id));
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, device_count, devices, NULL);

In OpenCL, multiple devices can be a part of a single context. To create a context with all available GPUs and to create a command queue for each device, use the above code snippet to detect the GPUs, and the following to create the context and command queues:

cl_context context;
cl_command_queue* command_queues;
cl_uint i;

context = clCreateContext(0, device_count, devices, NULL, NULL, NULL);
command_queues = (cl_command_queue*)malloc(device_count * sizeof(cl_command_queue));
for (i = 0; i < device_count; ++i) {
  /* create one command queue per device in the context */
  command_queues[i] = clCreateCommandQueue(context, devices[i], 0, NULL);
}

For a multi-GPU OpenCL example, please see the code at: $TACC_CUDASDK_DIR/OpenCL/src/oclSimpleMultiGPU/.

Using the NVIDIA Compute Visual Profiler

The NVIDIA Compute Visual Profiler, computeprof, can be used to profile both CUDA programs and OpenCL programs run with the NVIDIA OpenCL implementation. Since the profiler is X-based, it must be run either within a VNC session or by ssh-ing into an allocated compute node with X forwarding enabled. The profiler executable path should be set by the CUDA module. If the computeprof executable cannot be located, define the following environment variables:

spur$ export PATH=$TACC_CUDA_DIR/computeprof/bin:$PATH
spur$ export LD_LIBRARY_PATH=$TACC_CUDA_DIR/computeprof/bin:$LD_LIBRARY_PATH

Running Parallel VisIt on Spur

NOTE: these instructions are valid for VisIt versions 1.12 and later on Spur. If you need to run an earlier version of VisIt, please submit a consulting ticket for instructions.

After connecting to a VNC server on spur, as described above, do the following:

  • VisIt was compiled under the intel v10 compiler and mvapich v1.0.1 MPI stack. These must be loaded prior to running VisIt. Also, the default module 'CTSSV4' is incompatible with the VisIt server and must be removed. From the default environment, execute the following:

    vis2$ module delete mvapich mvapich2
    vis2$ module delete CTSSV4
    vis2$ module swap pgi intel/11.1
    vis2$ module load mvapich/1.0.1

  • If the vis module is not yet loaded, you must load it:

    vis2$ module load vis

  • Load the VisIt module:

    vis2$ module load visit

    If you want to load a specific version of VisIt, use module load visit/<version>, where <version> is the specific version number.

  • Launch VisIt:

    vis2$ vglrun visit

    If you loaded a specific version of visit, specify it on the command line:

    vis2$ vglrun visit -v <version>

When VisIt first loads a dataset, it will present a dialog allowing the user to select either a serial or parallel engine. Select the parallel engine. Note that this dialog will also present options for the number of processes to start and the number of nodes to use; these options are actually ignored in favor of the options specified when the VNC server job was started.

Preparing data for Parallel VisIt

In order to take advantage of parallel processing, VisIt input data must be partitioned and distributed across the cooperating processes. This requires that the input data be explicitly partitioned into independent subsets at the time it is input to VisIt. VisIt supports SILO data (see SILO), which incorporates a parallel, partitioned representation. Otherwise, VisIt supports a metadata file (with a .visit extension) that lists multiple data files of any supported format that are to be associated into a single logical dataset. In addition, VisIt supports a "brick of values" format, also using the .visit metadata file, which enables single files containing data defined on rectilinear grids to be partitioned and imported in parallel. Note that VisIt does not support the VTK parallel XML formats (.pvti, .pvtu, .pvtr, .pvtp, and .pvts). For more information on importing data into VisIt, see Getting Data Into VisIt; though this refers to VisIt version 1.5.4, it appears to be the most current version available.

For more information on VisIt, see https://wci.llnl.gov/codes/visit/home.html

Running Parallel ParaView on Spur

After connecting to a VNC server on Spur, as described above, do the following:

  1. ParaView was compiled under the intel v10 compiler and OpenMPI v1.3 MPI stack. These must be loaded prior to running ParaView; they are not loaded by default, so load them manually:

    vis2$ module swap pgi intel
    vis2$ module swap mvapich openmpi/1.3

  2. If the vis module is not yet loaded, you must load it:

    vis2$ module load vis

  3. And then load the ParaView module:

    vis2$ module load paraview

  4. Set the $NO_HOSTSORT environment variable to 1:
    • (csh) vis2$ setenv NO_HOSTSORT 1
    • (bash) vis2$ export NO_HOSTSORT=1

  5. Launch ParaView:

    vis2$ vglrun paraview [paraview client options]

  6. Connect to the server from within the ParaView client:
    1. Click the "Connect" button, or select File -> Connect
    2. If this is the first time you've used ParaView in parallel (or you did not save your connection configuration in prior runs):
      1. Select "Add Server"
      2. Enter a "Name", e.g. "ibrun"
      3. Click "Configure"
      4. For "Startup Type" and enter the command: ibrun tacc_xserver pvserver [paraview server options] and click "Save"
    3. Select the name of your server configuration, and click "Connect"
    You will see the parallel servers being spawned and the connection established in the ParaView Output Messages window.

Preparing data for Parallel ParaView

In order to take advantage of parallel processing, ParaView data must be partitioned and distributed across the cooperating processes. While ParaView will import unpartitioned data and then partition and distribute it, best performance (by far) is attained when the input data is explicitly partitioned into independent subsets at the time it is loaded, enabling ParaView to import data in parallel. ParaView supports SILO data (see SILO), which incorporates a parallel, partitioned representation, as well as a comprehensive set of parallel XML formats, which utilize a metadata file to associate partitions found in separate files into a single logical dataset. In addition, ParaView supports a "brick of values" format enabling single files containing data defined on rectilinear grids to be partitioned and imported in parallel. This is not done with a metadata file; rather, the file is described to ParaView using a dialog that is presented when a file with a .raw extension is imported (this importer is also among the options presented when an unrecognized file type is imported). For more information on ParaView file formats, see VTK File Formats.

For more information on ParaView, see www.paraview.org

Running IDL on Spur

To run IDL interactively in a VNC session, connect to a VNC server on spur as described above, then do the following:

  • If the vis module is not yet loaded, you must load it:

    vis2$ module load vis

  • Load the IDL module:

    vis2$ module load idl

  • Launch IDL:

    vis2$ idl

    or launch the IDL virtual machine:

    vis2$ idl -vm

If you are running IDL in scripted form, without interaction, simply submit an SGE job to the 'vis' queue that loads IDL and runs your script. The 'vis' queue only allocates to spur's vis nodes.
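
As a sketch (not an official template), such a job script might look like the following; the job name, account, runtime, and batch-file name (my_batch_commands) are placeholders:

#!/bin/bash
#$ -V
#$ -cwd
#$ -N idl_job             # job name (placeholder)
#$ -j y
#$ -o idl_job.o$JOB_ID
#$ -q vis                 # the vis queue allocates Spur's vis nodes
#$ -pe 1way 16            # one serial IDL process on a single node
#$ -l h_rt=02:00:00       # wall-clock limit of two hours
#$ -A TG-MyAcct           # replace with your own allocation/account

module load vis idl
idl my_batch_commands     # placeholder batch file of IDL statements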

If you need to run IDL interactively from an xterm on your local machine and outside of a VNC session, you will need to run an SGE job in the vis queue to allocate a Spur compute node. A vncserver job is an easy way to do this, as documented above in "Establishing Interactive Access Via VNC". The output, coming by default to ~/vncserver.out, will include the name of the node that has been allocated to you by SGE (search for "running on node"). Note that this will start a vncserver process on the compute node which you can safely ignore. Alternatively, you can avoid running the vncserver process by qsub'ing your own SGE job script containing two commands:

hostname
sleep n

where n is the number of seconds you wish to allocate the node for. This must be less than or equal to the time specified in the -l h_rt=hh:mm:ss SGE argument. See the section on "The SGE Batch System" in the Ranger User Guide for more information on submitting jobs via SGE's qsub command.

Once you have the name of the allocated node, you can ssh to it through the login node. From an X terminal window on your local machine, the command

mymachine$ ssh -Y spur.tacc.utexas.edu

will result in a command prompt on the spur login node. From there, ssh to the compute node:

spur$ ssh -Y vis[1-7,big]

This will result in a command prompt on the compute node. Commands that create X windows from that command prompt will create them on your local screen. Note that graphics programs run from this command prompt will be significantly slower than when run through a VNC session.

Running Amira on Spur

Amira runs only on node vis6 of spur. You must request this node explicitly in qsub, either in your script or on the command line using the argument "-l h=ivis6". Note the leading 'i'!

spur$ qsub -l h=ivis6 -A <TG-MyAcct> /share/doc/sge/job.vnc

After connecting to a VNC server on spur, as described above, do the following:

  • If the vis module is not yet loaded, you must load it: module load vis
  • Load the Amira module: module load amira
  • Launch Amira: vglrun amira

Last updated January 9, 2012