Ranch User Guide

System Overview
System Access
Transferring your files to Ranch

System Overview

HPC machines are used primarily for scientific computing, so the disk space available to each account (the size of its $HOME directory) is limited. The same is true of TACC's visualization systems. Ranch serves both the HPC and visualization systems by providing a massive, high-performance file system for archiving files.

TACC's long-term mass storage solution is a Sun Microsystems StorageTek Mass Storage Facility named Ranch (ranch.tacc.utexas.edu). Ranch uses Sun's Storage Archive Manager Filesystem (SAM-FS) to migrate files to and from a tape archival system with a current storage capacity of 10 petabytes (PB).

Architecture

Ranch's disk cache is built on a Sun ST6540 disk array containing approximately 50 terabytes (TB) of spinning disk. The disk array is controlled by a Sun x4600 SAM-FS metadata server with 16 CPUs and 32 GB of RAM.

A single Sun StorageTek SL8500 Automated Tape Library houses all of the offline archival storage. Each SL8500 library contains 10,000 tape slots and 64 tape drive slots. Each tape holds 1 TB of uncompressed data, so a fully populated SL8500 library can house 10 PB. Each library also contains 4 handbots, which manage the tapes and move them to and from the tape drives. If necessary, up to 4 SL8500 libraries can be integrated into a single archival system, for a maximum offline storage capacity of 40 PB.
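As a quick check of the capacity figures above, the following bash sketch multiplies them out. It assumes decimal units (1 PB = 1000 TB) and a 1 TB tape in every slot; capacity_pb is a hypothetical helper name.

```shell
# Sketch: offline capacity in PB for a given number of SL8500 libraries,
# assuming 10,000 slots per library, 1 TB per tape, and 1 PB = 1000 TB.
capacity_pb() {
  local libraries="$1" slots_per_library=10000 tb_per_tape=1
  echo $(( libraries * slots_per_library * tb_per_tape / 1000 ))
}

capacity_pb 1   # prints 10 (one fully populated library)
capacity_pb 4   # prints 40 (the four-library maximum)
```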

The current Ranch configuration has 10,000 tapes and can hold 10 PB of uncompressed data. Future upgrades will greatly increase this capacity.

System Access

Methods of Access

The preferred way to access Ranch, especially from scripts, is through the TACC-defined environment variables $ARCHIVER and $ARCHIVE. $ARCHIVER holds the hostname of the current TACC archival system, and $ARCHIVE holds the path to your account's personal archive space. Using these variables helps ensure that your scripts will keep working even if the archival system itself changes in the future.
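A minimal bash sketch of that practice: a script checks that both variables are set before doing anything, then builds its scp or bbcp destination from them instead of hard-coding a hostname or path. The function names are hypothetical, not TACC-provided commands.

```shell
# Sketch: stop early if the TACC-defined archive variables are missing.
require_archive_vars() {
  if [ -z "${ARCHIVER:-}" ] || [ -z "${ARCHIVE:-}" ]; then
    echo "error: ARCHIVER and ARCHIVE are not set" >&2
    return 1
  fi
}

# archive_dest NAME prints ARCHIVER:ARCHIVE/NAME, a destination suitable
# for scp or bbcp (hypothetical helper).
archive_dest() {
  printf '%s:%s/%s\n' "$ARCHIVER" "$ARCHIVE" "$1"
}
```

For example: require_archive_vars && scp results.tar "$(archive_dest results.tar)"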

Direct login to Ranch is currently allowed, so you can create directories and demigrate (stage) files from tape back to the disk cache for later transfer to TACC machines or personal computers. However, because Ranch is an archive system, files that have not been accessed recently may reside only on tape, so we recommend using the 'stage' command, documented below, to retrieve files from tape before you try to access them. We also recommend using 'tar' or a similar utility to bundle large numbers of small files together, which makes storage and retrieval on Ranch more efficient.
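The bundling step might look like the following bash sketch, which tars up a directory and then lists the new tar file once as a basic integrity check before you copy it to Ranch. bundle_dir is a hypothetical helper name.

```shell
# Sketch: bundle a directory of many small files into one tar file.
bundle_dir() {
  local dir="$1" tarfile="$2"
  tar cf "$tarfile" "$dir" || return 1
  # Reading the archive back should fail if it is truncated or corrupt.
  tar tf "$tarfile" > /dev/null
}
```

For example, bundle_dir run1 run1.tar produces a single file, run1.tar, to transfer instead of the individual files in run1.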

Ranch access is not allowed from within job scripts on other TACC resources; data must be transferred from Ranch in order to be available to running jobs.

Logging into Ranch

From most TACC machines, you can access Ranch using rsh, as in the following example.

lonestar% rsh $ARCHIVER

This method usually does not prompt for a password. From outside TACC, however, and from TACC systems where rsh is not available, use ssh instead:

localhost% ssh ranch.tacc.utexas.edu

When using ssh, expect to type in a password.
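If you write scripts that must run both on TACC machines and elsewhere, a small bash helper can choose between the two methods. This is only a sketch: it assumes that an installed rsh binary means rsh access to Ranch works, which may not hold on every system, and ranch_remote_shell is a hypothetical name.

```shell
# Sketch: prefer rsh (usually password-free from TACC machines) when an
# rsh binary is on the PATH, otherwise fall back to ssh. This only checks
# that rsh exists, not that the remote host accepts rsh connections.
ranch_remote_shell() {
  if command -v rsh > /dev/null 2>&1; then
    echo rsh
  else
    echo ssh
  fi
}
```

Usage: $(ranch_remote_shell) "$ARCHIVER"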

File Systems

Ranch uses the Storage and Archive Manager File System (SAM-FS). SAM-FS provides several commands for managing the storage and location of files on Ranch. For a full description of any of the commands below, log in to Ranch and read its man page by typing "man <command>".

List of SAM-FS Commands

Command   Description
stage     Retrieve files from tape and place them in the disk cache
sls       Similar to ls, with additional migration information
sfind     SAM-FS version of find
sdu       Replacement for du; reports the size of an archived directory or file

Transferring your files to Ranch

High-Speed File Transfers

SSH and BBCP

We support ssh/scp and bbcp for transferring files to Ranch. The simplest way to transfer files to Ranch is to use the Unix scp command to copy them directly:

scp <file> ${ARCHIVER}:${ARCHIVE}/<filename>

where <file> is the name of the local file to copy and <filename> is the name the file will have in your archive directory on Ranch. For large numbers of files, you may wish to use the tar command to bundle one or more directories into an archive before transferring the data to Ranch, or as part of the transfer process.
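Directories on Ranch can also be created remotely. The bash sketch below assumes that Ranch accepts remote commands over ssh (as the tar-over-ssh command in this section relies on); it creates a subdirectory under $ARCHIVE and then copies several files into it with a single scp call. copy_to_archive_subdir is a hypothetical helper name.

```shell
# Sketch: create SUBDIR under $ARCHIVE on Ranch, then copy files into it.
copy_to_archive_subdir() {
  local subdir="$1"; shift
  # The quoted path is expanded locally, then run by the shell on Ranch.
  ssh "$ARCHIVER" "mkdir -p '${ARCHIVE}/${subdir}'" || return 1
  # One scp call for all remaining arguments (the local files to copy).
  scp "$@" "${ARCHIVER}:${ARCHIVE}/${subdir}/"
}
```

Usage: copy_to_archive_subdir run42 output1.dat output2.dat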

To create a 'tar' archive of a directory directly on Ranch as part of the transfer, run the following command on the machine that holds the data, piping the output of tar through ssh:

tar cvf - <dirname> | ssh ${ARCHIVER} "cat > ${ARCHIVE}/<tarfile.tar>"

where <dirname> is the path to the directory you want to archive, and <tarfile.tar> is the name of the archive on Ranch.

To create a compressed archive, add the -z option to the tar command (tar cvzf) so that tar compresses its output with gzip. In general, however, transfers are faster without compression.
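The pipeline above can be wrapped in a bash function that reports a failure on either side of the pipe and then lists the new archive on Ranch as a quick check. This is a sketch assuming bash (for PIPESTATUS) and ssh access to $ARCHIVER; tar_to_ranch is a hypothetical name, and compression is left out in line with the performance note.

```shell
# Sketch: stream a tar archive of DIR into $ARCHIVE/TARNAME on Ranch
# without writing a local temporary file, then list it on Ranch.
tar_to_ranch() {
  local dir="$1" tarname="$2"
  local remote="${ARCHIVE}/${tarname}"
  tar cf - "$dir" | ssh "$ARCHIVER" "cat > '$remote'"
  # Capture both exit codes; by default only ssh's status would count.
  local tar_st=${PIPESTATUS[0]} ssh_st=${PIPESTATUS[1]}
  [ "$tar_st" -eq 0 ] && [ "$ssh_st" -eq 0 ] || return 1
  # Listing the archive on Ranch fails if the copy there is unreadable.
  ssh "$ARCHIVER" "tar tf '$remote' > /dev/null"
}
```

Usage: tar_to_ranch <dirname> <tarfile.tar>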

To copy a file with bbcp from a TACC machine, use:

bbcp -T '/usr/bin/rsh -l %U %H /usr/local/bin/bbcp' \
-S '/usr/bin/rsh -l %U %H bbcp' <file> ${ARCHIVER}:${ARCHIVE}/<filename>

where <file> is the name of the file to copy and <filename> is the name the file will have in your archive directory on Ranch. Because these options tell bbcp to connect through rsh, the transfer does not require a password. To see all options, type:

bbcp -h

Here are a few bbcp options that you might find useful:

  • Like cp, bbcp has a -r option for recursively transferring directories.
  • During large transfers, the connection between systems is sometimes lost. The -a option lets bbcp pick up where it left off.
  • The -P <sec> option displays a progress message every <sec> seconds, which can also be useful during large transfers.

The bbcp man pages are available here: http://www.slac.stanford.edu/~abh/bbcp/.

The multistreaming transfer ability of bbcp makes it ideal for large files. It can break up the transfer into multiple simultaneous streams, thereby transferring data much faster than single-streaming utilities such as scp and sftp. For more information, see the man page.
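Putting those options together, a resumable recursive transfer might look like this bash sketch. It reuses the rsh-based -T and -S settings from the bbcp command above; the 10-second progress interval is an arbitrary choice, bbcp_to_ranch is a hypothetical name, and you should confirm the flags with bbcp -h on your system.

```shell
# Sketch: recursive (-r), restartable (-a) bbcp copy of SRC to
# $ARCHIVE/DEST, printing a progress message every 10 seconds (-P 10).
bbcp_to_ranch() {
  local src="$1" dest="$2"
  # -a may accept an optional directory argument, so another option
  # follows it here rather than the source path.
  bbcp -a -r -P 10 \
    -T '/usr/bin/rsh -l %U %H /usr/local/bin/bbcp' \
    -S '/usr/bin/rsh -l %U %H bbcp' \
    "$src" "${ARCHIVER}:${ARCHIVE}/${dest}"
}
```

If a large transfer is interrupted, running the same bbcp_to_ranch command again lets -a resume it instead of starting over.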

For large amounts of data, create several smaller tar files, for example one per subdirectory. This also makes it more efficient to retrieve only the portions of your data you need. If you are concerned about space and need to compress the tar files, please do so when the system is not heavily loaded. We recommend tarring small files together and compressing them, but try to keep each tar file under 10 GB if at all possible, because this reduces the chance of file corruption. Binary data usually does not compress well, so you can generally skip compression for it.
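One way to follow this advice is sketched below in bash: it writes one tar file per subdirectory and prints a warning for any tar file over the 10 GB guideline. The helper name and the decimal definition of GB (10^9 bytes) are assumptions of this sketch.

```shell
# Sketch: write one tar file per subdirectory of TOP into OUTDIR, print
# each tar file's path, and warn about any tar file over 10 GB.
tar_per_subdir() {
  local top="$1" outdir="$2"
  local limit=$((10 * 1000 * 1000 * 1000))
  local sub name tarfile size
  for sub in "$top"/*/; do
    [ -d "$sub" ] || continue          # skips the unmatched glob pattern
    name=$(basename "$sub")
    tarfile="$outdir/$name.tar"
    # -C stores members as name/... rather than with the full TOP path.
    tar cf "$tarfile" -C "$top" "$name" || return 1
    size=$(( $(wc -c < "$tarfile") ))  # arithmetic strips wc padding
    if [ "$size" -gt "$limit" ]; then
      echo "warning: $tarfile is over 10 GB; consider splitting it" >&2
    fi
    echo "$tarfile"
  done
}
```

Usage: tar_per_subdir <topdir> <outdir>, then transfer the resulting tar files to Ranch.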

Staging Data

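To stage data (that is, to begin retrieving it from tape) before transferring it back from Ranch, run:

ssh $ARCHIVER stage -w -r <file list>

The -w option makes stage wait until the files are back on disk before it returns, and -r stages the contents of any listed directories recursively. Once this command completes, copy the files:

rcp $ARCHIVER:<file> <destination>

where <file> is a file on Ranch and <destination> is a local file or directory. To copy several files at once, list each one as $ARCHIVER:<file> before <destination>.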
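The stage-then-copy sequence can be wrapped in one bash function. This sketch stages the named files with stage -w, so the command waits until they are on disk, and then copies each one with scp. File names are taken relative to the directory ssh starts in on Ranch (normally your home directory there) and must not contain spaces; fetch_from_ranch is a hypothetical name.

```shell
# Sketch: stage FILES on Ranch, wait for them to reach the disk cache,
# then copy each one into the local directory DEST.
fetch_from_ranch() {
  local dest="$1"; shift
  ssh "$ARCHIVER" stage -w "$@" || return 1
  local f
  for f in "$@"; do
    scp "${ARCHIVER}:${f}" "$dest/" || return 1
  done
}
```

Usage: fetch_from_ranch . run1.tar run2.tar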

Alternatively, you can log in to Ranch and issue these commands there. To see which files are on tape and which are on disk, use:

ranch$ sls -2
-rwxr-xr-x 1 username G-81769 349 May 4 2008 filename
--------- ----- -- -- dk ti
-rwx------ 1 username G-81769 349 Jun 14 2008 filename
O-------- ----- -- -- dk ti

For each file, the first character of its second output line shows the file's archiving status:

Status  Description
O       The file is offline: it has been removed from disk and is only on tape.
P       The file is offline, but part of it is still online (partially online).
E       The tape where the file resides has been flagged as "damaged". Contact TACC User Services.
-       The file is online (and has not been copied to tape).

Files in the offline state should be staged using the stage command before attempting to retrieve them.
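If you have many files, this check can be scripted. The bash sketch below reads sls -2 output and prints the names of files whose status flag is O or P. It assumes the two-lines-per-file layout shown in the example above and file names without spaces; offline_files is a hypothetical name.

```shell
# Sketch: read "sls -2" output on stdin. Odd lines hold the ls-style
# listing (file name in the last field); even lines start with the status
# flag. Print each file whose flag is O (offline) or P (partially online).
offline_files() {
  awk 'NR % 2 == 1 { name = $NF; next }
       substr($0, 1, 1) == "O" || substr($0, 1, 1) == "P" { print name }'
}
```

Usage: ssh "$ARCHIVER" sls -2 <file list> | offline_files. The printed names are the ones to pass to stage.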

Remote ls (rls)

The rls command, where available, allows you to view your files on a remote system. It can be used just like a normal ls. Be sure to include the $ARCHIVE variable to give rls the correct path. For example:

lslogin2$ ./rls -la test*
-rw-r--r-- 1 username support 30720 Sep 5 08:50 (DUL) /archive/username/test.tar.bz2

Last updated: March 8, 2012