SAN

General

A Storage Area Network (SAN) is a disk storage system that is accessible from multiple systems through a high-speed interconnect, known as a Fibre Channel (FC) network. The TACC Lonestar and Longhorn systems (and soon Maverick) are connected to a large SAN file system and have exclusive connectivity to it through an FC network, which provides dedicated channels for accessing files in a central storage location. The SAN consists of an FC switch with host bus adapter (HBA) connections to each HPC system and connections to arrays of disks, as shown in Figure 1. A SAN provides better throughput, larger storage capacity, and greater scalability (hundreds of terabytes) than NFS file systems.

Figure 1. TACC Storage Area Network (SAN) - User View.

The TACC SAN is an allocated resource for projects that need access to large data sets. The SAN is ideal for storing data between batch jobs, such as simulation restart files and intermediate results. It can also serve as a data repository for data mining and for research in climate and weather, bioinformatics, and other fields that need immediate "on-line" access to data. Visualization is another computational area that can take advantage of the SAN: users who generate a significant amount of simulation data on Lonestar and Longhorn usually need that data to be accessible on Maverick, or on a remote machine, for rendering and visualization.

Requesting Space on the SAN

SAN storage is not available to every user. The Principal Investigator of a project must request a SAN space allocation. An allocation period lasts for one year and may be renewed throughout the life of the project. Allocations can be requested online under "Add new resources" in the Allocations section of the TACC web portal. Disk storage of 1/2GB to 1/4TB is awarded, according to the merits of the proposed usage. Once the space has been allocated, a directory is created for the project, with its size limited by a quota.

Using the SAN system

A directory for each project allocation is created with the pathname:

/san/hpc/<project_name>

where <project_name> is the name of the project that was awarded the allocation. (Don't remember your project name? A list of project names, along with usage, is reported at each login.) Files and directories within the SAN can be manipulated as in any Unix file system. The SAN appears as an NFS-mounted file system when it is listed with the df command:

   % df
   Filesystem       512-blocks    Free        %Used  Iused   %Iused  Mounted on
   corral:/san/hpc  10288005120   8373668544  19%    729056  1%      /san/hpc

While the SAN behaves like any other Unix file system, special SAN commands MUST be used to obtain optimal transfer rates between a system's regular file system and a SAN project directory; otherwise, transfers may be slower by a factor of 10 or more. To transfer files to and from the SAN within any shell, use the sancp command instead of the Unix cp command. The sancp command has all the functionality of the cp command and an identical syntax:

% sancp [OPTION]... SOURCE DEST

where either SOURCE or DEST includes a path to a SAN directory, and OPTION is any of the usual cp options found in the CP(1) man page.

There is one caveat when using the SAN system. The IBM linker cannot write memory-mapped files to the SAN file system. Hence, when the linker (ld) is invoked to write an executable file (a.out) in the SAN file system, it fails with an I/O error. The linker reports this, incorrectly, as a read error. For example, the load command xlf90 prog.o produces the following message:

ld: 0711-711 ERROR: Input file prog.o is empty

Hence, users should not try to build executables in their SAN directory; however, executables may be stored in, and executed from, SAN directories.
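
The transfer and build guidance above can be combined into one small shell helper. This is a minimal sketch, not a TACC-provided tool: the function name san_copy and the SAN_ROOT variable are hypothetical names introduced for illustration, and the helper assumes SAN paths are given as absolute paths under /san/hpc. It calls sancp when a SAN path is involved and sancp is installed, and falls back to plain cp otherwise.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SAN software): pick sancp or cp
# depending on whether any argument is an absolute path under the SAN root.
SAN_ROOT="${SAN_ROOT:-/san/hpc}"   # assumed mount point, taken from the df listing

san_copy() {
    use_san=no
    for arg in "$@"; do
        case "$arg" in
            "$SAN_ROOT"/*) use_san=yes ;;   # argument lies inside the SAN tree
        esac
    done

    if [ "$use_san" = yes ] && command -v sancp >/dev/null 2>&1; then
        sancp "$@"      # same options and syntax as cp
    else
        cp "$@"         # no SAN path involved, or sancp not installed
    fi
}

# Typical use: link the executable in a non-SAN directory, then copy it over.
#   cd "$HOME/build" && xlf90 prog.o
#   san_copy a.out /san/hpc/<project_name>/
```

Because the check only matches absolute paths, a relative path used from inside a SAN directory would fall through to cp; in that case, call sancp directly.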

SAN Capacity and Performance