TACC Legacy Computing Program

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin operates a number of large-scale computational resources funded by the National Science Foundation. These resources typically enjoy a very busy production life of 4-5 years serving the United States academic research community. At the end of their production life, maintenance costs of these systems increase, and the scarce space in the TACC data center is better used for newer hardware that delivers higher performance per watt.

Most supercomputers are installed in large configurations that require an enormous investment in power and cooling infrastructure. A convenient feature of modern supercomputer architecture, however, is that many systems can be broken up into much smaller systems that place lower demands on infrastructure and can operate in more traditional machine rooms. While it no longer makes sense for TACC to operate these computing resources, the systems may still offer residual value for both education and research in places where advanced computing systems are otherwise limited (or completely unavailable).

In order to be good stewards of the public investment in scientific computing, the TACC Legacy Computing Program seeks to place this retired hardware where it can still do good. Starting in 2013 with the decommissioning of the Ranger supercomputer, TACC sent hundreds of the original Ranger servers both to destinations around the state and to Southern Africa. In collaboration with the Centre for High Performance Computing (CHPC) in South Africa, Ranger sub-clusters were sent to a number of universities in several African nations, where hundreds of students who had never previously had access to high performance computing received their first exposure to these technologies.

TACC has continued this program with more recent system retirements, benefiting a number of universities both in the USA and abroad.

Program Goals

  • Ensure good stewardship of public investment in scientific computing
  • Foster capacity building in HPC around the world
  • Smooth the "on-ramp" and integration between campus-level and national HPC resources
  • Disseminate best practices in HPC operations and software
  • Jumpstart new collaborations

Ultimately, the program is intended to strengthen computational science nationally by improving the utilization and efficiency of leadership systems, and to ease the operation of those systems by building both a base of more experienced users and a pipeline of staff with relevant experience.

Systems Currently Available

TACC currently has servers from its retired Lonestar-4 and Stampede systems available for Legacy Program partners.

Stampede was a Dell PowerEdge system based on Intel's E5 (Sandy Bridge) processors and equipped with Intel Xeon Phi coprocessors. A compute node consists of a Dell C8220z double-wide sled, housed in a 4 rack-unit chassis alongside 3 other sleds. Each node contains two 8-core 64-bit Intel Xeon E5 processors (16 cores in all) on a single board, operating as an SMP unit. The core frequency is 2.7GHz, and each core supports 8 floating-point operations per clock period, for a peak performance of 21.6 GFLOPS/core or 346 GFLOPS/node. Each node contains 32GB of memory (2GB/core). The memory subsystem has 4 channels from each processor's memory controller to 4 DDR3 ECC DIMMs, each rated at 1600 MT/s (51.2GB/s for all four channels in a socket). The processor interconnect, QPI, runs at 8.0 GT/s between sockets. The Intel Xeon Phi is a special production model with 61 cores running at 1.1 GHz, with a peak performance of 16.2 DP GFLOPS/core or 1.0 DP TFLOPS per coprocessor. Each coprocessor contains 8GB of GDDR5 memory served by 8 dual-channel controllers, with a peak memory bandwidth of 320GB/s.

Lonestar-4 was a Dell PowerEdge system based on Intel's Westmere processors. Each node consists of two 6-core 64-bit Intel Xeon Westmere processors (12 cores in all) on a single board, operating as an SMP unit. The core frequency is 3.33GHz, and each core supports 4 floating-point operations per clock period, for a peak performance of 13.3 GFLOPS/core or 160 GFLOPS/node. Each node contains 24GB of memory (2GB/core). The memory subsystem has 3 channels from each processor's memory controller to 3 DDR3 ECC DIMMs, each rated at 1333 MT/s.
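
The peak-performance and memory-bandwidth figures above follow directly from core counts, clock rates, floating-point operations per clock, and memory channel specifications. The short Python sketch below reproduces the arithmetic; the constants come from the descriptions above, while the AVX/SSE notes in the comments are contextual assumptions rather than figures from this document.

    # Peak node performance: cores x clock (GHz) x FLOPs per clock.
    def peak_gflops(cores, ghz, flops_per_clock):
        return cores * ghz * flops_per_clock

    # Peak per-socket memory bandwidth: channels x transfer rate (MT/s)
    # x 8 bytes per 64-bit transfer, converted to GB/s.
    def mem_bandwidth_gbs(channels, mts):
        return channels * mts * 8 / 1000.0

    # Stampede node: 16 Sandy Bridge cores, 2.7 GHz, 8 FLOPs/clock (AVX);
    # 2.7 x 8 = 21.6 GFLOPS/core.
    print(peak_gflops(16, 2.7, 8))     # 345.6 GFLOPS (~346 GFLOPS/node)
    print(mem_bandwidth_gbs(4, 1600))  # 51.2 GB/s per socket

    # Lonestar-4 node: 12 Westmere cores, 3.33 GHz, 4 FLOPs/clock (SSE);
    # 3.33 x 4 = 13.3 GFLOPS/core.
    print(peak_gflops(12, 3.33, 4))    # 159.8 GFLOPS (~160 GFLOPS/node)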

UPDATE: Most of these resources have been allocated, but a few components from the Lonestar-4 and Stampede systems remain available. We anticipate a small number of additional systems becoming available toward the end of 2019 from a variety of smaller system retirements.

Program Elements

Partner institutions will execute an MOU with TACC that will include the following commitments.

From TACC

  • Long-term loan of "retiring" TACC hardware
  • Brief consultation with TACC on facilities and software stack requirements
  • Access to training programs at TACC

From the Partner

  • Commitment to help large users in their community find out about national resources.
  • Commitment to explore opportunities for additional collaborations.
  • Commitment to report progress and success stories back to TACC periodically.

How do I apply?

To apply for servers, please send a short application via email to legacy@tacc.utexas.edu. The application should have a cover page with the name of the applying institution and the email address and phone number of a point of contact. The application should also contain a short narrative with:

  • Description of the Institution (<0.5 page).
  • Intended use of any equipment (education, research, potential users and fields of science, number of students impacted, etc.; no more than 1.5 pages).
  • Facilities description (<1 page, describe the physical environment in which you could host systems, including power, cooling, floor space, networking, available storage, etc.).

Limitations and Other Warnings

This is hardware that has been retired for a reason. There is typically no remaining warranty on the components, and they are often near "end of life" for support from the original manufacturers. Failures can and will happen. This is not meant to be "production quality" equipment and should not be thought of as a production alternative to buying new hardware; it will be less efficient to operate and less reliable than new systems. New software updates may not always work.

TACC has limited staff resources to support hardware once it is retired; staff are paid on grants to support the replacement systems. While TACC will make a best effort to provide some initial guidance, we cannot take on a permanent support role in operating this equipment. Ultimately, you are on your own.

The hardware you will receive was once part of an integrated system, but is no longer. While TACC can provide some compute servers, and usually enough networking to let servers within a cabinet communicate with each other, you will typically need to provide your own external network connections, storage systems, software stack and licenses, power whips, etc. Success almost always means adding some resources from your local computing environment.

Past Recipients and Their Stories

Retired systems have been distributed in support of research and education all over the world. Here are a few of the many examples of organizations taking part in this program.

South African Centre for High Performance Computing

Established in 2007, the Centre for High Performance Computing (CHPC) in South Africa currently has about 1,000 users from academia and industry. The center supports research across a number of domains and participates in a number of large-scale international projects, such as CERN and the Square Kilometre Array (SKA) radio telescope project. One such effort, the SKA Readiness Project, aims to distribute HPC equipment to the eight African partner countries that will host SKA together with South Africa. These systems will be used for training and preparation for the data processing requirements associated with SKA.

SKA Africa partner countries have already received HPC systems from three supercomputers during the first phase of deployment, including racks from TACC's Ranger system. The second phase of the SKA Readiness Project started in 2018, with the contribution and distribution of portions of TACC's Stampede system.

Dallas Federal Reserve

The Federal Reserve Bank of Dallas received 20 racks from Stampede and is exploring ways in which high performance computing might add value in support of its mission. The Dallas Fed has limited space available for a cluster of this size in its existing compute space, so it partnered with another organization to house the system. This highlights some of the challenges that organizations face when they begin to seriously explore the application of high performance computing in their business.

Individual Researchers

Many individual academic departments and researchers have also taken part in the TACC Legacy program, often with only a single rack of servers. Systems of this size don't require additional switches to interconnect nodes (beyond what is already contained in a rack) and have power and cooling demands that can often be met in existing small computer rooms. At the same time, these small systems can have an outsized impact on the productivity of individual researchers, and can form the basis for growth to much larger numerical simulations.