TACC will host the first Catapult cluster available outside of Microsoft, where the technology is already accelerating various enterprise, search and machine learning scenarios. This cluster will be a 384-node demonstration system that Microsoft Research is deploying in collaboration with TACC. The goal of the project is to investigate the use of FPGAs as computational accelerators to improve application performance, reduce power consumption, and open new avenues of investigation for researchers.
"TACC was a natural choice to deploy Catapult," said Bill Barth, director of TACC's High Performance Computing group. "As one of the largest providers of advanced computing systems for open data- and compute-intensive science in the US, TACC serves as a bridge between the computing community and researchers in engineering and science. The open research community doesn't have a large, publicly available FPGA system, so this will be quite exciting for both the scientists and engineers and the FPGA research computing communities."
Microsoft Research began Project Catapult as an experiment to improve the speed and efficiency of key algorithms used by Bing, Microsoft's search service, where it accelerates a portion of the web ranking stack by up to a factor of 40 over software.
"The typical approach to large scale commercial datacenters is to deploy millions of identical servers so that your services can move around as individual servers fail or demand changes," said Doug Burger, director client and cloud apps, Microsoft Technology and Research. "What we've shown is that FPGAs can be deployed at datacenter (and supercomputer) scale, and using them can result in much higher workload throughput. But we can only do so much…we're excited to unleash the creativity of academia to identify new, high-value applications and systems that can leverage datacenter-scale FPGA computing."
Earlier this year, Microsoft also applied Project Catapult to machine learning, accelerating a deep convolutional neural network (CNN) more than 3x over the performance of a previous port of CNNs to FPGAs done at Microsoft. CNNs are computationally intensive, but offer improved accuracy over other machine learning approaches in non-trivial recognition tasks such as large-category image classification and automatic speech recognition. When compared to GPGPUs, FPGAs offer a substantial power advantage in this application, with the Microsoft implementation using Project Catapult requiring only 25W per FPGA, versus 235W for GPGPU-based acceleration.
Although Microsoft is investing in a number of key applications internally, the potential application space, including HPC, that can benefit from FPGAs is huge, and needs to be explored.
Additionally, FPGAs can benefit from advances in programming tools, definitions of new libraries to support HPC applications, and new system abstractions that leverage programmable hardware.
"TACC understands how science and engineering users work, and the types of problems they are trying to solve," said Barth.
"TACC will bring this expertise to the collaboration with Microsoft Research as we work together to first understand what algorithms are likely to be a good match for this type of system, and then work closely with users to implement their algorithms on the FPGAs. Long term, our hope is that this work will lead to a generalizable approach that presents software programmers with an easy-to-use programming abstraction."
The system consists of 384 2-socket Intel Xeon-based nodes, each with 64GB of memory and an Altera Stratix V FPGA with 8GB of local DDR3 memory. Each FPGA communicates with its host CPUs over a PCIe Gen3 x8 connection with a not-to-exceed bandwidth of 8GB/s, and can read and write data stored on its host node over this connection.
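To put the per-node figures above in cluster-wide terms, the following Python sketch multiplies them out. The node count, memory sizes, and PCIe figure come from the description above; the function names are illustrative, and the conversion assumes 1TB = 1024GB. The PCIe number is an upper bound, so the derived values are best-case estimates rather than measurements.

```python
# Per-node figures quoted in the system description.
NODES = 384
HOST_MEMORY_GB = 64        # host RAM per node
FPGA_MEMORY_GB = 8         # local DDR3 per FPGA
PCIE_GB_PER_S = 8          # not-to-exceed host<->FPGA bandwidth per node


def cluster_totals(nodes=NODES):
    """Return cluster-wide totals derived from the per-node figures.

    Assumes one FPGA per node and 1 TB = 1024 GB.
    """
    return {
        "host_memory_tb": nodes * HOST_MEMORY_GB / 1024,
        "fpga_memory_tb": nodes * FPGA_MEMORY_GB / 1024,
        # Sum of every node's PCIe upper bound; not a measured figure.
        "aggregate_pcie_gb_per_s": nodes * PCIE_GB_PER_S,
    }


def min_seconds_to_fill_fpga_memory():
    """Best-case time to stream a full FPGA memory image over PCIe."""
    return FPGA_MEMORY_GB / PCIE_GB_PER_S


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
    print("min seconds to fill FPGA memory:", min_seconds_to_fill_fpga_memory())
```

Under these assumptions the cluster holds 24TB of host memory and 3TB of FPGA-attached memory, and filling one FPGA's 8GB of DDR3 from its host takes at least one second.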
The FPGAs are connected to one another via a dedicated network using high-speed serial links. This network, called CatNet (for Catapult Network), forms a two-dimensional torus within a pod of 48 servers, and provides low-latency communication between neighboring FPGAs. This design supports the use of multiple FPGAs to solve a single problem, while adding resilience to server and FPGA failures.
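The torus wiring described above can be illustrated with a short Python sketch. The text states only that a pod has 48 servers; the 6 x 8 grid shape used here is an assumption, and the function names are hypothetical rather than part of any CatNet interface. In a two-dimensional torus each FPGA has four direct neighbors, and the links wrap around at the edges of the grid.

```python
ROWS, COLS = 6, 8          # assumed pod layout; the text only gives 48 servers
SERVERS_PER_POD = ROWS * COLS
TOTAL_NODES = 384


def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four direct neighbors of (row, col), with wrap-around."""
    return [
        ((row - 1) % rows, col),   # up
        ((row + 1) % rows, col),   # down
        (row, (col - 1) % cols),   # left
        (row, (col + 1) % cols),   # right
    ]


def torus_hops(a, b, rows=ROWS, cols=COLS):
    """Minimum number of link traversals between FPGAs a and b."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On each axis, take the shorter way around the ring.
    return min(dr, rows - dr) + min(dc, cols - dc)


if __name__ == "__main__":
    print("pods in the cluster:", TOTAL_NODES // SERVERS_PER_POD)
    print("neighbors of (0, 0):", torus_neighbors(0, 0))
    print("hops from (0, 0) to (5, 7):", torus_hops((0, 0), (5, 7)))
```

With this layout, opposite corners of the grid are only two hops apart because of the wrap-around links, and no two FPGAs in a pod are more than seven hops from each other. The four links per FPGA also mean traffic can be routed around a failed neighbor, which is one way to read the resilience claim above.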
Each node is configured as follows:
- Two Xeon E5-2450, 2.1GHz, 8-core, 20MB Cache, 95W
- 64GB RAM
- Four 2TB 7.2k 3G SATA 3.5"; Two 480GB 6G Micron SATA SSD 2.5"
- Intel 82599 10GbE Mezz Card
- Altera Stratix V FPGA Card
- Operating System: Windows Server 2012