Hadoop Distributed File System

Rustler is an experimental system for exploring new storage and data computing techniques and technologies. TACC used Rustler to explore Hadoop- and HDFS-based computing prior to Wrangler.

Rustler will be used to explore the Ceph and Swift object stores to help researchers understand their role in the converging HPC and Big Data environments. Rustler will also enable exploration of streaming data analysis environments, such as Apache Storm, for real-time, event-driven data computing.

System Specs:
  • The system has 64 Hadoop data nodes, each a Dell R720XD dual-socket Ivy Bridge server with 128 GB of RAM and sixteen 1 TB hard disks. These nodes serve as both the compute and HDFS servers.
  • The system is controlled by two Hadoop Name Nodes with identical specifications, which support the YARN job manager.
  • The system login node supports migration of data into and out of HDFS. It has 34 TB of local storage, provided as a traditional UNIX file system, for staging data on its way to and from the primary HDFS storage system.
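
The figures above imply a rough storage budget, which the following Python sketch works out. The node, disk, RAM, and login-node numbers come from the list. The replication factor of 3 is an assumption: it is the stock HDFS default for dfs.replication, and the actual setting on Rustler is not stated here. The sketch also assumes that every disk is given over entirely to HDFS and ignores operating system and reserved space, so the results are upper bounds, not measured capacities. The function and constant names are illustrative only.

```python
# Back-of-the-envelope capacity estimate from the listed system specs.

DATA_NODES = 64        # from the spec list
DISKS_PER_NODE = 16    # from the spec list
DISK_TB = 1            # from the spec list (1 TB per disk)
RAM_GB_PER_NODE = 128  # from the spec list
LOGIN_NODE_TB = 34     # from the spec list (local UNIX file system)

# Assumption: stock HDFS default replication; not confirmed for Rustler.
ASSUMED_REPLICATION = 3


def raw_capacity_tb(nodes, disks_per_node, disk_tb):
    """Total disk space across all data nodes, in TB."""
    return nodes * disks_per_node * disk_tb


def usable_capacity_tb(raw_tb, replication):
    """Space left for unique data when every block is stored `replication` times."""
    return raw_tb / replication


raw_tb = raw_capacity_tb(DATA_NODES, DISKS_PER_NODE, DISK_TB)
usable_tb = usable_capacity_tb(raw_tb, ASSUMED_REPLICATION)
total_ram_gb = DATA_NODES * RAM_GB_PER_NODE

print(f"Raw disk capacity:        {raw_tb} TB")
print(f"Usable at {ASSUMED_REPLICATION}x replication: {usable_tb:.1f} TB")
print(f"Aggregate data node RAM:  {total_ram_gb} GB")
print(f"Login node staging area:  {LOGIN_NODE_TB} TB "
      f"({LOGIN_NODE_TB / usable_tb:.0%} of the usable estimate)")
```

Under these assumptions the login node's local file system holds only about a tenth of what HDFS could hold, so a data set approaching the full HDFS capacity would need to be moved through it in several passes.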