This award funds Wrangler, a transformational data-intensive resource for the open science community. Big data is creating tremendous new scientific opportunities, but also many new challenges for data-driven science. The computational needs of large-scale data-driven science vary across domains and applications, but some requirements are widely applicable: capacious, high-performance, reliable data storage; support for diverse data types and access methods; and support for embedded analytics that eliminate costly data movement. Wrangler is a high-performance system with an innovative embedded data analytics capability that far exceeds what is typically available in more traditional large-scale computing systems.

Wrangler is anchored by large-scale flash storage that effectively supports computation on both structured and unstructured data. The storage is configured for ultra-high reliability through replication at two locations, and it combines unprecedented analytics capabilities with innovative NAND flash storage. The resource contains 3,000 Intel Haswell cores, offering the most powerful embedded analytics capabilities in the world for a wide range of data-intensive science. Wrangler is connected at 100 Gbps to Internet2, the fastest available connection to the biggest research network. The project also offers a data docking service for receiving and ingesting data shipped on physical media.

Dan Stanzione