IEEE BigData 2014

2014 IEEE International Conference on Big Data

Invited Speakers

Kirsten Fagnan

Kirsten Fagnan is a high-performance computing and bioinformatics consultant at Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center (NERSC) and the Joint Genome Institute (JGI). The JGI generates large amounts of genomic sequence data that scientists comb through to find connections between genes and their functions. By understanding the genomes of microbes, fungi and plants, scientists gain insight into better biofuel production, carbon sequestration and other DOE mission areas. As a consultant, Fagnan is responsible for assisting scientists with managing their datasets, optimizing and debugging user code for data analysis, providing strategic data-management support for projects, training users, and maintaining software applications and libraries. Fagnan's interests are scientific computing, mathematical biology and education technology. She earned her BA from UC Berkeley in 2002 and her PhD in Applied Mathematics from the University of Washington in 2010, and joined LBNL in 2010 as a petascale postdoctoral fellow.

Kirsten's Talk: Computing and Data Management at the Joint Genome Institute

Abstract: The DOE Joint Genome Institute (JGI) generates terabytes of genomic sequence data per day. The JGI user community sends in samples that are sequenced on Illumina or PacBio systems and then processed by one of the JGI's many analysis pipelines. Users are provided with both the original sequence data and the cleaned-up, assembled and annotated results. Tracking this data through the process of analysis, assembly and annotation, which can take a year or more, was a challenge that the JGI was largely able to overcome with its new hierarchical data management system, the JGI Archive and Metadata Organizer (JAMO). In this talk I will provide an overview of the data challenges at the JGI, the development and deployment of JAMO, and how we have altered the JGI's data management policies.

Jessica Trelogan

Jessica Trelogan has an MA in Classics from the University of Texas at Austin, where she has also completed extensive graduate coursework in Geography. She is a Research Associate at the Institute of Classical Archaeology, specializing in GIS and remote sensing, with extensive experience applying those technologies to archaeological fieldwork, conservation, research and publishing. She also currently serves as curator of a large and complex data collection representing several decades' worth of excavation, survey, and study at sites in Italy and Ukraine. She has presented papers related to that work at conferences in Computing in Archaeology, Digital Curation, and Digital Humanities.

Jessica's Talk: Unlocking the Power of High Performance Computing for Big Humanities Data Curation

Abstract: As digital collections continue to grow in size and complexity, the burden of their curation and management increasingly falls on the researchers who created them or who are de facto charged with ensuring their access and preservation. While these domain experts are naturally best equipped to manage their own collections, as the datasets grow ever more unwieldy they may lack the technical expertise to make effective use of available resources for complex curation tasks. Curators at the Institute of Classical Archaeology (ICA) have made significant progress toward knocking down these knowledge barriers during an ECSS-supported allocation through XSEDE (Charge No. TG-HUM130001). Faced with the daunting task of archiving an actively evolving collection of approximately one million files amassed over 40 years of archaeological research, the ICA team needed a way to efficiently assess and cull its collection without interrupting ongoing research and publication work. Together with HPC and data collections experts at TACC, they developed a metadata extraction workflow and training manual for analyzing the collection in an HPC environment. This work has enabled ICA curators, with minimal technical training, to iteratively assess their own collection as it continues to grow and evolve. As the domain expert involved in this project, and as a "non-traditional" HPC user, I will discuss the experience of this fruitful partnership, with an emphasis on the successes, pitfalls, and lessons learned.

Earl Joseph

Earl Joseph is IDC's Program Vice President for High-Performance Computing (HPC) and Executive Director of the HPC User Forum. He leads IDC's HPC technical computing team, driving research and consulting efforts for the United States, European and Asia-Pacific markets for technical servers and supercomputers, clouds, visualization and clustering. This research includes market sizing, market share, segmentation, tracking, trending, data center issues, and vendor analysis for multi-user technical server technology. Dr. Joseph advises IDC clients on the competitive, managerial, technological, integration and implementation issues for technical servers. He also founded IDC's highly successful, high-end HPC User Forum, which he continues to operate.

Dr. Joseph's areas of expertise include technical computers, from entry-level servers to high-end capability supercomputers, as well as software, storage and networking solutions for technical computing. He has worked for four technical computing companies in multiple marketing and R&D roles and has a strong background in computer technologies and future directions in technical computing. Prior to joining IDC in 1999, Dr. Joseph spent 17 years in the HPC industry, most recently at SGI and Cray Research. He is an industry veteran who has worked closely with leading HPC users around the world, focusing on their most critical market and technology requirements.

Dr. Joseph holds a Ph.D. from the University of Minnesota where his research focus was the strategic management of high technology firms, and an undergraduate degree in business and technology from the University of Minnesota.

Earl's Talk: IDC Update on How Big Data Is Redefining High Performance Computing

Abstract: IDC will present an overview of how big data is redefining computing and the high performance computing (HPC) space. The talk will begin with a short overview of the HPC market, followed by a summary of the major challenges in using big data today and in expanding its use. The presentation will then cover a number of real-world examples of very large datasets in use today and future uses of big data, showing how many organizations are changing their fundamental approach to computing because of the opportunities big data provides.

Vas Vasiliadis

Vas Vasiliadis is Director of Products, Communication and Development at the Computation Institute (CI). Vas has over 25 years of experience in operational and consulting roles, spanning strategy, marketing and product management. Most recently, Vas was a principal at Strategos, the innovation consulting firm founded by Gary Hamel, where he led Fortune 100 management teams in defining their growth agenda. Prior to Strategos, Vas led marketing efforts at Univa, a leading provider of grid and cloud computing solutions. Vas joined Univa's founding team shortly after inception and was instrumental in defining the product vision, raising venture capital and launching the company's initial products.

Vas' Talk: Data Transfer and Sharing with Globus