Published on May 29, 2015 by TACC


The Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH) workshop aims to connect the latest hardware and software developments with the end users of big data. It focuses on the accessibility and applicability of the latest hardware and software to practical domain problems, and hence directly facilitates domain researchers' data-driven discovery. Topics of discussion include performance evaluation, optimization, and the accessibility and usability of new technologies.

Hailed by some as the fourth paradigm of science, data-intensive science has profoundly transformed scientific research. Data-driven discovery is already happening in fields such as the earth sciences, medical sciences, biology, and physics, to name just a few. A vast volume of scientific data captured by new instruments is expected to become publicly accessible for continued and deeper analysis. Big Data analytics will lead to many new theories and discoveries, but will also require substantial computational resources. However, many domain sciences still rely mostly on traditional experimental paradigms, and transforming a solution obtained on a standalone server into a massively parallel one running on tens, hundreds, or even thousands of servers is often a major challenge. It is therefore crucial to make the latest advances in software and hardware accessible to and usable by domain scientists, especially those in fields that have traditionally lacked computational and programming expertise but have nonetheless become driving forces of scientific discovery.

Fueled by the needs of big data analytics, new computing and storage technologies in hardware and software are also developing rapidly, pushing toward new high-end hardware for big data problems. This new hardware brings new opportunities for performance improvement, but also new challenges. Although these technologies have the potential to greatly improve the capabilities of big data analytics, that potential is often not fully realized. Because of their cost and sophistication, and because initial application support is limited, new technologies often seem remote to end users and remain underused in academia for years after their invention. It is therefore very important to make these technologies understood by and accessible to data scientists in a timely manner.

Meanwhile, comprehensive analytic software packages and programming environments have become increasingly popular as open-source platforms for data analysis, and they need to be scaled and adapted for Big Data analysis. These packages not only provide collections of analytic methods, but also have the potential to use new hardware transparently and reduce the effort required of end users. For example, members of the R and HPC communities have recently worked to extend R to big data, producing methods for effectively adapting R to a variety of high-performance and high-throughput computing technologies. In parallel with these developments, a family of software frameworks (e.g., Apache Spark, Airavata) has been developed for executing and managing computational jobs and workflows on distributed computing resources, while providing web-based science gateways that help domain scientists compose, manage, execute, and monitor big data applications and workflows built from these services.

This is the second time the workshop will be held in conjunction with the IEEE Big Data Conference. Last year, the conference offered workshop participants the option of a one-day registration to attend the workshop only, or a full conference registration to participate in the entire conference. All papers accepted by the workshop will also be included in the IEEE Big Data Conference proceedings and archived in the IEEE Xplore digital library. We are also exploring opportunities to invite extended versions of workshop papers for publication in other journals and as book chapters. Authors of selected papers from last year's workshop have been invited to submit extended versions for publication with Springer.

Topics of Interest

  • Adopting the latest hardware technology for Big Data analytics
  • Applications and use cases of cyberinfrastructure for Big Data in science and engineering
  • Performance tuning with new hardware infrastructure and software platforms
  • Advances in hardware technology
  • Novel software platforms and models for big data collection, management, and analysis
  • Search and data retrieval on large-scale data sets
  • Service-oriented architectures to enable data science
  • Science gateways for domain big data research
  • Big Data and interactive analysis languages (e.g., R, Python, and MATLAB)

Workshop Schedule

The workshop schedule can be viewed here.

Important Dates

Sept 14, 2015: Full workshop paper submissions due

Sept 24, 2015: Notification of paper acceptance to authors

Oct 5, 2015: Camera-ready versions of accepted papers due

Oct 29-Nov 1, 2015: Workshop date


Please submit a full-length paper (up to 8 pages, IEEE 2-column format) through the online submission system.


Formatting templates: 8.5" x 11" (DOC, PDF); LaTeX formatting macros

Accepted Papers

A database-based distributed computation architecture with Accumulo and D4M: an application of eigensolver for large sparse matrix. Yin Huang, Yelena Yesha, and Shujia Zhou

A novel symbolization technique for time-series outlier detection. Gavin Smith and James Goulding

Big Data Provenance: Challenges, State of the Art and Opportunities. Jianwu Wang, Daniel Crawl, Shweta Purawat, Mai Nguyen, and Ilkay Altintas

Immersive Visualization for Materials Science Data Analysis using the Oculus Rift. Margaret Drouhard, Chad Steed, Steven Hahn, Thomas Proffen, Jamison Daniel, and Michael Matheson

Join Algorithms on GPUs: A Revisit After Seven Years. Ran Rui, Hao Li, and Yicheng Tu

Performance Evaluation of Enabling Logistic Regression for Big Data with R. Ruizhu Huang and Weijia Xu

Regularized and Sparse Stochastic K-Means for Distributed Large-Scale Clustering. Vilen Jumutc, Rocco Langone, and Johan Suykens

Scalable Dental Computing on Cyberinfrastructure. Hui Zhang and Riqing Chen

Shaping Data: Visualization Under Construction. Oliver Bieh-Zimmert and Carsten Felden

Skill Grouping Method: Mining and Clustering Skill Differences from Body Movement BigData. Shinichi Yamagiwa, Yoshinobu Kawahara, Noriyuki Tabuchi, Yoshinobu Watanabe, and Takeshi Naruo

Spatio-Temporal Similarity Search Method for Disaster Estimation. Hideki Hayashi, Akinori Asahara, Natsuko Sugaya, Yuichi Ogawa, and Hitoshi Tomita

Texture-Based Edge Bundling: A Web-Based Approach for Interactively Visualizing Large Graphs. Jieting Wu, Lina Yu, and Hongfeng Yu

Visual Analysis of Large-scale LiDAR Point Clouds. Hui Zhang

Volatility Matrix Inference in High-Frequency Finance with Regularization and Efficient Computations. Jian Zou, Yunbo An, and Hong Yan

Wrangler's User Environment. Christopher Jordan, David Walling, Weijia Xu, Stephen Mock, Dan Stanzione, and Niall Gaffney

Workshop Organization Committee

  • Weijia Xu, University of Texas at Austin
  • Hongfeng Yu, University of Nebraska
  • Hui Zhang, Indiana University

Technical Program Committee

  • Chris Aniszczyk, Twitter Inc.
  • Yong Chen, University of Nebraska
  • Wei Ding, Florida Polytechnic University
  • Adel Elmaghraby, University of Louisville
  • Heng Huang, University of Texas, Arlington
  • Xiaoyi Lu, The Ohio State University
  • Dhabaleswar K. Panda, The Ohio State University
  • Max Qian, JCVI
  • Smriti R. Ramakrishnan, Oracle
  • Zeqian Shen, eBay
  • Dan Stanzione, Texas Advanced Computing Center
  • Stephen Wong, Houston Methodist Hospital; Weill Cornell Medical College
  • Jinrong Xie, University of California, Davis
  • Fang Zheng, IBM T. J. Watson Research Center
  • Jian Zou, Worcester Polytechnic Institute

Faith Singer-Villalobos

Communications Manager | 512-232-5771