Talking Parallel AI with Zhao Zhang

Published on January 30, 2018 by Aaron Dubrow

Zhao Zhang, Research Associate in the Data Mining & Statistics Group at the Texas Advanced Computing Center

Zhao Zhang is one of TACC's foremost experts in the application of advanced computing to AI. He received his Ph.D. in Computer Science from the University of Chicago and did a post-doc at AMPLab and the Berkeley Institute for Data Science (BIDS) at the University of California, Berkeley before joining TACC.

Why are machine learning and deep learning methods valuable for researchers?

Machine learning (ML) and deep learning (DL) are two subsets of the field of artificial intelligence that have become critical methods for data-intensive scientific discovery, sometimes called "the Fourth Paradigm." These techniques will benefit scientific exploration in two ways: by helping scientists formulate hypotheses through the analysis of massive amounts of data, and by motivating novel approaches to reducing the dimensionality of complex simulations, making currently intractable problems solvable.

What can TACC offer researchers looking to apply ML/DL to their research problems?

TACC's mission is to enable discoveries that advance science and society through the application of advanced computing technologies. To accomplish this, we offer hardware, software, and computational expertise to facilitate domain research projects.

Our resources include a variety of large-scale advanced computing systems with diverse hardware including Intel Xeon Phi processors, Intel Xeon Scalable processors, NVIDIA GPUs, and FPGAs suitable for machine learning and deep learning.

Above and beyond our hardware, TACC currently supports a number of machine learning and deep learning frameworks that work across our many machines. For machine learning, we offer R, Python, Spark/MLlib, and MATLAB. For deep learning, we currently offer Caffe, MXNet, and TensorFlow, with more to come.

Finally, TACC offers computational expertise to help users port their applications to TACC resources and optimize their performance.

Are HPC systems a good fit for ML/DL problems?

HPC systems are a perfect fit for machine and deep learning applications, given their computation and communication patterns. Based on our benchmarking with a suite of image classification applications, we've found that HPC systems offer unparalleled performance in speeding up deep learning applications.

Zhang studies ways to apply the parallel computing capabilities of HPC systems to machine and deep learning frameworks and algorithms.

For instance, with the standard ImageNet-1k dataset, we can finish a 90-epoch ResNet-50 training in 20 minutes using 2,048 Intel Xeon Phi processors. And using 2,048 Intel Xeon Scalable processors, we can finish a 100-epoch AlexNet training in 11 minutes. These results are comparable to the performance on 1,024 NVIDIA P100 GPUs and are among the fastest that have been achieved anywhere. In other words, TACC offers the scientific community computational capabilities comparable to the world's leading deep learning corporations.
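To put those training times in perspective, the short calculation below converts them into approximate throughput. It is a back-of-the-envelope sketch rather than a TACC benchmark: it assumes the ImageNet-1k training set holds 1,281,167 images and that each epoch visits every image exactly once, neither of which is stated in the interview. The function name is illustrative only.

```python
# Back-of-the-envelope throughput implied by the quoted training times.
# Assumption (not stated in the interview): the ImageNet-1k training set
# holds 1,281,167 images and every epoch visits each image exactly once.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167


def images_per_second(epochs, minutes, processors):
    """Return (aggregate, per-processor) training throughput in images/s."""
    total_images = epochs * IMAGENET_1K_TRAIN_IMAGES
    aggregate = total_images / (minutes * 60)
    return aggregate, aggregate / processors


# Epoch counts, wall-clock times, and processor counts quoted in the interview.
resnet50 = images_per_second(epochs=90, minutes=20, processors=2048)
alexnet = images_per_second(epochs=100, minutes=11, processors=2048)

if __name__ == "__main__":
    for name, (total, per_proc) in [("ResNet-50", resnet50), ("AlexNet", alexnet)]:
        print(f"{name}: {total:,.0f} images/s total, {per_proc:.1f} per processor")
```

Under these assumptions, the ResNet-50 run works out to roughly 96,000 images per second in aggregate (about 47 per processor), and the AlexNet run to roughly 194,000 images per second (about 95 per processor). Because the quoted times are rounded to the minute, these figures are estimates only.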

What are some examples of existing ML/DL/AI projects at TACC?

Academic researchers have used our systems to train image classifiers to identify brain tumors and to search through aerial images for potential breeding grounds for mosquitos that carry Zika. We also are enabling projects to automatically identify traffic patterns in Austin using video analysis and to determine which factors are most critical in forecasting extreme weather events.

What is TACC planning in ML/DL for the coming year?

We will soon deploy a new system, Maverick2, for the open science community that will contain 96 NVIDIA 1080 Ti GPUs, which are well suited to machine and deep learning workflows.

We are also working with Intel to optimize distributed deep learning training using TensorFlow on Stampede2, TACC's largest system and the fastest at any university in the U.S. And based on requests from users, we'll be installing new deep learning frameworks, including PyTorch and Keras, for use on our resources.

Finally, we are collaborating closely with domain scientists on a few pilot machine learning and deep learning applications, in particular to combine AI with simulation to reduce the dimensionality of particularly difficult problems.

How can researchers learn more about ML/DL at TACC?

Researchers looking to learn more about the hardware, software and expertise available at TACC should visit the Deep Learning at TACC page and the Wrangler user guide for machine learning with Spark. These offer step-by-step instructions on how to get started using TACC systems for machine and deep learning.

Additionally, TACC hosts regular training sessions on these topics, as well as an in-depth Summer Institute covering machine learning and deep learning. (Learn more on the TACC Learning Portal.)

TACC always welcomes in-person conversations between researchers and our staff to help get new projects up and running.

What are you most excited about in this area?

Many scientific domains have already explored and even exploited machine learning and deep learning as daily data science methods, and they have been effective in many cases. At the same time, we also see a novel interaction between deep learning and traditional simulation to reduce the dimensionality in computation-hungry simulations. The best is yet to come in this area.

A lot of industry and academic effort has been put into machine and deep learning. Techniques are changing rapidly, as are the hardware, software, and applications that enable AI, and most of the work is open-source. This fast-changing development and the open-source practice will guarantee scientific users can benefit from the rapid pace of innovation.

What would you like to tell researchers thinking of applying ML/DL to their problems?

If you already have a machine learning or deep learning application, consider moving it to TACC, as we offer computing power, storage, and reliable services. If you are thinking about using machine learning or deep learning to solve your domain challenges, please talk to TACC staff. We're here to help.

If you've heard of machine learning and deep learning, but aren't sure what they are or how they can be used for your research, please consider attending one of our training sessions or our Summer Institute where we teach the basics of ML and DL with real examples. We look forward to helping you solve your toughest challenges.

This feature is part of a TACC Special Report on Artificial Intelligence. From health and safety to meteorology and cybersecurity, TACC supercomputers are helping researchers apply machine learning and deep learning to basic and applied science. Learn more about TACC's efforts in this rapidly evolving area.
