Latest News



Story Highlights

TACC is assembling a large dataset of COVID-19-related tweets: capturing 40 million tweets a day, cleaning and analyzing the data, and making the dataset available to researchers through a GitHub repository.

The center's HPC resources make fast analysis of the dataset possible. Initial work created a shareable set of n-grams (the most frequently used terms and short phrases) from the tweet collection. Future projects include a searchable public database, entity analysis, and event detection.
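
For readers unfamiliar with n-grams, the idea can be illustrated with a minimal Python sketch. This is not TACC's pipeline; the function names (`tokenize`, `ngrams`, `top_ngrams`), the simple tokenizer, and the sample tweets are all hypothetical and chosen only for illustration. It counts how often each run of n consecutive words appears across a small list of tweets.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out words, hashtags, and mentions.

    A deliberately simple tokenizer (an assumption for this sketch);
    real tweet cleaning would also handle URLs, emoji, and retweets.
    """
    return re.findall(r"[#@]?\w+", text.lower())


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(tweets, n=1, k=10):
    """Return the k most common n-grams across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tokenize(tweet), n))
    return counts.most_common(k)


if __name__ == "__main__":
    # Made-up sample tweets, not drawn from the actual dataset.
    sample = ["Stay home stay safe", "Stay home, wash hands"]
    print(top_ngrams(sample, n=1, k=2))
    print(top_ngrams(sample, n=2, k=1))
```

At the scale described here, tens of millions of tweets a day, the same counting step would presumably be distributed across many compute nodes rather than run in a single loop.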

UT Austin social science researchers are beginning to use the dataset to explore misinformation and racist messaging on Twitter, and to understand how communities share (or don't share) information.


Faith Singer

Communications Manager

Jorge Salazar

Technical Writer/Editor