Weijia Xu

Research Engineer
Manager, Data Mining and Statistics

Data Intensive Computing


Dr. Weijia Xu is the group lead for the Data Mining & Statistics group. Prior to joining TACC, he obtained a master's degree in Biological Sciences and a doctoral degree in Computer Science from The University of Texas at Austin.

Dr. Xu's main research interest is large-scale information management and analysis. The goal of his research is to enable data-driven discovery by developing new methods and applications that facilitate the process of turning data into knowledge. Dr. Xu has extensive experience working with domain scientists to develop databases and analytical methods. He has published over thirty peer-reviewed conference and journal papers on similarity-based data retrieval, data analysis, and information visualization, using data from a variety of scientific domains.

Areas of Research

Big Data

Data Science

Cloud Computing

Large-scale data analysis with statistical and computational methods and applications

Information Visualization and Visual Analytics

Similarity-based indexing and retrieval

Selected Publications & Presentations

For a full and up-to-date list of publications, please refer to https://goo.gl/HvCInH

Weijia Xu, Ruizhu Huang, Maria Esteva, Jawon Song, Ramona Walls (2016). "Content-based Comparison for Collections Identification", in Proceedings of the IEEE International Conference on Big Data (BigData2016), Dec. 5-8, Washington DC, USA

Amit Gupta, Weijia Xu, Natalia Ruiz-Juri, and Kenneth Perrine (2016). "A Workload Aware Model of Computational Resource Selection for Big Data Applications", in Proceedings of the IEEE International Conference on Big Data (BigData2016), Dec. 5-8, Washington DC, USA

Weijia Xu, Ruizhu Huang, Hui Zhang, David Walling, and Yaakoub El-Khamra (2016). "Empowering R with High Performance Computing Resources for Big Data Analytics", book chapter in "Conquering Big Data with High Performance Computing", R. Arora (ed.), Springer, pp. 191-218

Ruizhu Huang and Weijia Xu (2015). "Performance Evaluation of Enabling Logistic Regression for Big Data with R", in Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, pp. 2517-2524. doi: 10.1109/BigData.2015.7364048

Lee Thompson, Weijia Xu, and Daniel Miranker (2014). The Adaptive Projection Forest: Using Adjustable Exclusion and Parallelism in Metric Space Indexes. In Proceedings of IEEE Big Data 2014 Conference, Oct 26-30, Washington DC, USA

Weijia Xu, Maria Esteva, Suyog D Jain, Varun Jain (2014). Interactive Visualization for Curatorial Analysis of Large Data Collections. Information Visualization, 13(2):159-183, April 2014

Lee Thompson, Weijia Xu, and Daniel Miranker (2013). Fast Scalable Selection Algorithms for Large Scale Data. In Proceedings of IEEE Big Data 2013 Conference, Oct 6-9, Santa Clara, CA, USA

Nicholas Woodward and Weijia Xu (2012). On Automatically Tagging Web Documents from Examples. In Proceedings of ACM Special Interest Group on Information Retrieval (SIGIR) 2012 Conference, Portland, Oregon, Aug 11-16, 2012, pp. 1111-1112.

Shang Lei, Weijia Xu, Stuart Ozer, and Robin Gutell (2012). Structural Constraints Identified with Covariation Analysis in Ribosomal RNA. PLoS ONE 7(6): e39383. doi:10.1371/journal.pone.0039383

Weijia Xu, Wei Luo, Nicholas Woodward (2012). Analysis and Optimization of Data Import with Hadoop. In Proceedings of 9th High Performance Grid and Cloud Computing (HPGC'12) in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS'12) May 21-25, 2012, Shanghai, China, pp.1058-1066.

Maria Esteva, Weijia Xu, Suyog D Jain, Jenifer L Lee, Wendy K Martin (2011). Assessing the Preservation Condition of Large and Heterogeneous Electronic Records Collections with Visualization. International Journal of Digital Curation, 6(1):45-57

Xu, W., Esteva, M. and Jain, S.D. (2010). Visualizing Personal Digital Collections. In Proceedings of IEEE/ACM Joint Conference on Digital Library (JCDL'2010), June 21-25 2010, Gold Coast, Australia, ACM, New York, NY, USA, pp.169-172.

Xu, W., Ozer, S., and Gutell, R.R. (2009). Covariant Evolutionary Event Analysis for Base Interaction Prediction Using Relational Database Management System for RNA. Lecture Notes in Computer Science, vol. 5566, pp. 200-216

Xu, W., Miranker, D.P., Ramakrishnan, S., Mao, R., and Willard, W. (2008). Anytime K-Nearest Neighbor Search for Database Applications. In Proceedings of the First International Workshop on Similarity Search and Applications (SISAP '08). IEEE Computer Society, Washington, DC, USA, 139-148.

Mao, R., Xu, W., Singh, N. & Miranker, D.P. (2005). An Assessment of a Metric Space Database Index to Support Sequence Homology. International Journal on Artificial Intelligence Tools, 14(5): 867-885

Xu, W., Briggs, W. J., Padolina, J., Liu, W., Linder, C. R. & Miranker, D.P. (2004) Using MoBIoS' Scalable Genome Joins to Find Conserved Primer Pair Candidates Between Two Genomes, Bioinformatics, 20:i355-i362

Xu, W. & Miranker, D.P. (2004) A Metric Model of Amino Acid Substitution, Bioinformatics, 20(8):1214-1221

Education

Ph.D., Department of Computer Science,
University of Texas at Austin

M.A., School of Biological Sciences,
University of Texas at Austin

B.S., Biochemistry & Molecular Biology,
Peking University