Weijia Xu’s CURRICULUM VITAE
Ph.D., Computer Science, University of Texas at Austin
M.S., Computer Science, University of Texas at Austin
M.A., Biological Science, University of Texas at Austin
B.S., Biochemistry & Molecular Biology, Peking University, China
Sep. 2013 to Present, Research Scientist, Group Manager
Apr. 2007 to Aug. 2013, Research Associate
Data Mining & Statistics
Texas Advanced Computing Center
University of Texas at Austin
Lead the research group in developing methods and application support for big data analysis, including data mining, machine learning, network analysis, and statistical analysis, across various domains.
Research new methods for the mining, comparison, curation, and visualization of digital information; collaborate with domain experts in areas including astronomy, archeology, biology, computational science, computational linguistics, education, geological science, information science, and transportation.
Manage high performance computing and cloud computing infrastructure, including Hadoop and Spark clusters.
Sep. 2005 to Mar. 2007, Research Engineering/Science Associate II
Department of Computer Sciences
University of Texas at Austin
Designed and implemented algorithms for metric space indexing and similarity search.
Instructor May 2010 to present
University of Texas at Austin
Course taught: Introduction to Data Mining and Analysis Methods
A 12-hour course on data mining methods and their applications for the Summer Statistics Institute, organized by the Division of Statistics and Scientific Computation.
Lecturer Aug 2009 to present
Division of Statistics and Scientific Computation
University of Texas at Austin
Course taught: Visualization and Data Analysis for Scientists and Engineers (SSC 374E/394E)
A three-credit-hour course offered annually for both undergraduate and graduate programs. The course covers theories and methods in basic statistical analysis, data clustering, data classification, and information visualization.
Journal Articles and Book Chapters
Weijia Xu, Ruizhu Huang, Hui Zhang, David Walling and Yaakoub El-Khamra (2016). Empowering R with High Performance Computing Resources for Big Data Analytics. Book chapter in “Conquering Big Data with High Performance Computing”, R. Arora ed., Springer, pp. 191-218.
Weijia Xu, Maria Esteva, Suyog D Jain, Varun Jain (2014). Interactive Visualization for Curatorial Analysis of Large Data Collections, Information Visualization April 2014 vol. 13 no. 2 159-183.
Shang Lei, David Gardner, Weijia Xu, Jamie Cannone, Daniel Miranker, Stuart Ozer, and Robin Gutell (2013). Two Accurate Sequence, Structure, and Phylogenetic Template-Based RNA Alignment Systems. BMC Systems Biology, Vol. 7(Suppl 4):S13, Oct. 23, 2013.
Travis Brown, Jason Baldridge, Maria Esteva, and Weijia Xu (2012). The Substantial Words Are in the Ground and Sea: Computationally Linking Text and Geography. Texas Studies in Literature and Language, 54:3, September 2012.
Shang Lei, Weijia Xu, Stuart Ozer, and Robin Gutell (2012). Structural Constraints Identified with Covariation Analysis in Ribosomal RNA. PLoS ONE 7(6): e39383. doi:10.1371/journal.pone.0039383.
Xu, Weijia and Maria Esteva (2011). Finding Stories in the Archive through Paragraph Alignment. Literary and Linguistic Computing, 26(3):359-363 (Extended version presented at DH2010).
Maria Esteva, Weijia Xu, Suyog D Jain, Jenifer L Lee, Wendy K Martin (2011). Assessing the Preservation Condition of Large and Heterogeneous Electronic Records Collections with Visualization. International Journal of Digital Curation, 6(1):45-57 (Revised version presented at 6th IDCC).
Xu, Weijia, Ozer, Stuart, Gutell, Robin R. (2009). Covariant Evolutionary Event Analysis for Base Interaction Prediction Using Relational Database Management System for RNA. Lecture Notes in Computer Science, 2009, vol. 5566, pp. 200-216 (reprint of the paper presented at SSDBM 09).
Ramakrishnan, S.R., Mao, R., Nakorchevskiy, A.A., Prince, J.T., Willard, W.S., Xu, W., Marcotte, E.M. and Miranker, D.P. (2006). A Fast Coarse Filtering Method for Protein Identification by Mass Spectrometry. Bioinformatics 22 (12): 1524.
Mao, R., Xu, W., Singh, N. & Miranker, D.P. (2005). An Assessment of a Metric Space Database Index to Support Sequence Homology. International Journal on Artificial Intelligence Tools, 14(5): 867-885 (Reprint of the paper in BIBE03).
Miranker, D.P., Briggs, W.J., Mao, R., Ni, S., and Xu, W. (2004). Biosequence Use Cases in MoBIoS SQL. IEEE Data Engineering Bulletin, September 2004, 27 (3): 3-11.
Xu, W., Briggs, W. J., Padolina, J., Liu, W., Linder, C. R. & Miranker, D.P. (2004). Using MoBIoS' Scalable Genome Joins to Find Conserved Primer Pair Candidates Between Two Genomes. Bioinformatics 20:I355-I362 (reprint of the paper presented at ISMB/ECCB 04).
Xu, W. & Miranker, D.P. (2004). A Metric Model of Amino Acid Substitution. Bioinformatics, 20(8):1214-1221.
Peer-Reviewed Conference Articles and Presentations
Weijia Xu, Ruizhu Huang, Maria Esteva, Jawon Song, Ramona Walls (2016) “Content-based Comparison for Collections Identification”, in Proceedings of International Conference on Big Data (BigData2016), Dec. 5-8, Washington DC, USA
Amit Gupta, Weijia Xu, Natalia Ruiz-Juri, and Kenneth Perrine (2016) “A Workload Aware Model of Computational Resource Selection for Big Data Applications”, in Proceedings of International Conference on Big Data (BigData2016), Dec. 5-8, Washington DC, USA
Weijia Xu, Natalia Ruiz-Juri, Amit Gupta, Amanda Deering, Chandra Bhat, James Kuhr, and Jackson Archer (2016) “Supporting Large Scale Connected Vehicle Data Analysis using Hive”, in Proceedings of International Conference on Big Data (BigData2016), Dec. 5-8, Washington DC, USA
Weijia Xu, Amit Gupta, Pankaj Jaiswal, Crispin Taylor and Patti Lockhart (2016) “Web Application for Extracting Key Domain Information for Scientific Publications using Ontology”, in Proceedings of International Conference on Biological Ontology (ICBO2016), Aug. 1-4, Corvallis, Oregon, USA.
Ruizhu Huang, Weijia Xu and Robert McLay (2016) “A Web Interface for XALT Log Data Analysis”, in Proceedings of the 5th Extreme Science and Engineering Discovery Environment (XSEDE) Conference, July 17-21, 2016, Miami, FL, USA
Maria Esteva, Sandra Sweat, Robert McLay, Weijia Xu and Sivakumar Kulasekaran (2016) “Data Curation with a Focus on Reuse” In Proceedings of Joint Conference on Digital Libraries (JCDL) June 19-23, Newark, New Jersey, USA.
Li Yang and Weijia Xu (2016) “Computation-Aided Analysis on Film Credits” In Proceedings of the Digital Humanities 2016 (DH2016), July 11-16, Krakow, Poland
Christopher Jordan, David Walling, Weijia Xu, Stephen Mock, Niall Gaffney and Dan Stanzione "Wrangler's user environment: A software framework for management of data-intensive computing system," Big Data (Big Data), 2015 IEEE International Conference on, Santa Clara, CA, 2015, pp. 2479-2486.
Ruizhu Huang and Weijia Xu, "Performance evaluation of enabling logistic regression for big data with R," Big Data (Big Data), 2015 IEEE International Conference on, Santa Clara, CA, 2015, pp. 2517-2524. doi: 10.1109/BigData.2015.7364048
Yu Qian, Hyunsoo Kim, Shweta Purawat, Jianwu Wang, Rick Stanton, Alexandra Lee, Weijia Xu, Ilkay Altintas, Robert Sinkovits, and Richard H. Scheuermann (2015). FlowGate: Towards Extensible and Scalable Web-based Flow Cytometry Data Analysis. In Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure (XSEDE '15). ACM, New York, NY, USA, Article 5, 8 pages.
Shuo Xu and Weijia Xu (2015). The System for Recognizing Chemical Names and Detecting Chemical Passages in Patent Documents. In Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, pp. 82-87, Sevilla, Spain, Sep. 9-11, 2015.
Weijia Xu, Wei Luo, Nicholas Woodward, and Yan Zhang (2015). Supporting Data Driven Access through Automatic Keyword Extraction and Summarization. In 2015 IEEE International Congress on Big Data (BigData Congress), pp. 704-707, June 27-July 2, 2015.
Amit Gupta, Weijia Xu, Kenneth Perrine, Dennis Bell, and Natalia Ruiz-Juri (2014). On Scaling Time Dependent Shortest Path Computations for Dynamic Traffic Assignment. In Proceedings of IEEE Big Data 2014 Conference, Oct 26-30, Washington DC, USA.
Lee Thompson, Weijia Xu, and Daniel Miranker (2014). The Adaptive Projection Forest: Using Adjustable Exclusion and Parallelism in Metric Space Indexes. In Proceedings of IEEE Big Data 2014 Conference, Oct 26-30, Washington DC, USA.
Lee Thompson, Weijia Xu, and Daniel Miranker (2013). Fast Scalable Selection Algorithms for Large Scale Data. In Proceedings of IEEE Big Data 2013 Conference, Oct 6-9, Santa Clara, CA, USA (acceptance rate: 15%).
Yaakoub El-Khamra, Niall Gaffney, David Walling, Eric Wernert, Weijia Xu, and Hui Zhang (2013). Performance Evaluation of R with Intel Xeon Phi Coprocessor. In Workshop Proceedings of IEEE Big Data 2013: Benchmark, Performance Optimization and Emerging Hardware for Big Data System, Oct 6-9, Santa Clara, CA, USA.
Weijia Xu, Maria Esteva, Jessica Trelogan and Todd Swinson (2013). A Case Study on Entity Resolution for Distant Processing of Big Humanities Data. In Workshop Proceedings of the IEEE Big Data 2013: Big Data in the Humanities, Oct 6-9, Santa Clara, CA, USA.
Esteva, Maria, Jeffrey Felix Tang, Weijia Xu and Karthik Anantha Padmanabhan. "Data mining for big archives analysis: A case study." In Proceedings of the American Society for Information Science and Technology 50, no. 1 (2013): 1-10. doi:10.1002/meet.14505001076
Maria Esteva, Jessica A. Trelogan, Weijia Xu, Andrew J. Solis and Nicolas E. Lauland (2013). Lost in the Data: Aerial Views of an Archaeological Collection. Presented at the Digital Humanities 2013, Lincoln, Nebraska, USA, 2013.
David Gardner, Weijia Xu, Jamie Cannone, Daniel Miranker, Stuart Ozer, and Robin Gutell (2012). An Accurate Scalable Template-based Alignment Algorithm. In Proceedings of 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM2012), Philadelphia, PA, Oct. 2012, pp. 237-243.
Nicholas Woodward and Weijia Xu (2012). On Automatically Tagging Web Documents from Examples. In Proceedings of ACM Special Interest Group on Information Retrieval (SIGIR) 2012 Conference, Portland, Oregon, Aug 11-16, 2012, pp. 1111-1112.
Weijia Xu, Wei Luo, Nicholas Woodward (2012). Analysis and Optimization of Data Import with Hadoop. In Proceedings of 9th High Performance Grid and Cloud Computing (HPGC’12) in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS’12) May 21-25, 2012, Shanghai, China, pp.1058-1066.
Stuart Ozer, Kishore J. Doshi, Weijia Xu, Robin R Gutell (2011). rCAD: A Novel Database Schema for the Comparative Analysis of RNA. In Proceedings of 2011 IEEE Conference On e-Science, Stockholm, Sweden, Dec 2-8, 2011, IEEE Computer Society, Washington, DC, USA, pp. 15-22.
Weijia Xu, Ame Wongsa, Jung Lee, Lei Shang, Jamie J Cannone, Robin R Gutell (2011). RNA2DMap: A Visual Exploration Tool of the Information in RNA's Higher-Order Structure. In Proceedings of 2011 IEEE Conference on Bioinformatics and Biomedicine (BIBM’11), Atlanta, GA, Nov. 12-15, 2011, IEEE Computer Society, Washington, DC, USA, pp. 613-617.
Yanan Jiang, Weijia Xu, Lee P Thompson, Robin R Gutell, Daniel P Miranker (2011). RNA Sequence Alignment with Structural Information. In Proceedings of 2011 IEEE Conference on Bioinformatics and Biomedicine (BIBM’11), Atlanta, GA, Nov. 12-15, 2011, IEEE Computer Society, Washington, DC, USA, pp. 618-622.
Brandt Westing, Benjamin Urick, Maria Esteva, Freddy Rojas, Weijia Xu (2011). Integrating Multi-touch in High-Resolution Display Environments. In State of the Practice Reports (SC '11). ACM, New York, NY, USA, Article 8, 9 pages, DOI=10.1145/2063348.2063359.
Xu, Weijia, Esteva, Maria, Jain, Suyog D, Jain, Varun (2011). Analysis of Large Digital Collection with Interactive Visualization. In Proceedings of 2011 IEEE Conference on Visual Analytics Science and Technology (VAST’11), Oct 23-28, Providence, RI, USA, IEEE Computer Society, Washington, DC, USA, pp. 241-250.
Bae, Jae, Xu, Weijia, and Esteva, Maria (2011). Facilitating Understanding of Large Document Collections. In Proceedings of 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Sep. 18-21, Beijing, China. IEEE Computer Society, Washington, DC, USA, pp. 1334-1338.
Dhananjay, P., Xu, W., Esteva, M., Eijkhout, V. (2011). Bisecting Tensor Decomposition to Discover Theme Changes Over Time. In Proceedings of International Council for Scientific and Technical Information Conference 2011, Beijing, China, June 7-8, pp. 176-180.
Xu, W., Thompson, L. P., Miranker, D.P. (2011). Empirical Evaluation of Excluded Middle Vantage Point Forest on Biological Sequences Workload. In Proceedings of the 1st Workshop on New Trends in Similarity Search (NTSS '11/EDBT’11), Prasad Deshpande, Deepak P, Kjell Orsborn, and Silvia Stefanova (Eds.). ACM, New York, NY, USA, 26-31. DOI=10.1145/1966865.1966873.
Daruru, S., Dhandapani, S., Gupta, G., Iliev, I., Xu, W., Navratil, P., Marin, P. and Ghosh, J. (2010). Distributed, Scalable Clustering for Detecting Halos in Terascale Astronomy. In Workshop Proceedings of IEEE International Conference on Data Mining: Knowledge Discovery Using Cloud and Distributed Computing Platforms (KDCloud’10), Dec. 2010, Sydney, Australia, IEEE Computer Society, Washington, DC, USA, pp. 138-147.
Esteva, M., Xu, W., Jain, S.D., Lee, J L, Martin, W K (2010). Assessing the Preservation Condition of Large and Heterogeneous Electronic Records Collections with Visualization. In Proceedings of 6th International Digital Curation Conference (IDCC’10), Dec 6-8 2010, Chicago, IL, USA.
Esteva, M., Xu, W. (2010). Finding Stories in the Archive through Paragraph Alignment. Digital Humanities 2010 (DH’2010), July 7-10 2010, London, UK.
Xu, W., Esteva, M. and Jain, S.D. (2010). Visualizing Personal Digital Collections. In Proceedings of IEEE/ACM Joint Conference on Digital Libraries (JCDL’2010), June 21-25 2010, Gold Coast, Australia, ACM, New York, NY, USA, pp. 169-172.
Xu, W., Esteva, M. and Jain, S.D (2010). Visualization for Archival Appraisal of Large Digital Collections. In Proceedings of Archiving 2010, June 1-4, 2010, Den Haag, Netherlands, Society for Imaging Science and Technology (IST), pp. 157-162.
Walker, E., Xu, W. and Chandar, V. (2009). Composing and Executing Parallel Data-flow Graphs Using Shell Pipes. In Proceedings of 4th Workshop on Workflows in Support of Large-Scale Science (WORKS’09), ACM, New York, NY, USA, Article 11, 10 pages.
Esteva, M., Xu, W., Sreevalsan-Nair, J., Athalye, A., Hade, M. (2009). Finding Narratives of Activities through Archival Bond in Electronically Stored Information (ESI). DESI III Global E-Discovery/E-Disclosure Workshop at the 12th International Conference on Artificial Intelligence and Law, Casa Convalescència, Barcelona, Spain, June 8, 2009.
Xu, W. and Sreevalsan-Nair, J. (2009). Visual Representation of Multiple Associations in Data Using Constrained Graph Layout. In Proceedings of Seventh Theory and Practice of Computer Graphics 2009 Conference (EPCG’09), June 17-19 2009, Cardiff, UK, pp. 65-68.
Xu, W., Ozer, S., Gutell, R. R. (2009). Covariant Evolutionary Event Analysis for Base Interaction Prediction using Relational Database Management System for RNA. In Proceedings of 21st International Conference on Scientific and Statistical Database Management (SSDBM 09), New Orleans, USA, June 2-4, 2009, pp. 200-216.
Xu, W. and Gather, K.P. (2008). On Interactive Visualization with Relational Database. In Conference Compendium of Info Vis 2008, Columbus, Ohio, USA, Oct. 24-31, IEEE Computer Society, Washington, DC, USA, pp. 116-117.
Xu, W., Miranker, D.P., Ramakrishnan, S., Mao, R., and Willard, W. (2008). Anytime K-Nearest Neighbor Search for Database Applications. In Proceedings of the First International Workshop on Similarity Search and Applications (SISAP '08). IEEE Computer Society, Washington, DC, USA, 139-148.
Mao, R., Xu, W., Willard, W., Ramakrishnan, S., and Miranker, D. (2006). MoBIoS Index: Support Distance-Based Queries in Bioinformatics. In Proceedings of the 2006 Workshop on Intelligent Computing & Bioinformatics of the Chinese Academy of Sciences (WICB2006). November 12-14, 2006. Hefei, Anhui, China.
Xu, W., Miranker, D.P., Mao, R., & Wang, S. (2006). On Integrating Peptide Sequence Analysis and Relational Distance-Based Indexing. In Proceedings of 6th IEEE Symposium on Bioinformatics and Bioengineering (BIBE’06), Arlington, VA, USA, Oct. 16-18, 2006, IEEE Computer Society, Washington, DC, USA, pp. 27-34.
Mao, R., Xu, W., Ramakrishnan, S., Nuckolls, G. and Miranker, D.P. (2005). On Optimizing Distance-Based Similarity Search for Biological Databases. In Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference (CSB’05), IEEE Computer Society, Washington, DC, USA, pp. 351-361.
Xu, W., Briggs, W. J., Padolina, J., Liu, W., Linder, C. R. & Miranker, D.P. (2004). Using MoBIoS' Scalable Genome Joins to Find Conserved Primer Pair Candidates Between Two Genomes. In Proceedings of 12th International Conference on Intelligent Systems for Molecular Biology (ISMB’04), Jul 31-Aug. 4, 2004, Glasgow, UK. pp. 355-362.
Mao, R., Xu, W., Singh, N. & Miranker, D.P. (2003). An Assessment of a Metric Space Database Index to Support Sequence Homology. In Proceedings of the 3rd IEEE Symposium on Bioinformatics and Bioengineering (BIBE’03), IEEE Computer Society, Washington DC, USA, pp. 375-382.
Miranker, D.P., Xu, W. & Mao, R. (2003). MoBIoS: a Metric-Space DBMS to Support Biological Discovery. In Proceedings of the 15th International Conference on Scientific and Statistical Database Management (SSDBM '03), Silvia Nittel and Dimitrios Gunopulos (Eds.), IEEE Computer Society, Washington DC, USA, pp. 241-244.
Invited Talks and Presentations
“Scalable Big Data Analysis with HiveQL”, Guest Lecturer (MGMT 635), School of Management, New Jersey Institute of Technology, Nov. 2, 2016.
“Supporting Efficient Content Based Collection Comparison for Curation”, presentation, Sep. 13, 2016, SciDataCon2016, Denver, Colorado.
“Improving Similarity-Based Retrieval for Large Data Sets”, Invited Speaker, Dec. 4, 2015, Texas State University
“Scaling R Computation for Big Data processing with XSEDE Resources”, Tutorial Lecturer, July 27, 2015. St. Louis, MO, USA.
“Super-R: Supercomputing & R for Data-Intensive Analysis”, panel presentation, Nov. 19, 2014, Supercomputing Conference 2014, New Orleans, LA, USA
“Super-R: Supercomputing & R for Data-Intensive Analysis”, panel presentation, Jun. 24, 2014, International Supercomputing Conference 2014, Leipzig, Germany
“Super-R: Supercomputing and R for Data-Intensive Analysis”, panel presentation, Nov. 20, 2013, Supercomputing Conference 2013, Denver, CO, USA
“Visual Analytics for Digital Curation.” Invited Panel Presentation MARAC Regional Spring Conference, Erie, PA, on April 27, 2013.
“Exercises in Machine Learning as a Tool for Archivists and Records Managers.” Invited Talk. NAGARA E-Records Forum, Austin, Texas, April 11, 2013.
“Supporting Scientific Research with Cloud Computing.” Invited Talk. Bio-IT World Cloud Computing Summit, San Francisco, Sep 10-13, 2012.
“Supporting Dynamic Access and Analysis of Large Scale Digital Collections with MapReduce.” Invited Panel Presentation. Session 404 in Annual Meeting of the Society of American Archivists (SAA12), San Diego, CA, USA, Aug 10, 2012.
“Big data and Cloud Computing.” Invited Talk. Beijing Document Service, Beijing, China, July 20, 2012.
“Big data and Cloud Computing.” Invited Talk. The Institute of Scientific and Technical Information of China (ISTIC), Beijing, China, July 19, 2012.
“Enabling Dynamic Hadoop Services at TACC.” Invited Talk. ACM SIGKDD Austin Chapter, Austin, Texas, USA, June 20, 2012.
“Integrating Digital Analysis and Access with Cloud Computing.” Invited Panel Presentation. The 38th IASSIST Conference, Washington DC, USA, June 8, 2012.
“Supporting Dynamic Access and Analysis of Large Scale Digital Collections with MapReduce.” Invited Panel Presentation. E-Records Forum 2012 Sponsored by the National Association of Government Archives and Records Administrators, Austin, Texas, USA, April 26, 2012.
“New Methods for Document Collection Analysis in Visual Analytics and Cloud Computing.” Invited Talk. Beijing Document Service, Beijing, China, September 16, 2011.
“Data mining with dynamic Hadoop cluster at Texas Advanced Computing Center.” Invited Talk. Shenzhen University, Shenzhen, Guangdong, China, September 14, 2011.
“Interactive Visualization for Large Hierarchical Data.” Invited Talk. Division of Biomedical Informatics Retreat, University of Texas Southwestern Medical Center, Dallas, Texas, USA, May 6, 2011.
“Managing RNA Sequences with SQL Server.” Invited Talk. Microsoft E-science Workshop, the Renaissance Computing Institute, Chapel Hill, NC, USA, October 21, 2007.
“DIMACS Workshop on Biomolecular Networks: Topological Properties and Evolution.” Invited Poster Presentation. Newark, NJ, May 11-13, 2005.
“IMA/RECOMB Satellite Workshop on Comparative Genomics.” Invited Poster Presentation. Minneapolis, MN, October 10-14, 2003.
Other Media Coverage and Online Article