Program

[UT Bioinformatics Symposium logo: a man and a woman discuss results on a computer screen]

Speakers

Designing Vaccines for Cancer

Dr. Jeremy C. Smith. Oak Ridge National Laboratory and University of Tennessee, Knoxville.

We illustrate the use of 3D modelling and machine learning (ML) in the design of personalized vaccines against cancer. The methods combine physicochemical descriptors with simulation, modelling, and ML. We are able to accurately determine T-cell receptor epitopes as well as antigen:MHC affinity and thus predict immunogenicity. Our approaches have been shown to shrink pancreatic cancer tumors in mice.

Prof. Jeremy C. Smith specializes in computational molecular biophysics, with an emphasis on the simulation of biological molecules. A native of England, he completed postdoctoral work with Martin Karplus at Harvard before setting up a group in biomolecular simulation at the French Atomic Energy Commission in Saclay in 1989. In 1998 he moved to the University of Heidelberg, Germany, where he held a Chair in Computational Biology. In 2006, he became the first Governor’s Chair at the University of Tennessee in a joint position with Oak Ridge National Laboratory. He has published >500 articles and has been cited >70,000 times. He is a Fellow of the Royal Society of Chemistry and Associate Editor of Biophysical Journal.


Data-Driven Population Health Surveillance

Dr. Heidi Hanson. Group Leader, Biostatistics and Multiscale System Modeling, Oak Ridge National Laboratory.

Large amounts of health and environmental data across heterogeneous populations are needed to rapidly identify vulnerable populations and provide near real-time situational readiness for public health threats. However, the effective development of near real-time population health surveillance remains hindered by numerous challenges. Data complexity and regulatory hurdles related to health data privacy prevent pooling of data across health care institutions in the US. Integration of diverse types of social and environmental determinants of health data across space and time requires advanced analytical methods and computational workflows. Computational limitations prevent scaling algorithms to the population level and have hindered the development and deployment of population health research tools. In her presentation, Dr. Hanson will critically examine some of these obstacles, drawing on current projects to illustrate innovative solutions. She will also propose new strategies to expedite progress in near real-time population health surveillance, emphasizing the need for interdisciplinary collaboration. This discussion aims to provide insights for leveraging these complex datasets effectively, thereby enhancing their impact on population health.

Heidi Hanson, PhD is a Senior Scientist and Group Lead of Biostatistics and Biomedical Informatics at Oak Ridge National Laboratory. She is a demographer and life course epidemiologist, with expertise in analysis of population health data. She leads the joint NCI-DOE Modeling Outcomes using Surveillance data and Scalable AI for Cancer (MOSSAIC) and the “Data-Driven Population Health Surveillance at Scale for Pandemic Readiness” project EHRLICH. Her previous projects have taken advantage of large health databases such as the Utah Population Database (UPDB), Demographic Health Survey (DHS), Centers for Medicare and Medicaid Services (CMS), National Health and Nutrition Examination Survey (NHANES), and the Surveillance, Epidemiology, and End-Results (SEER) Program.


Deep Learning Models for Biological Data Integration

Dr. Joshua L. Phillips. Professor, Department of Computer Science, Center for Computational and Data Science, Middle Tennessee State University.

Microbiological studies now regularly gather multiple sources of data, such as metagenomic, metabolomic, and metatranscriptomic measurements, to better understand complex ecological relationships, but linking these data domains together is a complex task for which we may turn to deep learning for a solution. We have developed deep learning models for analyzing high-throughput amplicon sequencing data which, unlike past approaches, are not limited to analyzing single sequences in isolation, and we then utilize recent deep contrastive learning techniques to link amplicon data models with corresponding LC/MS or GC/MS data models. We establish a methodology that can be used to pre-train models across multiple experimental data sets and then fine-tune them for various specific tasks, much like recent LLMs. Our models also provide explanations of how they accomplish their requisite tasks via attention attribution, which provides insights into putative biological mechanisms, many of which align with known ecological relationships or could be the focus of future experimental verification.

Joshua L. Phillips is currently a Professor in the Department of Computer Science at Middle Tennessee State University. Phillips’ research interests are at the intersection of AI and computational biology. Particular examples include neurobiologically inspired models of working memory for robust AI, unsupervised machine learning applied to computational electrostatics for protein engineering and HIV vaccine development, AI-inspired methods for accelerating in silico protein folding, and transformer models for permutation-equivariant multi-omics data integration. He has helped secure over $3.5 million in funding from NSF for student scholarships and undergraduate/graduate research in applied machine learning. He aided development of MTSU’s B.S. in Data Science program, and continues to aid development of the Data Science (M.S.) and Computational and Data Science (Ph.D.) graduate programs. Phillips has a B.S. in Computer Science (2002) from Middle Tennessee State University, an M.S. in Computer Science (2004) from Vanderbilt University, and a Ph.D. in Electrical Engineering and Computer Science (2012) from the University of California, Merced. He worked as a Nicholas C. Metropolis Post-doctoral Fellow at Los Alamos National Laboratory (2012-2014) before joining the Department of Computer Science at MTSU in 2014.


Wild Pathosystems for Evolutionary Applications in Plant Pathology

Dr. Gautam Shirsekar. Assistant Professor, Department of Entomology and Plant Pathology, University of Tennessee Institute of Agriculture.

Crop wild relatives (CWRs) are an important source of disease-resistance genes for our crops. The absence of spatio-temporal heterogeneity in host composition, coupled with pathogens' rapid evolvability, results in frequent breakdowns of disease resistance in crop pathosystems.

In contrast, wild pathosystems are characterized by spatio-temporal heterogeneity in host, pathogen, and environment, and by the absence of significant disease outbreaks. The lack of studies on wild plant pathosystems means we know very little about how this resistance durability is achieved. My research program seeks to understand how these pathosystems evolve to maintain co-evolutionary dynamics at equilibrium. I will investigate the North American wild grape (Vitis spp.) - downy mildew (Plasmopara viticola) pathosystem with the goal of comprehensively understanding the eco-evolutionary processes and underlying genetic architecture that are major drivers of coevolution. North American Vitis species are characterized by range-wide interspecific hybridization and an as-yet-unknown diversity of interactions with P. viticola. I look forward to integrating tools from population and ecological genomics, plant pathology, molecular biology, and computational biology to achieve my goal. In the long term, I believe that the evolutionary principles learned in my research program by using wild pathosystems will play a significant role in the management of emerging virulence in crop pathosystems.

I am a plant pathologist interested in understanding co-evolutionary patterns in host-pathogen interactions. I joined the Department of Entomology and Plant Pathology (EPP) at UTK in February 2024. I came to EPP from Prof. Detlef Weigel’s lab at the Max Planck Institute for Biology, Tuebingen, Germany. Before that, I earned my Ph.D. at Ohio State University, where I studied the biochemical aspects of host-pathogen interactions in the rice-blast pathosystem. In my research program at UTK, I apply field-based observations, lab studies, and computational biology to understand eco-evolutionary processes affecting wild plant pathosystems. In the long term, I intend to leverage my fundamental understanding of coevolutionary dynamics in a pathosystem to develop durable and sustainable crop disease management strategies.


Lightning Talks

Use and Accuracy of Data Scraping for Compilation of Healthcare Services Information

Phoebe Tran, Ph.D., MS. Assistant Professor, Department of Public Health, University of Tennessee, Knoxville.

Access to accurate, up-to-date information on healthcare providers is critical for clinicians to refer patients to providers that best fit patient needs, such as insurance coverage and location. Frequent changes in healthcare provider availability make it burdensome for clinicians to manually verify current data. Data scraping techniques can automate the collection of public information about healthcare providers, supplementing directories compiled by the Centers for Medicare and Medicaid Services or local and state governments. In our study, we describe using a data scraping process to identify outpatient cardiac rehabilitation facilities in Tennessee, cross-referenced with a manually verified list from the Tennessee Association of Cardiovascular and Pulmonary Rehabilitation (TACVPR). We found that while data scraping can expand upon existing lists, further refinement is needed for the process to be fully accurate. We propose integrating AI-based automated data verification to enhance accuracy by detecting discrepancies between multiple data sources. This approach has the potential to improve access to care by reducing the time clinicians spend identifying suitable healthcare providers.
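The cross-referencing step described above can be illustrated with a minimal sketch. This is not the study's actual pipeline; the facility names and the similarity threshold are invented for illustration. It uses fuzzy string matching to partition scraped facility names into those that match a manually verified list and those flagged for review:

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and strip punctuation/extra whitespace for comparison."""
    kept = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def flag_discrepancies(scraped, verified, threshold=0.85):
    """Partition scraped facility names into (matched, needs_review)
    against a manually verified list, using fuzzy string similarity."""
    matched, needs_review = [], []
    norm_verified = [normalize(v) for v in verified]
    for s in scraped:
        ns = normalize(s)
        scores = [SequenceMatcher(None, ns, v).ratio() for v in norm_verified]
        best = max(scores, default=0.0)
        (matched if best >= threshold else needs_review).append(s)
    return matched, needs_review

# Hypothetical scraped entries vs. a verified registry entry
scraped = ["Knoxville Cardiac Rehab Center", "Smoky Mtn. Wellness Clinic"]
verified = ["Knoxville Cardiac Rehabilitation Center"]
ok, review = flag_discrepancies(scraped, verified)
```

In practice such a sketch would be extended with additional fields (address, phone) so that near-matches on name can be confirmed or rejected by a second data source.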


The Time to Move Randomized Crossover Trial

Samantha F. Ehrlich, PhD, MPH. Associate Professor. Kinesiology, Recreation, and Sport Studies, University of Tennessee, Knoxville.

Physical activity (PA) helps regulate blood glucose levels, and as a non-pharmacological option, is attractive for individuals with pregnancy hyperglycemia. The Time to Move Randomized Crossover Trial will answer, ‘when is the best time of day for my PA?’ using state-of-the-art technology to measure glycemia (continuous glucose monitors, CGM, for glucose readings every 5 minutes), PA (ActiGraph devices, for minute-by-minute assessment of sedentary behavior and movement, by intensity level), and dietary intake (digital uploads of all food and beverages consumed, in conjunction with 24-hr dietary recalls). Participants complete the following conditions: 30 minutes of moderate intensity walking/stepping in the morning (i.e., between 5-9am, within 30-40 minutes of starting breakfast), 30 minutes of moderate intensity walking/stepping in the late afternoon/evening (between 4-8pm, within 30-40 minutes of starting dinner), and no PA. Glucose values over the 24-hour cycle (e.g., overall, daytime, nighttime, before breakfast, and after all meals) are then compared across conditions. Our trial’s findings will identify the optimal timing of PA for individuals with pregnancy hyperglycemia. Since the order of the conditions is randomized and the trial includes multiple timestamped repeated measurements, we have worked closely with bioinformatics experts to process and prepare our data for analyses.


High-Throughput Sequencing at the UT Genomics Core (UTGC)

Veronica A. Brown. Lab Manager. University of Tennessee Genomics Core Facility, Knoxville; Office of Research, Innovation & Economic Development, University of Tennessee, Knoxville.

The University of Tennessee Genomics Core is a shared-resource, cost-recovery facility within ORIED, offering high-throughput DNA and RNA library prep and sequencing on Illumina instruments, to both UT researchers and off-campus users.


Optimizing Cost-Effective Gene Expression Phenotyping Approaches in Cattle Using 3’ mRNA Sequencing

Ruwaa Mohamed. Graduate student. Genome Science & Technology, Bredesen Center, University of Tennessee, Knoxville.

The rapidly declining cost of next-generation sequencing presents opportunities for population-level molecular phenotyping. While the cost of whole transcriptome sequencing has declined recently, its required sequencing depth still makes it an expensive choice for wide-scale molecular phenotyping. We aim to optimize 3′ mRNA sequencing (3′ mRNA-Seq) approaches for collecting cost-effective proxy molecular phenotypes for cattle from easy-to-collect tissue samples (i.e., whole blood). We used matched samples from 15 Holstein male calves in a heat stress trial to identify 1) the best library preparation kit (Takara SMART-Seq v4 3′ DE vs. Lexogen QuantSeq) and 2) the optimal sequencing depth (0.5 to 20 million reads/sample) to capture gene expression phenotypes most cost-effectively. Takara SMART-Seq v4 3′ DE outperformed Lexogen QuantSeq libraries across all metrics: number of quality reads, expressed genes, informative genes, differentially expressed genes, and 3′ biased intragenic variants. Serial downsampling analyses identified that as few as 8.0 million reads per sample could effectively capture most of the between-sample variation in gene expression. However, progressively adding more reads did provide marginal increases in recall across metrics. These 3′ mRNA-Seq reads can also capture animal genotypes that could be used as the basis for downstream imputation. The 10 million read downsampled groups called an average of 104,386 SNPs and 20,131 INDELs, many of which segregate at moderate minor allele frequencies in the population. This work demonstrates that 3′ mRNA-Seq with Takara SMART-Seq v4 3′ DE can provide an incredibly cost-effective (<$25/sample) approach to quantifying molecular phenotypes (gene expression) while discovering sufficient variation for use in genotype imputation. Ongoing work is evaluating the accuracy of imputation and the ability of much larger datasets to predict individual animal phenotypes.
(full paper: https://doi.org/10.1101/2024.06.18.599599)


Empowering Research with Bioinformatics Consulting

Ryan Kuster, PhD. Bioinformatics Consultant. Entomology and Plant Pathology, University of Tennessee, Knoxville.

UTK Bioinformatics Consulting is a flexible, hourly service that provides assistance across a broad range of research goals, supporting every step of bioinformatics analysis—from experimental design to sequencing and data analysis. This presentation will share examples of collaborative projects to demonstrate how tailored consulting services empower researchers to overcome technical barriers, streamline data analysis, and achieve research milestones. In addition, consulting enhances the bioinformatics skill sets of faculty, staff, and students. Whether it’s creating optimized ‘omics’ workflows, developing custom code for niche datasets, or offering training and teaching, bioinformatics consulting delivers the expertise needed to transform raw data into meaningful biological conclusions while strengthening the bioinformatics community at the University of Tennessee.


Quantifying Fine Motor Deficits in Rett Syndrome Mice Using DeepLabCut & Kinematics Analysis

Mohamed Mahrous. Krishnan Lab, 2nd year PhD Student. Genome Science & Technology, Bredesen Center, University of Tennessee, Knoxville.

Rett syndrome (RTT) is a neurodevelopmental disorder primarily affecting females, caused by mutations in the X-linked MECP2 gene. RTT patients exhibit mosaic MECP2 expression, which contributes to their range of phenotypes, including significant motor impairments. To elucidate the motor deficits associated with MECP2 mutations, this study investigates the reaching performance of female heterozygous Mecp2 mutant (Mecp2het) mice compared to wild-type (WT) controls, employing the single pellet reaching and grasping assay over a 14-day period. To achieve precise, quantitative assessment of motor execution, we utilized DeepLabCut (DLC), a deep-learning-based, markerless pose estimation tool. DLC was trained on a subset of video frames to recognize key points on the mice’s limbs, allowing for high-throughput analysis of hand trajectory during reaching tasks. This automated tracking enabled the extraction of kinematic metrics, such as speed, straightness, and acceleration, which we used to evaluate and differentiate the hand motions of mutant and WT mice. The use of DLC not only accelerated the analysis, reducing months of manual tracking into hours, but also provided enhanced accuracy and consistency in the resulting data, allowing us to examine and compare the fine motor skills of mice during complex motor tasks.
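The kinematic metrics named above (speed, straightness, acceleration) can be computed directly from the per-frame (x, y) coordinates that a pose estimator like DLC outputs. The sketch below is illustrative, not the lab's actual analysis code; the function name, frame rate, and metric definitions are assumptions:

```python
import numpy as np

def kinematic_metrics(xy, fps=100.0):
    """Compute simple kinematic metrics from a tracked 2D hand trajectory.

    xy  : sequence of (x, y) positions, one per video frame
    fps : camera frame rate, used to convert frame steps to seconds
    """
    xy = np.asarray(xy, dtype=float)
    dt = 1.0 / fps
    step = np.diff(xy, axis=0)                # per-frame displacement vectors
    dist = np.linalg.norm(step, axis=1)       # per-frame path length
    speed = dist / dt                         # instantaneous speed
    accel = np.diff(speed) / dt               # instantaneous acceleration
    path_len = dist.sum()
    direct = np.linalg.norm(xy[-1] - xy[0])   # straight-line start-to-end distance
    straightness = direct / path_len if path_len > 0 else np.nan
    return {
        "mean_speed": speed.mean(),
        "peak_speed": speed.max(),
        "mean_abs_accel": np.abs(accel).mean(),
        "straightness": straightness,         # close to 1.0 for a direct reach
    }

# A perfectly straight synthetic reach: straightness should be close to 1.0
straight_reach = [(i, i) for i in range(10)]
metrics = kinematic_metrics(straight_reach)
```

A curved or tremulous reach yields straightness well below 1.0, which is the kind of contrast such metrics can quantify between mutant and WT animals.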


Decoding Chaos: mapping radiation-induced double-strand breaks

Heng Li. PhD candidate in McCord Lab. Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville.

The three-dimensional genome organization is fundamental for biological processes such as cell differentiation and cancer progression, yet our knowledge of the alteration of chromosome architecture after radiation-induced damage is limited. Our previous genome-wide chromosome conformation capture (Hi-C) studies have revealed features of chromatin organization at 30 min and 24 h after ionizing radiation exposure, but the specific genomic landscape surrounding break sites has yet to be explored. Here, we report the first quantitative mapping of double-strand breaks (DSBs) across the genome after ionizing radiation exposure in both fibroblasts and lymphocytes by END-seq. Our comprehensive and integrative bioinformatic analysis of enriched peaks reveals that radiation results in more DNA damage in lymphocytes than in fibroblasts, in line with computational predictions regarding the effect of initial 3D genome organization of cells on the impacts of radiation damage. At the megabase scale, DSBs are more frequent in open genome regions than in compact regions, which is consistent with previous computational simulation research. Combining these results with our Hi-C data, we also find extremely high levels of break strength at TAD boundaries and more contacts around break sites. Our findings extend our understanding of the precise genome reorganization that occurs during the DNA damage and repair process.


Reducing the Coding Barrier for Bioinformatics Users

Guang He, PhD. Civil Engineering and Environmental Science, University of Tennessee, Knoxville.

Bioinformatics is now involved in a broad scope of research topics. However, the coding barrier impedes non-expert bioinformatics users (e.g., soil microbiologists) from using powerful bioinformatic tools. Bioinformaticians have established different communities that aim to develop reproducible bioinformatic pipelines and reduce the coding barrier for non-expert users. I will introduce a great example of such a community, nf-core, where reproducible pipelines for amplicon, RNA-seq, and metagenome sequencing are available. Generalized analyses can be accomplished with a single command.
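As a rough illustration of the single-command idea, a typical nf-core launch looks like the following. The pipeline version, primer sequences, and file paths here are placeholders, not recommendations; check the nf-core/ampliseq documentation for current parameters:

```shell
# Install Nextflow (requires Java), then launch an nf-core pipeline.
# Versions, primers, and paths below are illustrative only.
curl -s https://get.nextflow.io | bash

./nextflow run nf-core/ampliseq \
    -profile docker \
    --input samplesheet.csv \
    --FW_primer GTGYCAGCMGCCGCGGTAA \
    --RV_primer GGACTACNVGGGTWTCTAAT \
    --outdir results
```

The `-profile docker` flag is what removes most of the installation burden: all tools run inside pre-built containers, so the user never compiles or configures the underlying bioinformatics software.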


High Performance & Scientific Computing

Davin Bostic. Manager. Office of Innovative Technologies, High Performance & Scientific Computing, University of Tennessee, Knoxville. http://oit.utk.edu/hpsc

The resources provided by OIT HPSC include three clusters, two for open research and one for processing sensitive information. Each cluster has a Lustre file system with petabyte-scale storage capacity. These resources can be used by faculty and student researchers and for academic coursework. To obtain access, claim your account at the ISAAC User Portal by choosing Request an ISAAC Account in the menu on the left.


Modeling Microbial Predator-Prey Interactions: From Lab to Cluster to Global Ocean

Eric Carr, PhD. Microbiology, University of Tennessee, Knoxville. http://www.talmygroup.com/people.html

Understanding the structure and function of microbial ecosystems on a large scale is essential for unraveling the complexities of carbon cycling in the ocean. In collaboration with lab and field researchers, we utilize the MIT General Circulation Model (MITGCM) coupled with Darwin, a trait-based ecosystem model, to explore microbial dynamics such as biochemical niche formation, predation, and viral influences. By deploying ISAAC NG to run these large-scale simulations and manage the extensive data analysis, we aim to probe the impacts of microbial interactions on global ocean processes. This approach enables us to investigate whether these idealized models accurately reflect real-world ocean processes and to establish a link between microbial community dynamics and large-scale nutrient and carbon cycling, advancing our understanding of marine ecosystems and their role in the Earth’s climate system.