ACM BCB 2019 Program
• September 7, 2019
• September 8, 2019
• September 9, 2019
• September 10, 2019
Keynote Lecture: 60 minutes (45 minutes for talk and 15 minutes for Q and A)
Highlight Talks: 25 minutes (20 minutes for talk and 5 minutes for Q and A)
Main Conference Regular Paper: 25 minutes (20 minutes for talk and 5 minutes for Q and A)
Main Conference Short Paper: 15 minutes (12 minutes for talk and 3 minutes for Q and A)
By the Numbers
• 7 workshops
• 7 tutorials
September 8 – September 10:
• 3 keynotes
• 5 highlights
• 42 regular papers
• 19 short papers
W: The Sixth International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2019)
Dr. Byung-Jun Yoon and Dr. Xiaoning Qian, Texas A&M University, Dept. Electrical & Computer Engineering;
Dr. Tamer Kahveci, University of Florida, Dept. Computer and Information Science and Engineering;
Dr. Ranadip Pal, Texas Tech University, Electrical and Computer Engineering
Abstract: Next-generation high-throughput profiling technologies have enabled more systematic and comprehensive studies of living systems. Network models play crucial roles in understanding the complex interactions that govern biological systems, and their interactions with the external environment. The inference and analysis of such complex networks and network-based analysis of large-scale measurement data have already shown strong potential for unveiling the key mechanisms of complex diseases as well as for designing improved therapeutic strategies. At the same time, the inference and analysis of complex biological networks pose new exciting challenges for computer science, signal processing, control, and statistics. We organize the Sixth International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2019) in conjunction with ACM-BCB 2019. The previous CNB-MAC workshops have been successfully held in conjunction with ACM-BCB 2014, ACM-BCB 2015, ACM-BCB 2016, ACM-BCB 2017, and ACM-BCB 2018, attracting a fair number of researchers interested in computational network biology.
The workshop aims to provide an international scientific forum for presenting recent advances in computational network biology that involve modeling, analysis, and control of biological systems under different conditions, and system-oriented analysis of large-scale OMICS data. The full-day workshop will solicit (i) highlights that present advances in the field that have been reported in recent journal publications, (ii) extended abstracts for poster presentation at the workshop, which will provide an excellent venue for quick dissemination of the latest research results in computational network biology, and (iii) original research papers that report new research findings that have not been published elsewhere. Full length original research papers accepted for presentation at the workshop will be published in a supplement issue in partner journals that will be identified after the workshop proposal is accepted. The first and the second CNB-MAC workshops have partnered with EURASIP Journal on Bioinformatics and Systems Biology, and the third CNB-MAC workshop partnered with BMC Bioinformatics, BMC Systems Biology, and BMC Genomics. The fourth and fifth CNB-MAC workshops partnered with BMC Bioinformatics, BMC Systems Biology, BMC Genomics, and IET Systems Biology. The main emphasis of the proposed workshop will be on rigorous mathematical or computational approaches in studying biological networks, analyzing large-scale OMICS data, and investigating mathematical models for human-microbiome-environment interactions.
W: The 2019 Computational Structural Bioinformatics Workshop (CSBW 2019)
Nurit Haspel, UMass Boston;
Dong Si, University of Washington Bothell;
Lin Chen, Elizabeth City State University
Abstract: The 2019 Computational Structural Bioinformatics Workshop will be held in conjunction with ACM-BCB. The rapid accumulation of macromolecular structures presents a unique set of challenges and opportunities in the analysis, comparison, modeling, and prediction of macromolecular structures and interactions. This workshop aims to bring together researchers with expertise in bioinformatics, computational biology, structural biology, data mining, machine learning, optimization, and high-performance computing to discuss new results, techniques, and research problems in computational structural bioinformatics. Selected submissions will be invited to publish extended versions of their papers in a special issue of MDPI Molecules. Journals used in previous years included the International Journal of Data Mining and Bioinformatics (2007), BMC Structural Biology (2009, 2012), the Journal of Bioinformatics and Computational Biology (2011), the Journal of Computational Biology (2015, 2016), and Molecules (2017, 2018).
W: 8th Workshop on Computational Advances in Molecular Epidemiology (CAME 2019)
Yury Khudyakov, Centers for Disease Control and Prevention;
Ion Mandoiu, University of Connecticut;
Pavel Skums and Alex Zelikovsky, Georgia State University
The CAME workshop provides a forum for presentation and discussion of the latest computational research in molecular epidemiology. This multidisciplinary workshop will bring together field practitioners of molecular epidemiology, molecular evolutionists, population geneticists, medical researchers, bioinformaticians, statisticians and computer scientists interested in the latest developments in algorithms, mining, visualization, modeling, simulation and other methods of computational, statistical and mathematical analysis of genetic and molecular data in the epidemiological context.
Molecular epidemiology is essentially an integrative scientific discipline that considers molecular biological processes in specific epidemiological settings. It relates molecular biological events to etiology, distribution and prevention of disease in human populations. Over the years, molecular epidemiology became extensively fused with mathematical and computational science and immensely benefited from this tight association. The workshop will review the latest advancements in the application of mathematical and computational approaches to molecular epidemiology.
W: 8th International Workshop on Parallel and Cloud-based Bioinformatics and Biomedicine (ParBio)
Prof. Mario Cannataro, Dep. of Medical and Surgical Sciences, University Magna Graecia, Catanzaro, ITALY;
Prof. Wes J. Lloyd, School of Engineering and Technology, University of Washington, Tacoma, USA;
Dr. Giuseppe Agapito, Dep. of Medical and Surgical Sciences, University Magna Graecia, Catanzaro, ITALY
Due to the availability of high-throughput platforms (e.g. next generation sequencing, microarray and mass spectrometry) and clinical diagnostic tools (e.g. medical imaging), a recent trend in Bioinformatics and Biomedicine is the ever-increasing production of experimental and clinical data.
Considering the complex analysis pipelines often used in biomedical research, there is a main bottleneck that involves the storage, integration, and analysis of experimental data, as well as their correlation and integration with publicly available data banks. While parallel computing and Grid computing may offer the computational power and the storage to face this overwhelming availability of data, Cloud Computing is a key technology to hide the complexity of computing infrastructures, to reduce the cost of the data analysis task, and especially to change the overall model of biomedical research and health provision.
High-performance infrastructures may offer the huge data storage needed to store experimental and biomedical data, while parallel computing can be used for basic pre-processing (e.g. parallel BLAST, mpiBLAST) and for more advanced analysis (e.g. parallel data mining). In such a scenario, novel parallel architectures (e.g. CELL processors, GPUs, FPGA, hybrid CPU/FPGA) coupled with emerging programming models may overcome the limits posed by conventional computers to the mining and exploration of large amounts of data. On the other hand, these technologies still require great investments by biomedical and clinical institutions and are based on a traditional model where users often need to be aware of and face different management problems, such as hardware and software management, data storage, software ownership, and prohibitive costs (various professional-level applications in the biomedical domain have a high starting cost that prevents many small laboratories from using them).
Cloud Computing technology, which offers scalable costs, increased reachability and availability, easier application use, and the possibility of fostering collaboration among scientists, is already changing the business model in different sectors and has now begun to be adopted in the bioinformatics and biomedical domains. However, many problems remain to be solved, such as availability and safety of the data, privacy-related issues, availability of software platforms for rapid deployment, and the execution and billing of biomedical applications.
The goal of ParBio 2019 is to bring together scientists in the fields of high performance and cloud computing, computational biology and medicine to discuss, among other topics, the parallel implementation of bioinformatics and biomedical applications and the problems and opportunities of moving biomedical and health applications to the cloud. Moreover, big data analytics issues in healthcare and bioinformatics will be addressed. The workshop will focus on research issues, problems and opportunities of moving biomedical and health applications to the cloud, as well as on the opportunity to define guidelines and minimum requirements for a Biomedical Cloud. Moreover, the workshop will discuss parallel and distributed management and analysis of molecular and clinical data, which increasingly need to be integrated and analyzed jointly.
W: Workshop on Microbiomics, Metagenomics, and Metabolomics (MMM)
Soha Hassoun, Department of Computer Science at Tufts University;
Yasser El-Manzalawy, Geisinger Health System and the Pennsylvania State University;
Georg Gerber, Harvard Medical School, Massachusetts Host-Microbiome Center, and Brigham and Women’s Hospital;
David Koslicki, Pennsylvania State University;
Gail Rosen, Electrical and Computer Engineering at Drexel University
Microbiota are ecological communities of microorganisms found throughout nature. In humans and animals, microbiota communities can reside on or within the body, and exist in a commensal or mutualistic relationship with their host to impact physiological functions and play critical roles in the host’s development. These microbial communities can be very complex. One such example is the intestinal microbiota, comprising hundreds of species that interact with other microorganisms in the community as well as their host. Recent studies have demonstrated that microbiota impact a wide range of physiological processes, including digestion, development of the immune system, and inflammation. Further, significant alterations in the intestinal microbiota composition have been shown to correlate with several diseases, including obesity, diabetes, cancer, asthma, and even autism spectrum disorder. Characterizing the microbiota and understanding its relation to health and disease stand to significantly improve human health.
Efforts to characterize microbiota have greatly benefited from technical advances in DNA sequencing. In particular, low-cost culture-independent sequencing has made metagenomic and metatranscriptomic surveys of microbial communities practical, including bacteria, archaea, viruses, and fungi associated with the human body, other hosts, and the environment. The resulting data have stimulated the development of many new computational approaches to meta-omic sequence analysis, including metagenomic assembly, microbial identification, and gene, transcript, and pathway metabolic profiling. Further, recent advances in untargeted metabolomics have stimulated the development of many tools that enhance the functional profiling of microbial communities.
W: Machine Learning Models for Multi-omics Data Integration (MODI)
Abed Alkhateeb, School of Computer Science at the University of Windsor, Canada;
Luis Rueda, School of Computer Science at the University of Windsor, Canada
This peer-reviewed workshop, with published proceedings, covers cutting-edge machine learning approaches and applications for multi-omics data; researchers in the field will showcase and discuss their advanced approaches. The half-day workshop will consist of oral presentations of the accepted papers. We aim to host approximately 9 to 12 high-quality accepted works in the field. Each talk will last approximately 15 to 20 minutes, including a question-and-answer session. A coffee and snack break will take place in the middle of the workshop for refreshment, discussion, and networking.
The advancement in genome sequencing has helped reveal relevant information about genomic variants in protein functions, spectrums and diseases. Integrative approaches using machine learning and deep learning are applied to rebuild systems biology networks of multi-omics data including, but not limited to, DNA and RNA variants (SNPs, indels, CNA, CNV and exons, among others), protein-protein interaction networks and clinical information. Current techniques focus on integrating different molecules to (1) predict the outcomes of diseases, such as survivability, progression, and type/subtype of the disease; and (2) understand the behavior of molecules and build protein-protein interactions to create or repurpose drugs in the context of precision medicine. However, the contribution of those different molecules must be deeply analyzed to target the cause rather than just the correlated factors of those molecules. The underlying computational models aim to learn the weights of the relationships and contributions of these different omics.
W: Biological ontologies and knowledge bases (BiOK)
Jin Chen, University of Kentucky, United States;
Jiajie Peng, Northwestern Polytechnical University, China
In the “Omics” era of the life sciences, it is cost-effective to collect diverse types of genome-wide data, which represent information at various levels of biological systems, including data about the genome, transcriptome, epigenome, proteome, metabolome, molecular imaging, molecular pathways, different populations of people and clinical/medical records. Currently, a big challenge is to represent and use the knowledge contained in these massive data.
A bio-ontology provides standardized and structured vocabulary terms for the scientific community to describe biomedical entities in a domain. In recent years, numerous biomedical ontologies have been developed to represent knowledge about anatomy, molecular function, human phenotype, disease, clinical diagnosis and other areas. Biomedical ontologies have proven very useful for knowledge representation, entity annotation, data sharing and data integration, among other tasks in biomedical research.
Knowledge bases are increasingly being used to extract deep biological knowledge and understanding from massive biological data. Knowledge bases can provide information on underlying mechanisms, which statistical inference methods cannot gain insight into. This improvement is largely due to knowledge bases providing a validated biological context for interpreting the ocean of omics.
The biomedical ontologies and knowledge bases workshop provides a vibrant environment for researchers to share their research findings, report novel methods, and discuss the challenges and opportunities in the related fields.
T: Employing Deep Learning to Study Biomolecules (EDL)
Daniel Veltri (National Institutes of Health) and Kevin Molloy (James Madison University)
Abstract: Deep learning and neural networks are at the frontier of machine learning and artificial intelligence. The applications of these methods are vast, and many of them lie within bioscience. Most recently, Google’s DeepMind project stunned the protein structure prediction community with its performance in the last Critical Assessment of protein Structure Prediction (CASP) competition.
The objective of this tutorial is three-fold. First, the tutorial will introduce students and researchers who attend ACM-BCB to the deep learning framework Keras. This library, which is built on top of Google’s TensorFlow, will be shown using the Python and R programming languages. Second, the tutorial will allow attendees to learn the basic concepts of convolutional neural networks, recurrent neural networks, and transfer learning. Hands-on examples and sample code will be provided for each of these separate topics using classic example problems (such as image recognition/classification and text mining/sentiment analysis). Third, a hands-on example of using multiple concepts collectively will be employed through an in-class competition to identify peptides exhibiting antimicrobial properties. An accompanying website allows attendees to upload and evaluate their models on the fly and see how they rank and compare to others.
T: Low-dimensional Representation of Biological Sequence Data (LRBS)
Richard Tillquist (University of Colorado, Boulder)
Abstract: Systems of interest in bioinformatics and computational biology tend to be large, complex, interdependent, and stochastic. As our ability to collect sequence data at finer resolutions improves, we can better understand and predict system behavior under different conditions. Machine learning algorithms are a powerful set of tools designed to help with this understanding. However, many of the most effective of these algorithms are not immediately amenable to application on symbolic data. It is often necessary to map biological symbols to real vectors before performing analysis or prediction using sequence data. This tutorial will cover several techniques for embedding sequence data. Common methods utilizing k-mer count vectors and binary vector representations will be addressed along with state of the art methods based on neural networks, like BioVec, and graph embeddings, like Node2Vec and multilateration. Slides, datasets, and code from the tutorial will be made freely available for future use on GitHub. The materials for this tutorial have been partially funded by the NSF ISS BIGDATA grant No. 1836914.
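As an illustration of the simplest embedding mentioned in the abstract above, the following minimal Python sketch maps a DNA sequence to a k-mer count vector. It is not taken from the tutorial materials; the function name `kmer_count_vector`, the default alphabet, and the example sequence are assumptions chosen for illustration only.

```python
from itertools import product


def kmer_count_vector(sequence, k=2, alphabet="ACGT"):
    """Map a symbolic sequence to a vector of overlapping k-mer counts.

    Hypothetical helper for illustration; positions in the returned list
    follow the lexicographic order of all len(alphabet)**k possible k-mers.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    # Slide a window of width k over the sequence, one symbol at a time.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # ignore windows containing symbols outside the alphabet
            counts[index[kmer]] += 1
    return counts


# Example: the five overlapping 2-mers of "ACGTAC" are AC, CG, GT, TA, AC.
vector = kmer_count_vector("ACGTAC", k=2)
```

The resulting real-valued vector has a fixed length (16 for 2-mers over a four-letter alphabet) regardless of sequence length, which is what makes it usable as input to standard machine learning algorithms.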
T: Extracting structure from contaminated symbolic data (ESSD)
Antony Pearson (University of Colorado, Boulder)
Abstract: Symbolic data is the epitome of modern biological datasets. Modern sequencing technologies produce millions of reads giving insights on genome sequence, transcription levels, epigenetic modifications, and much more. To analyze those sequences one usually makes assumptions on their underlying structure, e.g., that the number of reads is Poisson, or that transcription factor binding events are independent at nonoverlapping promoters. These types of assumptions are often not exactly correct in reality. In fact, even when they are valid, a small amount of data "contamination" may make them appear untrue. The traditional approach to questioning assumptions on data has been hypothesis testing. This approach has various shortcomings, however; in particular, it does not give room for a null hypothesis to be "approximately true". This tutorial introduces a statistical methodology to assess assumptions on symbolic data that may be contaminated. It will demonstrate the applicability of this rather new methodology with publicly available DNA methylation data from ENCODE to question the common but unconscious assumption that methylation of CpGs is exchangeable. Data and code for this tutorial, in the form of an IPython Notebook, will be made available via GitHub.
T: Machine learning for biomarker discovery in cancer pharmacogenomics data (MLP)
Arvind Singh Mer, Petr Smirnov and Benjamin Haibe-Kains (University of Toronto)
Abstract: Over the past decade there has been an explosion in the availability of massive datasets combining drug screening with high-throughput molecular profiling in cancer model systems. These datasets have become a rich community resource which can be leveraged for biomarker discovery, in-silico validation, drug repurposing, drug method of action prediction, and to train statistical machine learning models for drug response prediction. However, this data poses unique challenges during analysis and requires methods that are robust to the noise inherent in the drug sensitivity assays. Furthermore, irreproducibility of some findings across studies strongly motivates integrative analysis across studies. Fortunately, tools have been developed implementing bioinformatics and machine learning methods designed specifically for the analysis of pre-clinical pharmacogenomics data.
In this tutorial, participants will become familiar with common preclinical cancer models (such as cell lines, patient-derived xenografts and organoids) and publicly available large pharmacogenomics datasets. Next, in the hands-on session, they will be introduced to the tools and packages published for analysis of these datasets, with a focus on tools written in R. Furthermore, after becoming familiar with the challenges posed by the noise in the pharmacological assays observed in high-throughput pharmacogenomics, participants will gain hands-on experience using these datasets for biomarker discovery and validation as well as for building machine learning models predictive of drug response. A focus will be on translational research, validating discoveries from in vitro datasets using in vivo pharmacogenomic and clinical datasets. The hands-on sessions will be conducted primarily in R and RStudio.
T: Causal Inference in Biomedical Data Analytics: Basics and Recent Advances (CIBD)
May D. Wang and Hang Wu (Georgia Institute of Technology)
Abstract: Causal questions are being answered every day in the biomedical domain, and have significant impact on biomedical experimentation design, data analysis, and healthcare decision making. It’s thus important for biomedical researchers to help answer these questions by applying data-driven causal inference algorithms on large-scale biomedical data.
In this tutorial, we focus on identifying the (quantitative) causal effect of interventions. We will cover a) an introduction to causal inference and popular frameworks for formulating it; b) the basics of causal effect identification algorithms; c) state-of-the-art methods that incorporate deep learning and machine learning; and d) applications in biomedical data analytics, as well as challenges and opportunities moving forward.
T: You wrote it, now get it used: Publishing your software with Galaxy and Bioconda (PGB)
Daniel Blankenberg (Cleveland Clinic)
Abstract: You’ve written software, published the code, and described it in a paper. Now, how do you make your software stand out and actually get used? This tutorial introduces two technologies that can make your software easy for researchers around the world to deploy and greatly increase its reach.
Bioconda is a platform for packaging and publishing bioinformatics software using Conda. The Conda package manager does what previous language and platform specific packagers (e.g., pip, CPAN, CRAN, Bioconductor, apt-get) have done, but in a language and OS agnostic, and much more streamlined way. Tools in Bioconda are easy for infrastructure providers and other researchers to deploy and use. We will introduce Conda and Bioconda principles, and then guide participants through packaging a tool with Bioconda.
Participants will package their newly created Bioconda tool for Galaxy, a widely deployed platform for data integration and analysis in life science research. We will define and test the Bioconda-encapsulated tool for Galaxy and then publish it in the Galaxy Toolshed, where any Galaxy administrator can then install it with a button click.
This will be hands-on. Please bring a wifi-enabled laptop. Instructors will work with participants to install needed software before the conference.
T: Integer Linear Programming in Computational and Systems Biology (ILP)
Dan Gusfield (University of California, Davis)
Abstract: Integer Linear Programming is a versatile modeling and optimization technique that is increasingly used in computational biology in non-traditional ways, most importantly and inventively as a computational tool and language to model and study biological phenomena, to analyze biological data, and to extract biological insight from the models and the data. Integer linear programming is often very effective in solving *instances* of biological problems on realistic data of current importance, even for hard computational problems that lack a worst-case efficient solution method. The effectiveness of the best modern ILP solvers on problem instances of importance in biology opens huge opportunities and could have a truly transformative effect on computation in biology and perhaps medicine.
The goal of the tutorial is to introduce and detail *modeling* and *solving* of real problems in computational biology using integer linear programming. We will illustrate some concepts using a commercial ILP solver from Gurobi Optimization, to solve specific ILPs that we formulate.
New Machine Learning Algorithms for Genome Annotation
Mark Borodovsky, PhD, Georgia Institute of Technology
Abstract: Rapid accumulation of genomic, transcriptomic and protein information creates new opportunities as well as challenges for integration of OMICS data in genome annotation algorithms. Models of prokaryotic genome organization should account for leaderless transcription, non-Shine-Dalgarno ribosomal binding sites and genes horizontally transferred from other species. Automatic parameterization of these more complex models becomes possible via a process of incremental expansion of model architecture and step-wise relaxation of restrictions on subsets of parameters upon moving along the steps of iterative training. Accuracy of prediction of eukaryotic genes with complex exon-intron structures could be improved by integrating the process of ab initio gene prediction with a search for putative orthologous proteins whose footprints on genomic DNA are iteratively used in training and prediction. Due to the unevenness of evolutionary conservation along a single amino acid chain, parallel use of spliced alignment algorithms for proteins of the same family makes it possible to identify elements of gene structure encoding conserved domains with higher accuracy.
I will talk about new genome annotation algorithms: i/ GeneMarkS-2, a part of PGAP, the prokaryotic genome annotation pipeline developed and implemented at NCBI, and ii/ the eukaryotic self-training gene finder GeneMark-EP, which utilizes footprints of orthologous proteins in iterative parameterization of HMM statistical models of genome organization.
Biography: Mark Borodovsky received his PhD in Applied Mathematics at the Moscow Institute of Physics and Technology. His thesis project was on developing methods of statistically optimal control for systems with incomplete information. He started research in computational biology at the Moscow Institute of Molecular Genetics in 1985.
In 1990, he moved to Georgia Tech to continue original work on the GeneMark family of algorithms for structural annotation of prokaryotic and eukaryotic genomes. The long-term goal is to infer relationships between structural patterns formed by evolution in linear biomolecular sequences and 3D biological functions. Reaching this goal requires new machine learning algorithms integrating genomic, transcriptomic and protein data.
Borodovsky is a Founder of the Bioinformatics graduate programs both at Georgia Tech and, more recently, at the Moscow Institute of Physics and Technology. He served as a Chair of the ACM SIGBio Advisory Board 2010-2015.
WABI Keynote Talk
Inferring the evolutionary history of gene repertoires
Nadia El-Mabrouk, PhD, Université de Montréal
Abstract: During evolution, genes are mutated, duplicated, lost and passed to organisms through speciation or Horizontal Gene Transfer (HGT). In addition, their organization in the genome is modified through inversions, transpositions, translocations and other rearrangement events. Understanding how gene order and content have evolved is essential for deciphering gene functions and interactions, with important biological implications. Ideally, all available information on gene sequence and organization should be considered in a single prediction method. However, gene sequence and gene order information are often considered separately. Indeed, inferring rearrangement events modifying gene organization is the purpose of the genome rearrangement field, while inferring losses, duplication and HGT events modifying gene content is the purpose of the gene tree – species tree reconciliation field. In this presentation, I will discuss this issue and present avenues for developing a unifying approach that considers both gene orders and gene trees for the purpose of inferring the evolutionary history of gene repertoires.
Biography: Nadia El-Mabrouk is a full professor at the Computer Science Department and a member of the Centre de Recherche Mathématiques at the University of Montreal. She holds a Ph.D. in theoretical Computer Science from the University Paris VII, obtained in 1996. Nadia has longstanding experience in developing algorithms for comparative genomics, especially genome rearrangements, gene tree reconstruction and gene tree/species tree reconciliation. Each year, she serves on the program committees of some of the most popular conferences in computational biology, such as RECOMB, ISMB, WABI and APBC. She has organized two RECOMB Comparative Genomics Workshops in Montreal. After chairing the Population Genomics and Molecular Evolution track at ISMB, she has served since 2019 as ISMB Proceedings Co-chair. Her research appears in a variety of computer science, bioinformatics and life science journals, among them IEEE/ACM, Molecular Biology and Evolution, Bioinformatics, Nature Scientific Reports and BMC-Genomics.
Decoding Epigenomic Programs in Immunity and Cancer
Christina Leslie, PhD, Memorial Sloan Kettering Cancer Center
Abstract: Dysregulated epigenetic programs are a feature of many cancers, and the diverse differentiation states of immune cells as well as their dysfunctional states in tumors are in part epigenetically encoded. We will present recent analysis work and computational methodologies from our lab to decode epigenetic programs from genome-wide data sets.
In a recent collaborative work, we characterized chromatin states governing CD8 T cell dysfunction in cancer and reported that tumor-specific T cells differentiate to dysfunction through two discrete chromatin states: an initial plastic state that can be functionally rescued (i.e. through immunotherapy) and a later fixed state that is resistant to therapeutic reprogramming. We now follow up on this work by presenting a computational methodology to decipher transcriptional programs governing chromatin accessibility and gene expression in normal and dysfunctional T cell responses through a large-scale analysis of published data from mouse tumor and chronic viral infection models. This modeling shows that in all these systems, T cells commit to becoming dysfunctional early after an immune challenge, rather than first mounting and then losing an effector response. Through scRNA-seq analysis, we characterize the phenotypic diversity of this common trajectory from plastic to fixed dysfunction.
We will also present a recent collaboration with the Sawyers lab on FOXA1 mutants in prostate cancer, showing that somatic alterations in this pioneer transcription factor lead to altered differentiation programs, through analysis of ATAC-seq and ChIP-seq in mouse prostate organoid systems.
Finally, we will describe a novel machine learning approach called BindSpace to leverage massive in vitro TF binding data from SELEX-seq experiments through a joint embedding of DNA k-mers and TF labels, leading to improved prediction of TF binding.
Biography: Christina Leslie did her undergraduate degree in Pure and Applied Mathematics at the University of Waterloo in Canada. She was awarded an NSERC 1967 Science and Engineering Fellowship for graduate study and did a PhD in Mathematics at the University of California, Berkeley, where her thesis work dealt with differential geometry and representation theory. She won an NSERC Postdoctoral Fellowship and did her postdoctoral training in the Mathematics Department at Columbia University in 1999-2000. She then joined the faculty of the Computer Science Department and later the Center for Computational Learning Systems at Columbia University, where she began to work in computational biology and machine learning. In 2007, she moved her lab to Memorial Sloan Kettering Cancer Center, where she is currently a Member of the Computational and Systems Biology Program as well as a Professor of Physiology, Biophysics, and Systems Biology at Weill Cornell Medical College.
Dr. Leslie is widely known for her work developing computational methods to study the global regulation of gene expression and the dysregulation of gene expression programs in cancer. A major methodological contribution of her lab was the introduction of k-mer based string kernels for prediction problems involving biological sequences. In addition, since many layers of gene regulation are mediated by DNA and RNA sequence signals, the Leslie lab has pioneered machine learning strategies to combine sequence and expression data to infer gene regulatory programs.
Funding Agency Panel
Scientific Review Officer
Biodata Management and Analysis (BDMA) Study Section
Center for Scientific Review (CSR)
National Institutes of Health (NIH)
Veerasamy “Ravi” Ravichandran
Division of Biophysics, Biomedical Technology, and Computational Biology (BBCB)
National Institute of General Medical Sciences (NIGMS)
National Institutes of Health (NIH)
Program Director, Division of Information and Intelligent Systems (IIS)
Directorate for Computer & Information Science & Engineering (CISE)
National Science Foundation (NSF)
Fellow of ACM and IEEE
Professor of Computer Science and Biomedical Engineering
University of Virginia
Former Program Director, National Science Foundation
Large-Scale Machine Learning Algorithms for Biomedical Data Science
Heng Huang, PhD, University of Pittsburgh
Abstract: Data science is accelerating the translation of biological and biomedical data to advance the detection, diagnosis, treatment, and prevention of diseases. However, the unprecedented scale and complexity of large-scale biomedical data have presented critical computational bottlenecks requiring new concepts and enabling tools. To address the challenging problems in current biomedical data science, we proposed several novel large-scale machine learning models for multi-dimensional data integration, heterogeneous multi-task learning, longitudinal feature learning, etc. Meanwhile, to deal with the big data computations, we proposed new asynchronous distributed stochastic gradient and coordinate descent methods for efficiently solving convex and non-convex problems, and also parallelized the deep learning optimization algorithms with layer-wise model parallelism.
We applied our new large-scale machine learning models to analyze multi-modal and longitudinal Electronic Medical Records (EMR) for predicting heart failure patients’ readmission and drug side effects, to integrate neuroimaging and genome-wide array data to recognize phenotypic and genotypic biomarkers, and to detect histopathological image markers and multi-dimensional cancer genomic biomarkers in precision medicine studies.
Biography: Dr. Heng Huang is a John A. Jurenko Endowed Professor in Computer Engineering in the Department of Electrical and Computer Engineering at the University of Pittsburgh, and also a Professor in Biomedical Informatics at the University of Pittsburgh Medical Center. Dr. Huang received his PhD degree in Computer Science from Dartmouth College. His research areas include machine learning, big data mining, health informatics, medical image analysis, bioinformatics, neuroinformatics, and precision medicine.
Sept. 8 (Long means 25 min + 5 min for questions, answers, and changeover)
WABI Session 1 Molecules and surfaces
Title: Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs
Authors: Milad Miladi, Martin Raden, Sebastian Will, and Rolf Backofen
Title: Quantified uncertainty of the flexible protein-protein docking algorithm
Authors: Nathan Clement
Title: pClay: A Precise Parallel Algorithm for Comparing Molecular Surfaces
Authors: Georgi D. Georgiev, Kevin F. Dodd and Brian Y. Chen
WABI Session 2 Alignment
Title: Validating Paired-end Read Alignments in Sequence Graphs
Authors: Chirag Jain, Haowen Zhang, Alexander Dilthey, and Srinivas Aluru
Title: Bounded-length Smith-Waterman alignment
Authors: Alexander Tiskin
WABI Session 3 Read mapping
Title: Context-Aware Seeds for Read Mapping
Authors: Hongyi Xin, Mingfu Shao, and Carl Kingsford
Title: Read Mapping on Genome Variation Graphs
Authors: Kavya Vaddadi, Rajgopal Srinivasan, and Naveen Sivadasan
WABI Session 4 Genomes I: Sequences
Title: Synteny paths for assembly graphs comparison
Authors: Evgeny Polevikov and Mikhail Kolmogorov
Title: Faster pan-genome construction for efficient differentiation of naturally occurring and engineered plasmids with plaster
Authors: Qi Wang, R. A. Leo Elworth, Tian Rui Liu, and Todd J. Treangen
Title: Finding all maximal perfect haplotype blocks in linear time
Authors: Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye
WABI Session 5 Genomes II: Rearrangement
Title: Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem
Authors: Yutong Qiu, Cong Ma, Han Xie, and Carl Kingsford
Title: Weighted Minimum-length Rearrangement Scenarios
Authors: Pijus Simonaitis, Annie Chateau, and Krister M. Swenson
Title: THE FUTURE OF WABI -- YOUR INPUT
WABI Session 6 Chromosomes and cells
Title: Topological data analysis reveals principles of chromosome structure in cellular differentiation
Authors: Natalie Sauerwald, Yihang Shen, and Carl Kingsford
Title: Inferring diploid 3D chromatin structures from Hi-C data
Authors: Alexandra Gesine Cauer, Gürkan Yardımcı, Jean-Philippe Vert, Nelle Varoquaux, and William Stafford Noble
Title: A Combinatorial Approach for Single-cell Variant Detection via Phylogenetic Inference
Authors: Mohammadamin Edrisi, Hamim Zafar, and Luay Nakhleh
Title: Jointly embedding multiple single-cell omics measurements
Authors: Jie Liu, Yuanhao Huang, Ritambhara Singh, Jean-Philippe Vert, and William Stafford Noble
WABI Session 7 Phylogenomics I: Treelike evolution
Title: Building a Small and Informative Phylogenetic Supertree
Authors: Jesper Jansson, Konstantinos Mampentzidis, and Sandhya Thekkumpadan Puthiyaveedu
Title: TRACTION: Fast non-parametric improvement of estimated gene trees
Authors: Sarah Christensen, Erin Molloy, Pranjal Vachaspati, and Tandy Warnow
Title: Rapidly Computing the Phylogenetic Transfer Index
Authors: Jakub Truszkowski, Olivier Gascuel, and Krister M. Swenson
WABI Session 8 Phylogenomics II: Non-treelike evolution
Title: Empirical Performance of Tree-based Inference of Phylogenetic Networks
Authors: Zhen Cao, Jiafan Zhu, and Luay Nakhleh
Title: Consensus Clusters in Robinson-Foulds Reticulation Networks
Authors: Alexey Markin and Oliver Eulenstein
Title: Better Practical Algorithms for rSPR Distance and Hybridization Number
Authors: Kohei Yamada, Zhi-Zhong Chen, and Lusheng Wang
WABI Session 9 Phylogenomics III: Non-treelike evolution
Title: Alignment- and reference-free phylogenomics with colored de-Bruijn graphs
Authors: Roland Wittler
Title: A New Paradigm for Identifying Reconciliation-Scenario Altering Mutations Conferring Environmental Adaptation
Authors: Roni Zoller, Meirav Zehavi, and Michal Ziv-Ukelson
BCB and WABI Posters
Modeling Phytoplankton Movement and Fitness in Lakes
Amy R. Lazarte (Reed); Samuel B. Fey (Reed); Anna Ritz (Reed)
Application of Comparative Biosequence Analysis to Understand Antibiotic Resistance in Superbugs
Guinevere Sieradzki (Milwaukee SOE); Sabrina Mierswa (Milwaukee SOE); Jung Lee (Milwaukee SOE)
A novel workflow for semi-supervised annotation of cell-type clusters in mass cytometry data
Abhinav Kaushik (Stanford); Diane Dunham (Stanford); Monali Manohar (Stanford); Kari Nadeau (Stanford); Sandra Andorf (Stanford)
Evaluation of five sentence similarity models on electronic medical records
Qingyu Chen (NIH); Jingcheng Du (UTHealth); John Wilbur (NIH); Sun Kim (NIH); Zhiyong Lu (NIH)
Refinement of G protein-coupled receptor structure models: Improving the prediction of loop conformations and drug binding
Bhumika Arora (IIT Bombay); Venkatesh Kareenhalli (IIT Bombay, Monash U); Patrick Sexton (Monash U)
Augmenting Quality Assurance Measures in Radiation Oncology with Machine Learning
Malvika Pillai (UNC); Karthik Adapa (UNC)
Toward a sequence-based physicochemical approach to variable-length B-cell epitope prediction for antipeptide paratopes recognizing flexibly disordered targets: insights drawn from protein folding modeled as polymer collapse
Salvador Eugenio Caoili (U of Philippines Manila)
Scalable Statistical Introgression Mapping Using Approximate Coalescent-Based Inference
Qiqige Wuyun (MSU); Nicholas Vankuren (U of Chicago); Marcus Kornforst (U of Chicago); Sean Mullen (Boston U); Kevin Liu (MSU)
Contact-assisted protein threading: an evolving new direction
Sutanu Bhattacharya (Auburn); Debswapna Bhattacharya (Auburn)
Pangenome-Wide Association Studies with Frequented Regions
Buwani Manuweera (Montana State); Indika Kahanda (Montana State); Brendan Mumey (Montana State); Joann Mudge (NCGR); Thiruvarangan Ramaraj (NCGR); Alan Cleary (NCGR)
l_0DL: Joint Image Gradient l_0-norm with Dictionary Learning for limited-angle CT
Moran Xu (Southeast U); Dianlin Hu (Southeast U); Weiwen Wu (Chongqing U)
Predicting G-quadruplexes from DNA sequences using multi-kernel convolutional neural networks
Mira Barshai (Ben-Gurion); Yaron Orenstein (Ben-Gurion)
PubTator Central: Automated Concept Annotation of Biomedical Full Text Articles
Chih-Hsuan Wei (NCBI); Alexis Allot (NCBI); Robert Leaman (NCBI); Zhiyong Lu (NCBI)
Majority Vote Cascading: a Semi-Supervised Framework for Improving Protein Function Prediction
John Lazarsfeld (Tufts); Jonathan Rodriguez (Tufts); Mert Erden (Tufts); Yuelin Liu (Tufts); Lenore Cowen (Tufts)
Integration of heterogeneous experimental data improves global map of human protein complexes
Jose Lugo-Martinez (CMU); Jörn Dengjel (Fribourg); Ziv Bar-Joseph (CMU); Robert F. Murphy (CMU)
Long Non-coding RNA Based Cancer Classification using Deep Neural Networks
Abdullah Mamun (FIU); Ananda Mondal (FIU)
PEARL: Prototype Learning via Rule Learning
Tianfan Fu (GIT); Tian Gao (IBM); Cao Xiao (IQVIA); Tengfei Ma (IBM); Jimeng Sun (GIT)
Reducing Redundancy in Biological Sequence Alignment using Cache Optimization
Evan Stene (UC Denver); Farnoush Banaei-Kashani (UC Denver)
Simulating Uncertainty of Early Warning Scores in Early Sepsis Detection
Ali Jazayeri (Drexel); Muge Capan (Drexel); Christopher Yang (Drexel); Siddhartha Nambiar (NCSU); Maria Mayorga (NCSU); Julie Ivy (NCSU); Ryan Arnold (Drexel)
Pseudotime Based Analysis of Cancer Dynamics
Tasmia Aqila (FIU); Ananda Mondal (FIU)
Community Based Cancer Biomarker Identification from Gene Co-expression Network
Raihanul Tanvir (FIU); Mona Maharjan (FIU); Ananda Mondal (FIU)
Explanation of Machine Learning Models Using Improved Shapley Additive Explanation
Yasunobu Nohara (Kyushu U); Koutarou Matsumoto (Saiseikai Kumamoto); Hidehisa Soejima (Saiseikai Kumamoto); Naoki Nakashima (Kyushu U)
Transmembrane Topology Identification by Fusing Evolutionary and Co-evolutionary Information with Cascaded Bidirectional Transformers
Zhen Li (CUHKSZ); Chongming Ni (Tsinghua U); Sheng Wang (KAUST)
Dynamic interaction network inference from longitudinal microbiome data
Jose Lugo-Martinez (CMU); Daniel Ruiz Perez (BioRG); Giri Narasimhan (FIU); Ziv Bar-Joseph (CMU)