Over the past several years, AICML researchers (including PIs, PDFs, students and research programmers) have worked with many teams of medical researchers, exploring ways to use patient data to produce classifiers that can make accurate predictions about future patients.
These projects involve using various information about a patient with the goal of predicting some relevant property of that patient.
We seek ways to "learn" these predictors from historical data, often augmented with other prior biological data (such as metabolic or signaling pathways).
See also Amii pages on: enhancing cancer care; fMRI-based analysis
Predicting Characteristics of Kidney Transplants - retired
Our colleagues at ATAGC (the Alberta Transplant Applied Genomics Centre) are trying to better understand why some transplants respond well while others do not. To obtain the relevant information, they have performed many hundreds of kidney transplants, in both mice and humans, and have been tracking the developments, recording both histological information and gene expression values (microarrays) of biopsies -- both protocol biopsies (obtained just after the transplant) and biopsies for cause (obtained after patients develop some symptoms). This team has also defined a set of "coherent gene sets", called Pathogenesis-Based Transcript Sets (PBTs), which are intended to summarize the 50K expression values in a microarray using just a few dozen values.
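To make the idea of summarizing a microarray with a few gene-set values concrete, here is a minimal sketch that averages the expression values of the genes in each set. This is only one common way to score a gene set; the page does not say how PBT scores are actually computed, and the function name, gene IDs and set names below are hypothetical.

```python
from statistics import mean


def gene_set_scores(expression, gene_sets):
    """Summarize one sample's expression values as one score per gene set.

    expression: dict mapping gene/probe ID -> expression value
    gene_sets:  dict mapping set name -> list of gene IDs
    Genes not measured in the sample are skipped; a set with no
    measured genes is left out of the result.
    """
    scores = {}
    for name, genes in gene_sets.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            scores[name] = mean(values)
    return scores


# Toy data: IDs and set names are made up for illustration.
sample = {"g1": 2.0, "g2": 4.0, "g3": 1.0, "g4": 3.0}
sets = {"set_A": ["g1", "g2"], "set_B": ["g3", "g4", "g5"], "set_C": ["g9"]}
scores = gene_set_scores(sample, sets)
```

With a few dozen sets, a sample described by tens of thousands of expression values is reduced to a few dozen scores, which can then be used as features for a predictor.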
Relevant Technologies: Histology, Microarrays, Clinical
Active Collaborative Projects:
- Predicting glomerular filtration rate (GFR) from microarray data
- Predicting lesions, reject/no-reject and rejection type from microarray data
- Better understanding of gene sets (PBTs): why are mice PBTs coherent, but human PBTs are not? Can we make the human PBTs more coherent?
- ATAGC team: Phil Halloran MD, Michael Mengel MD, Banu Sis MD, Jeff Reeve, Konrad Famulski
- AICML Team: Russ Greiner, Nasimeh Asgarian, Saman Vaisipour (PhD Student), Sheehan Khan (PhD Student)
- M. Hajiloo, D. Moulavi et al. Combining gene expression and interaction network data to improve kidney lesion score prediction. International Journal of Bioinformatics Research and Applications (IJBRA), to appear. (link)
- F. Mahdavifard. Making Gene Sets More Coherent. MSc Thesis, October 2009 (link)
Predictions for Cancer Patients
Our colleagues at the Cross Cancer Institute (part of Alberta Health Services) are trying to better understand various cancers and, in particular, to determine which patients should receive which treatment. This team has amassed a wealth of important data: gene expression values (microarray) on flash-frozen tumor specimens from patients, and SNP (single nucleotide polymorphism) profiles from hundreds of patients and controls.
Relevant Technologies: Microarrays, SNPs
Active Collaborative Projects:
- Building classifiers to predict breast cancer relapse using microarray profiles
- Predicting the subphenotypes of cancer (eg, ER, HER2, or triple-negative status) from microarray data
- Building classifiers to predict breast cancer risk using SNP profiles from genome scans
- Building classifiers to predict subphenotypes in breast cancer
- Integration of SNP and microarray signatures to identify signatures of prognostic and predictive value
- Predicting relapse from sub-cellular location of certain adhesion proteins
- Cross Cancer Institute Breast Cancer Research Team: Sambasivarao Damaraju (PhD), John Mackey (MD), Kathryn Graham (PhD), Badan Sehrawat (PhD)
- Cross Cancer Institute Prostate Cancer Research Team: Sambasivarao Damaraju (PhD), Matthew Parliament (MD), Badan Sehrawat (PhD)
- Cell Biology Department team: Manijeh Pasdar (PhD), Zackie Aktary (PhD Student)
- AICML Team: Russ Greiner, Nasimeh Asgarian, Saman Vaisipour (PhD Student), Sheehan Khan (PhD Student), Babak Damavandi (MSc Student), Mohsen Hajiloo (PhD Student), Metanat Hooshsadat (MSc Student), Farzad Sangi (MSc Student), Roman Eisner
Joint Grants (with Drs. Damaraju and Mackey)
- A Genome-Wide Search for Identification of Breast Cancer Risk Factors and Prognostic Markers Using Single Nucleotide Polymorphisms (PI: Dr. S. Damaraju).
- Genome-Wide Single Nucleotide Polymorphism Based Association Studies in non Metastatic Breast Cancer (PI: Dr. S. Damaraju).
- Novel genetic markers of breast cancer risk (PI: Dr. J Mackey).
- Novel genetic and virologic markers in breast cancer (PI: Dr. J Mackey).
- Identification and validation of pathways associated with failure of standard adjuvant therapy in early stage breast cancer (PI: Dr J. Mackey)
- B. Sehrawat, M.C. Sridharan et al. Potential novel candidate polymorphisms identified in genome-wide association study for breast cancer susceptibility, Human Genetics, March 2011. (link)
- J. Listgarten, S. Damaraju et al. Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms. Clinical Cancer Research (CCR), April 2004. (link)
- N. Asgarian, X. Hu et al. Learning to Predict Relapse In Invasive Ductal Carcinomas based on the Subcellular Localization of Junctional Proteins. Breast Cancer Research and Treatment, September 2009. (link)
- S. Damaraju, D. Murray et al. Association of DNA Repair and Steroid Metabolism Gene Polymorphisms with Clinical Late Toxicity in Patients Treated with Conformal Radiotherapy for Prostate Cancer. Clinical Cancer Research (CCR), April 2006. (link)
- F. Mirzazadeh. Using SNP Data to Predict Radiation Toxicity for Prostate Cancer Patients. MSc Thesis, University of Alberta, February 2010 (link)
Producing and Analyzing Metabolomic Profiles
Metabolites are organic compounds that are used or produced during metabolism. An organism's metabolism can change under different conditions. Using NMR spectroscopy, a metabolomic signature can be measured from each urine sample. This signature can help diagnose diseases and can help doctors choose the best treatment.
Relevant Technologies: Metabolomics
- CS department: David Wishart
- Cross Cancer Institute: Vickie Baracos, Cynthia Stretch (PhD Student)
- AICML Team: Russ Greiner, Nasimeh Asgarian, Thomas Eastman (MSc Student), Siamak (Mohsen) Ravanbakhsh (PhD Student), Roman Eisner
- Predicting cachexia from patient metabolic profile (using NMR spectra of urine)
- Computing metabolic profiles from NMR spectra
- AAET/Genome Alberta grant
- N. Psychogios, D.D. Hau et al. The Human Serum Metabolome, PLoS One, February 2011. (link)
- R. Eisner, C. Stretch et al. Learning to predict cancer-associated skeletal muscle wasting from 1H-NMR profiles of urinary metabolites, Metabolomics Journal 2010. (link)
- S. Ravanbakhsh, B. Poczos, and R. Greiner. A Cross-Entropy Method that Optimizes Partially Decomposable Problems: A New Way to Interpret NMR Spectra, National Conference on Artificial Intelligence (AAAI), July 2010. (link)
- C. Knox, V. Law et al. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs, Nucleic Acids Res. 2011 (link)
- D. Wishart, M. Lewis et al. The Human Cerebrospinal Fluid Metabolome. Journal of Chromatography B-Analytical Technologies in the Biomedical and Life Sciences, August 2008. (link)
- D.S. Wishart, C. Knox et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009 (link)
- D.S. Wishart, D. Tzur et al. HMDB: The Human Metabolome Database. Nucleic Acids Res. 2007 (link)
- C. Knox, S. Shrivastava et al. BioSpider: a web server for automating metabolome annotations. Pacific Symposium on Biocomputing 2007 (link)
- G. Van Domselaar, P. Stothard et al. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Research (NAR), July 2005. (link)
- D. Wishart, R. Greiner. Computational Approaches to Metabolomics: An Introduction. Pacific Symposium on Biocomputing, August 2007. (link)
- S. Ravanbakhsh. A Stochastic Optimization Method for Partially Decomposable Problems, with Application to Analysis of NMR Spectra. MSc Thesis, September 2009 (link)
- T. Eastman. A Disease Classifier for Metabolic Profiles Based on Metabolic Pathway Knowledge. MSc Thesis, University of Alberta, February 2010. (link)
- R. Eisner, J. Xia et al. Prediction of Cancer-Associated Skeletal Muscle Wasting Using Targeted Profiling of Urinary Metabolites. Metabolomics Society Meeting, August 2009. (link) (Poster)
Brain Tumour Analysis Project (BTAP) - retired
Our colleagues at Cross Cancer Institute (part of the Alberta Cancer Board) are analyzing brain tumour patient data, in order to better understand tumour behaviors, treatment effects, and likely patient outcomes, towards using this analysis to design more efficient and effective treatments for patients. This team has assembled hundreds of expert-labeled Magnetic Resonance (MR) patient scans.
Relevant Technologies: Image Analysis
Active Collaborative Projects (please also see the project website):
- Predicting the survival time for each patient
- Segmenting the MR scans to identify the location of the tumour (fully automated and semi-automated)
- Predicting the tumour growth
- Producing a user-friendly interactive database, allowing both physicians and cancer researchers to obtain relevant information about tumours
Partners (see the full list):
- Cross Cancer Institute team: Albert Murtha (MD), Brock Debenham, Nelson Leong, Jonathan Livergant
- AICML Team: Russ Greiner (PhD), Ross Mitchell (PhD), Maysam Heydari (MSc Student), Bret Hoehn, Maike Sussmann, Sinja Kaefer
- CS department: Jörg Sander (PhD), Karteek Popuri, Dana Cobzas (PhD)
- Others: Tibor Kesztyus (MD, PhD)
- RIP (since 2005)
- The Virtual Biopsy Project: Non-Invasive Molecular Diagnosis in Glioblastoma (from Terry Fox Research Institute), 2010-2011
Intelligent Diabetes Management
A typical patient with Type I diabetes must inject insulin several times a day to keep his/her blood glucose level (BG) in an acceptable range. The amount of each injection (for each type of insulin) depends on a formula that uses the current BG as well as other factors, including the amount of carbohydrates the patient is about to consume, the anticipated exercise, and previous BG levels and responses. As this formula can vary from patient to patient (and for a single patient, over time), patients maintain an "extended glucose log" (EGL) that records all of this information (glucose readings, insulin dose, carbohydrate intake and exercise) at several time points. The patient's diabetes team periodically examines this EGL to adjust the formula for this patient. Unfortunately, the team's time is limited, which means this important feedback may be infrequent (or perhaps even unavailable for patients in developing countries).
Our goal is a tool that can automate this adjustment process: given relevant patient information (age, gender, BMI, ...) and his/her recent EGL, modify the current parameters of the formula. Our initial task is to replicate the health care team's suggestions. Later, we will explore ways to improve this process, using techniques from Artificial Intelligence to provide suggestions that better keep the patient's BG in the acceptable range.
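To illustrate the kind of per-patient formula described above, here is a minimal sketch of a widely used textbook form: a meal component (carbohydrates divided by a carbohydrate ratio) plus a correction component (distance from a target BG divided by a correction factor). It is an assumption that the project's formula resembles this; the parameter names, the `exercise_reduction` fraction and all numeric values are hypothetical toy values, not dosing guidance.

```python
from dataclasses import dataclass


@dataclass
class DoseParams:
    """Per-patient parameters that a diabetes team would tune over time."""
    carb_ratio: float         # grams of carbohydrate covered by 1 unit of insulin
    correction_factor: float  # drop in BG (mmol/L) expected from 1 unit of insulin
    target_bg: float          # BG level (mmol/L) the formula aims for


def meal_dose(params, current_bg, carbs_g, exercise_reduction=0.0):
    """Return an insulin dose (units) for one meal under the toy formula.

    exercise_reduction is a hypothetical fraction (0.0 to 1.0) by which the
    dose is scaled down when exercise is anticipated.
    """
    meal = carbs_g / params.carb_ratio
    correction = (current_bg - params.target_bg) / params.correction_factor
    dose = (meal + correction) * (1.0 - exercise_reduction)
    return max(0.0, dose)  # a dose can never be negative


# Toy values only, chosen to make the arithmetic easy to follow.
params = DoseParams(carb_ratio=10.0, correction_factor=2.0, target_bg=6.0)
dose = meal_dose(params, current_bg=10.0, carbs_g=60.0)
```

In this framing, "adjusting the formula" means changing the fields of `DoseParams` based on the patterns seen in the patient's log, which is the step the tool aims to automate.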
- Department of Medicine: Dr. Edmond Ryan (MD)
More information can be found at the Project Website
Microarrays are a way to assay a large amount of biological material using high-throughput screening methods. We have analyzed a variety of microarray data, including DNA microarrays.
Single Nucleotide Polymorphisms
Single Nucleotide Polymorphisms (SNPs) are variations in single nucleotides in DNA sequences. Analyzing SNP data usually means dealing with very large feature sets, which calls for dimensionality reduction.
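As one simple illustration of reducing a large SNP feature set, the sketch below ranks SNPs by how much their average minor-allele count differs between cases and controls, and keeps the top k. This univariate filter is only an example technique, not necessarily what the projects above use; the function name, the 0/1/2 genotype encoding and the toy data are assumptions.

```python
from statistics import mean


def top_k_snps(genotypes, labels, k):
    """Return column indices of the k SNPs that best separate cases from controls.

    genotypes: list of samples, each a list of minor-allele counts (0, 1 or 2)
    labels:    list of 0 (control) / 1 (case), one per sample
    SNPs are scored by |mean count in cases - mean count in controls|;
    ties are broken by column index.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    scores = []
    for j in range(len(genotypes[0])):
        diff = abs(mean(s[j] for s in cases) - mean(s[j] for s in controls))
        scores.append((diff, j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


# Toy data: 4 samples (2 cases, 2 controls) by 4 SNPs.
genotypes = [
    [0, 2, 1, 0],  # case
    [0, 2, 1, 1],  # case
    [0, 0, 1, 1],  # control
    [1, 0, 1, 0],  # control
]
labels = [1, 1, 0, 0]
selected = top_k_snps(genotypes, labels, k=2)
```

When such a filter is combined with cross-validation, the selection should be repeated inside each training fold; selecting features on the full data set first gives optimistically biased accuracy estimates.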
Metabolomics is the large-scale study of small molecules in the body. It raises many data processing issues, which depend on the technology used to analyze samples (e.g. NMR, MS) and the type of body fluid being analyzed (e.g. urine, blood).
Image Analysis is the automated extraction of useful information from imaging data.
Histology is the microscopic study of tissues and their cells. We have analyzed histological data of breast cancer patients using machine learning techniques.
Subcellular Localization refers to the location within the cell where a protein does the majority of its work.
Clinical Features are those gathered from clinical studies, and describe high-level characteristics of patients (e.g. age, sex). These descriptors are often included with other analyses (e.g. microarray, SNP), and provide valuable information about the patient's health.
- Patient Characteristics
- Disease type (eg, Hormone Receptor Status)
Foundational Machine Learning Issues
Dealing with the above projects requires addressing many fundamental questions in the field of machine learning:
- Learning predictors that incorporate prior knowledge -- including metabolic/signalling pathways
- High dimensional data with few sample cases (Large-p Small-n)
- Dimensionality reduction
- Understanding and improving "coherence" -- eg, of gene sets (PBTs)
- Making use of the structure of the data in learners
- eg, the labels of neighbouring voxels in the MRI scans are related to each other, suggesting the use of models such as Conditional Random Fields. How can these be made more efficient and more accurate?
- Unclear and noisy ground truth
- eg, different experts provide different segmentations; treatments affect the prognosis, outcome and behaviour of the disease
- How to learn with minimum cost -- eg, Active Learning
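As a small illustration of the last item, here is a sketch of uncertainty sampling, one standard active learning strategy: from a pool of unlabeled instances, ask the expert to label the ones the current classifier is least sure about. The function name, instance IDs and probabilities are hypothetical, and the classifier that produces the probabilities is assumed to exist elsewhere.

```python
def most_uncertain(pool_probs, batch_size=1):
    """Pick the unlabeled instances whose predicted P(positive) is closest to 0.5.

    pool_probs: dict mapping instance ID -> predicted probability of the
                positive class from the current classifier
    Returns up to batch_size instance IDs, most uncertain first
    (ties broken by ID).
    """
    ranked = sorted(pool_probs, key=lambda i: (abs(pool_probs[i] - 0.5), i))
    return ranked[:batch_size]


# Toy pool: IDs and probabilities are made up for illustration.
pool = {"scan_01": 0.95, "scan_02": 0.55, "scan_03": 0.10, "scan_04": 0.40}
query = most_uncertain(pool, batch_size=2)
```

After the expert labels the selected instances, they are added to the training set, the classifier is retrained, and the selection is repeated, so that labeling effort goes where it is expected to help most.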