1 of 28

Predicting Miscarriage Using Vaginal Microbiomes

By: Mariam Elsharkawy, Faith Egbeni, Joann Tran, Kacee Chan, Yuyan Huang, and Gayathri Govind

Lead TAs: Leen Arnaout & Oluwaseun Adegbite

Assistant TA: Anjali Singh

2 of 28

Overview

01 Introduction: Vaginal Microbiome and Miscarriage

02 Data & Algorithms: Supervised vs. Unsupervised

03 Results & Analysis: Impact

04 Applications: Real World

3 of 28

Introduction

4 of 28

What is a Miscarriage?

  • Miscarriage, the loss of a pregnancy before the 20th week, most often occurs because the fetus does not develop as expected.
  • 10-15% of all pregnancies end in miscarriage
  • It can be physically challenging for the pregnant person for several reasons:

* The body may be unable to clear the miscarriage on its own

* It could lead to infertility

5 of 28

Causes of Miscarriage

  • Chromosomal abnormalities cause about 50% of all miscarriages in the first trimester (13 weeks) of pregnancy.
  • Recent research has also shown that dysbiosis, an imbalance of the vaginal microbiome, can be one of the most common causes of miscarriage.

6 of 28

Vaginal Microbiome

An imbalance of healthy and unhealthy bacteria in the vagina increases the likelihood of miscarriage

7 of 28

Objective

Develop different types of machine learning models to predict the likelihood of a miscarriage based on a patient’s vaginal microbiome

8 of 28

Data & Algorithms

9 of 28

Data Used

  • Data from 2019 Imperial College London study

  • The data contained DNA sequences from microbiome samples and whether a miscarriage occurred
  • The data needed to be cleaned before it could be used to predict miscarriage

10 of 28

Data Prep

11 of 28

Algorithm: Decision Tree
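A minimal sketch of how a decision tree classifier could be trained with scikit-learn. The data here is synthetic (generated with `make_classification`) and only stands in for the study data; the feature count, tree depth, and variable names are assumptions for illustration, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption): rows are samples, columns are
# numeric microbiome features, y is 1 for miscarriage and 0 otherwise.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A shallow tree keeps the learned rules short enough to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Accuracy on held-out samples, plus the learned if/else rules as text.
tree_accuracy = tree.score(X_test, y_test)
rules = export_text(tree)
print(rules)
print(f"Test accuracy: {tree_accuracy:.2f}")
```

Printing the rules shows which features the tree splits on, which is one reason decision trees are easy to interpret.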

12 of 28

Algorithm: Random Forest
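A random forest combines many decision trees and lets them vote on the prediction. The sketch below uses scikit-learn on synthetic stand-in data; the number of trees and all variable names are illustrative assumptions, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), not the study data.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 100 trees, each trained on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_accuracy = forest.score(X_test, y_test)

# Feature importances sum to 1; larger values mean the feature was
# more useful for splitting across the trees.
importances = forest.feature_importances_
top_features = importances.argsort()[::-1][:3]
print(f"Test accuracy: {forest_accuracy:.2f}")
print("Indices of the three most important features:", top_features)
```

With real microbiome data, the importance ranking could hint at which bacteria contribute most to the prediction.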

13 of 28

Algorithm: Logistic Regression
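Logistic regression outputs a probability between 0 and 1 for each sample, which is then turned into a yes/no prediction. The sketch below is one possible scikit-learn setup on synthetic stand-in data; the scaling step, `max_iter` value, and variable names are assumptions rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), not the study data.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features first, then fit the logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1.
probabilities = model.predict_proba(X_test)[:, 1]
logreg_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {logreg_accuracy:.2f}")
print("First five predicted probabilities:", probabilities[:5].round(2))
```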

14 of 28

Unsupervised Learning Algorithm: K-Means Clustering
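Unlike the supervised algorithms, k-means never sees the miscarriage labels; it only groups samples that look similar. The sketch below uses scikit-learn on synthetic blobs; the choice of two clusters, the feature count, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption): 150 samples, 5 numeric features.
# The generated group labels are discarded because k-means is unsupervised.
X, _ = make_blobs(n_samples=150, centers=2, n_features=5, random_state=0)

# Ask for two clusters; n_init=10 reruns the algorithm from ten
# different starting points and keeps the best result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Count how many samples ended up in each cluster.
cluster_sizes = np.bincount(cluster_labels)
print("Samples per cluster:", cluster_sizes)
print("Cluster centers shape:", kmeans.cluster_centers_.shape)
```

The clusters can then be compared against the known outcomes to see whether the groups line up with miscarriage status.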

15 of 28

Results & Analysis

16 of 28

Principal Components Analysis (PCA):

  • Dimensionality-reduction method used to simplify large data sets
  • Transforms a large set of variables into a smaller set that still contains most of the information in the original.
  • PCA is only a reliable method if the data are at least interval scaled and approximately normally distributed.
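The points above can be sketched with scikit-learn's `PCA`. The data is synthetic (20 features built from 3 hidden factors plus noise), and the number of components and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 100 samples with 20 features
# that are really driven by only 3 underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.1 * rng.normal(size=(100, 20))

# Standardize so every feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Reduce 20 features to 3 principal components.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of the original variance kept by the 3 components.
explained = pca.explained_variance_ratio_.sum()
print("Reduced shape:", X_reduced.shape)
print(f"Variance retained: {explained:.2%}")
```

The retained-variance figure is a quick check on how much information survives the reduction.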

17 of 28

Accuracy Scores

  • Code: classifier.score(X_test, y_test)
  • The closer the score is to 1, the more accurately the algorithm can predict the likelihood of miscarriage from microbiome data (correct predictions / number of samples, where correct predictions include both true positives and true negatives)
  • Used to objectively compare different algorithms
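For scikit-learn classifiers, `score` returns the mean accuracy on the given test data. The calculation can be sketched by hand as below; the `accuracy` helper and the example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 1 = miscarriage, 0 = no miscarriage.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# 3 true positives + 3 true negatives = 6 correct out of 8 samples.
score = accuracy(y_true, y_pred)
print(score)  # 0.75
```

Both correctly predicted miscarriages and correctly predicted non-miscarriages count toward the score.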

18 of 28

Accuracy Scores

19 of 28

Comparing Algorithms

20 of 28

Final Algorithm Choice

More Reliable / Consistent

21 of 28

Quantifiable Results

22 of 28

Our Best Algorithm

Logistic Regression

  • Most reliable / consistent algorithm
  • Its S-shaped (sigmoid) curve is well suited to modeling data with binary classifications
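The S-curve is the sigmoid function, which squeezes any number into the range 0 to 1 so it can be read as a probability. A minimal sketch, with the 0.5 cutoff and the input values chosen only for illustration:

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 sits exactly in the middle at 0.5.
for z in (-4, -1, 0, 1, 4):
    probability = sigmoid(z)
    predicted_class = 1 if probability >= 0.5 else 0
    print(f"z={z:>2}  probability={probability:.3f}  class={predicted_class}")
```

Because the curve flattens near 0 and 1, most inputs land clearly on one side of the cutoff, which fits a yes/no outcome.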

23 of 28

Limitations

24 of 28

Limitations & Resolutions

Limitations:

  • Data bias

- Sample size was not large enough

- Limited information about the patients

  • Unknown sampling method
  • Misrepresentation of the population

Resolutions:

  • Larger sample data set
  • 1000 samples per class
  • Randomly sampled

25 of 28

Applications

26 of 28

Applications

Understand harmful bacteria in microbiome

Prevent miscarriages

27 of 28

Works Cited

28 of 28

Thanks!

Do you have any questions?
