A Mixed Convolutional Neural Network for

Pre-miRNA Classification

Supervised By

Prof. Dr. Md. Al Mehedi Hasan

Dept. of Computer Science & Engineering

Rajshahi University of Engineering & Technology

Presented By

Abu Zahid Bin Aziz

Dept. of Computer Science & Engineering

Rajshahi University of Engineering & Technology


Outline

  • Introduction
  • Motivation
  • Purpose
  • Previous works
  • Dataset collection
  • Data preprocessing
  • Architecture of our CNN model
  • Method evaluation metrics
  • Hyperparameter tuning
  • Results
  • Future scope
  • References


Introduction

  • MicroRNAs (miRNAs) are small, non-coding (≈22 nt) RNAs.
  • They can play significant roles in RNA silencing and translational repression.
  • Pre-miRNAs are formed through two different pathways:
    1. Canonical pathway
    2. Mirtron pathway
  • Our work focuses on differentiating mirtrons from canonical miRNAs using their nucleotide sequences.


Motivation

  • Predictors developed using laboratory techniques have some demerits, such as:
    1. Costly
    2. Time-consuming
    3. Require experienced professionals for maintenance
  • These problems encouraged us to apply computational methods to this task.


Purpose

  • To find a straightforward way to distinguish mirtrons from canonical miRNAs.
  • To investigate CNN performance on biological sequence inputs.
  • To investigate whether a mixed CNN can produce better results than a sequential CNN model.
  • To employ a CNN model that produces better performance than the existing ones.


Previous works

  • Machine learning methodologies
    • SVM classifier by Ng et al. [1]
    • Random Forest (RF) classifier, “MiPred” [2]
    • “microPred” by Batuwita et al. [3]
    • SVM mirtron classifier by Rorbach et al. [4]

  • Deep learning methodologies
    • RNN classifier by Park et al. [5]
    • CNN mirtron classifier by Zheng et al. [6]


Dataset collection

In this investigation we used two datasets. A summary is given below:

Dataset Name         No. of Canonical miRNAs    No. of Mirtrons    Total
miRBase                        707                    216           923
Putative mirtrons                0                    201           201
Merged Dataset                 707                    417          1124

**This is the dataset used in Rorbach et al. and Zheng et al.’s work [4,6].

Data preprocessing

  • Our data preprocessing stage consisted of two steps:
    1. One-hot encoding
    2. Padding
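The two steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual preprocessing code; the fixed length passed to the function is a placeholder for whatever maximum sequence length the dataset dictates.

```python
# One-hot encoding of the four nucleotides (A, C, G, U),
# followed by zero-padding to a fixed length.
NUC_INDEX = {"A": 0, "C": 1, "G": 2, "U": 3}

def one_hot_pad(seq, max_len):
    """Encode an RNA sequence as a (max_len x 4) binary matrix."""
    encoded = [[0, 0, 0, 0] for _ in range(max_len)]
    for i, base in enumerate(seq[:max_len]):
        encoded[i][NUC_INDEX[base]] = 1   # mark the column for this nucleotide
    return encoded

x = one_hot_pad("AUGC", 6)
# row 0 encodes 'A'; rows 4 and 5 remain all-zero padding
```

Zero rows for padding keep all inputs the same shape, which the convolutional layers require.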


CNN Architecture (General)

Fig.1: A general CNN model.


CNN Architecture (Mixed)

Fig.2: Architecture of our mixed CNN model.
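Since the figure is not reproduced in this text version, a shape-level NumPy sketch of one plausible "mixed" design is given below. The assumption here is that parallel convolutional branches with different kernel widths process the same one-hot sequence and their pooled features are concatenated before classification; the branch count, kernel widths, and filter counts are illustrative, not the model's actual configuration.

```python
import numpy as np

def conv1d_relu(x, kernels):
    """Valid 1-D convolution with ReLU.
    x has shape (length, channels); kernels has shape (width, channels, filters)."""
    width = kernels.shape[0]
    out = np.stack([
        np.tensordot(x[i:i + width], kernels, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0] - width + 1)
    ])
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.random((100, 4))                                     # one padded one-hot sequence
branch_a = conv1d_relu(x, rng.standard_normal((3, 4, 8)))    # narrow receptive field
branch_b = conv1d_relu(x, rng.standard_normal((7, 4, 8)))    # wider receptive field
features = np.concatenate([branch_a.max(axis=0),             # global max pooling
                           branch_b.max(axis=0)])            # then concatenation
# 'features' would feed a dense classification layer
```

Mixing branches with different kernel widths lets the model see both short motifs and longer-range patterns in the same sequence.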


Hyperparameter tuning

  • In our investigation we tuned three hyperparameters:
    1. Number of iterations (10,000)
    2. Dropout probability (0.40)
    3. Learning rate (0.0001)
  • We analyzed the value of the loss function after each iteration to tune the number of iterations.
  • We employed k-fold cross-validation (k = 5) with grid search to tune the learning rate and dropout probability.
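The grid search over cross-validation folds can be skeletonized as follows. `evaluate_fold` is a stand-in so the skeleton runs; a real run would train the CNN with the candidate (learning rate, dropout) pair on four folds and score the held-out fifth. The candidate grids are illustrative, not the exact values searched.

```python
import itertools
import random

def evaluate_fold(held_out, lr, dropout):
    # Placeholder score; replace with model training + validation accuracy.
    random.seed(hash((lr, dropout)) % 10_000)
    return random.uniform(0.85, 0.95)

def cross_val_score(samples, lr, dropout, k=5):
    """k-fold cross-validation skeleton: average the score over k held-out folds."""
    fold_size = len(samples) // k
    scores = []
    for i in range(k):
        held_out = samples[i * fold_size:(i + 1) * fold_size]
        scores.append(evaluate_fold(held_out, lr, dropout))
    return sum(scores) / k

samples = list(range(1124))  # merged-dataset size, as a stand-in for the data
grid = {"lr": [1e-3, 1e-4, 1e-5], "dropout": [0.2, 0.4, 0.6]}
best_lr, best_dropout = max(
    itertools.product(grid["lr"], grid["dropout"]),
    key=lambda pair: cross_val_score(samples, *pair),
)
```

Each grid point gets one averaged cross-validation score, and the pair with the highest score is kept; the surface plot on the next slide visualizes exactly these scores.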


Hyperparameter tuning (cont’d)

Fig. 3: Surface plot of grid search results for learning rate and dropout probability. The deep red color indicates the highest accuracy.


Method evaluation metrics

  • Sensitivity
  • Specificity
  • F1-Score
  • Matthews Correlation Coefficient (MCC)
  • Accuracy
  • Area Under the ROC Curve (AUC)
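All of these metrics except AUC (which needs ranked prediction scores rather than hard labels) follow directly from the binary confusion matrix, as this small sketch shows; the counts passed in at the end are illustrative.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # true positive rate (recall)
    specificity = tn / (tn + fp)                  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, f1, mcc, accuracy

sens, spec, f1, mcc, acc = binary_metrics(tp=40, tn=45, fp=5, fn=10)
# sens = 0.8, spec = 0.9, acc = 0.85
```

MCC is the most informative single number here because, unlike accuracy, it stays low when a model ignores the minority class (mirtrons are outnumbered roughly 7:4 in the merged dataset).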


Results

A comparison between the performance of our model and the existing model is given below:

Metrics        CNN model [6]    Mixed CNN model
Sensitivity        0.871             0.872
Specificity        0.970             0.977
F1-Score           0.916             0.911
MCC                0.845             0.869
Accuracy           0.920             0.941
AUC                0.908             0.916


Results (cont’d)

ROC curve comparison:

Fig.4: A comparison between the ROC curve of our model and the existing model.


Publication

Based on the findings of our work, a paper was accepted and presented at the 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE 2019).


Future scope

  • Taking into account mature miRNAs.
  • Tuning more hyperparameters using k = 10 in cross-validation.
  • Providing a web server for the model.
  • Investigating CNN performance on other biological datasets.


References

1. K. L. S. Ng and S. K. Mishra, “De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures,” Bioinformatics, vol. 23, no. 11, pp. 1321–1330, 2007.

2. P. Jiang, H. Wu, W. Wang, W. Ma, X. Sun, and Z. Lu, “MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features,” Nucleic Acids Research, vol. 35, no. suppl 2, pp. W339–W344, 2007.

3. R. Batuwita and V. Palade, “microPred: effective classification of pre-miRNAs for human miRNA gene prediction,” Bioinformatics, vol. 25, no. 8, pp. 989–995, 2009.


References (cont’d)

4. G. Rorbach, O. Unold, and B. M. Konopka, “Distinguishing mirtrons from canonical miRNAs with data exploration and machine learning methods,” Scientific Reports, vol. 8, no. 1, p. 7560, 2018.

5. S. Park, S. Min, H.-S. Choi, and S. Yoon, “Deep recurrent neural network-based identification of precursor microRNAs,” in Advances in Neural Information Processing Systems, 2017, pp. 2891–2900.

6. X. Zheng, S. Xu, Y. Zhang, and X. Huang, “Nucleotide-level convolutional neural networks for pre-miRNA classification,” Scientific Reports, vol. 9, no. 1, p. 628, 2019.


THANK YOU
