A Mixed Convolutional Neural Network for

Pre-miRNA Classification

Supervised By

Prof. Dr. Md. Al Mehedi Hasan

Dept. of Computer Science & Engineering

Rajshahi University of Engineering & Technology

Presented By

Abu Zahid Bin Aziz

Dept. of Computer Science & Engineering

Rajshahi University of Engineering & Technology


Outline

  • Introduction
  • Motivation
  • Purpose
  • Previous works
  • Dataset collection
  • Data preprocessing
  • Architecture of our CNN model
  • Method evaluation metrics
  • Hyperparameter tuning
  • Results
  • Future scope
  • References


Introduction

  • MicroRNAs (miRNAs) are small, non-coding (≈22 nt) RNAs.
  • They can play significant roles in RNA silencing and translational repression.
  • Pre-miRNAs are formed through two different pathways:
    1. Canonical pathway
    2. Mirtron pathway
  • Our work focuses on differentiating mirtrons from canonical miRNAs using their nucleotide sequences.


Motivation

  • Predictors developed using laboratory techniques have some demerits, such as:
    1. Costly
    2. Time-consuming
    3. Require experienced professionals for maintenance
  • These problems encouraged us to apply computational methods to this task.


Purpose

  • To find a straightforward way to distinguish mirtrons from canonical miRNAs.
  • To investigate CNN performance on biological sequence inputs.
  • To investigate whether a mixed CNN can produce better results than a sequential CNN model.
  • To employ a CNN model that produces better performance than the existing ones.


Previous works

  • Machine learning methodologies
    • SVM classifier by Ng et al. [1]
    • Random Forest (RF) classifier, “MiPred” [2]
    • “microPred” by Batuwita et al. [3]
    • SVM mirtron classifier by Rorbach et al. [4]

  • Deep learning methodologies
    • RNN classifier by Park et al. [5]
    • CNN mirtron classifier by Zheng et al. [6]


Dataset collection

In this investigation we used two datasets. A summary is given below:

Dataset Name         No. of Canonical miRNAs    No. of Mirtrons    Total
miRBase                        707                    216           923
Putative mirtrons                0                    201           201
Merged Dataset                 707                    417          1124

**This is the dataset used in Rorbach et al. and Zheng et al.’s work [4,6].

Data preprocessing

  • Our data preprocessing stage consisted of two steps:
    1. One-hot encoding
    2. Padding
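The two steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual preprocessing code; the fixed length passed to the function is a placeholder for whatever maximum sequence length the dataset dictates.

```python
# One-hot encoding of the four nucleotides (A, C, G, U),
# followed by zero-padding to a fixed length.
NUC_INDEX = {"A": 0, "C": 1, "G": 2, "U": 3}

def one_hot_pad(seq, max_len):
    """Encode an RNA sequence as a (max_len x 4) binary matrix."""
    encoded = [[0, 0, 0, 0] for _ in range(max_len)]
    for i, base in enumerate(seq[:max_len]):
        encoded[i][NUC_INDEX[base]] = 1   # mark the column for this nucleotide
    return encoded

x = one_hot_pad("AUGC", 6)
# row 0 encodes 'A'; rows 4 and 5 remain all-zero padding
```

Zero rows for padding keep all inputs the same shape, which the convolutional layers require.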


CNN Architecture (General)

Fig.1: A general CNN model.


CNN Architecture (Mixed)

Fig.2: Architecture of our mixed CNN model.
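Since the figure is not reproduced in this text version, a shape-level NumPy sketch of one plausible "mixed" design is given below. The assumption here is that parallel convolutional branches with different kernel widths process the same one-hot sequence and their pooled features are concatenated before classification; the branch count, kernel widths, and filter counts are illustrative, not the model's actual configuration.

```python
import numpy as np

def conv1d_relu(x, kernels):
    """Valid 1-D convolution with ReLU.
    x has shape (length, channels); kernels has shape (width, channels, filters)."""
    width = kernels.shape[0]
    out = np.stack([
        np.tensordot(x[i:i + width], kernels, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0] - width + 1)
    ])
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.random((100, 4))                                     # one padded one-hot sequence
branch_a = conv1d_relu(x, rng.standard_normal((3, 4, 8)))    # narrow receptive field
branch_b = conv1d_relu(x, rng.standard_normal((7, 4, 8)))    # wider receptive field
features = np.concatenate([branch_a.max(axis=0),             # global max pooling
                           branch_b.max(axis=0)])            # then concatenation
# 'features' would feed a dense classification layer
```

Mixing branches with different kernel widths lets the model see both short motifs and longer-range patterns in the same sequence.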


Hyperparameter tuning

  • In our investigation we tuned three hyperparameters:
    1. Number of iterations (10,000)
    2. Dropout probability (0.40)
    3. Learning rate (0.0001)
  • We analyzed the value of the loss function after each iteration to tune the number of iterations.
  • We employed k-fold cross-validation (k = 5) with grid search to tune the learning rate and dropout probability.
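The grid search over cross-validation folds can be skeletonized as follows. `evaluate_fold` is a stand-in so the skeleton runs; a real run would train the CNN with the candidate (learning rate, dropout) pair on four folds and score the held-out fifth. The candidate grids are illustrative, not the exact values searched.

```python
import itertools
import random

def evaluate_fold(held_out, lr, dropout):
    # Placeholder score; replace with model training + validation accuracy.
    random.seed(hash((lr, dropout)) % 10_000)
    return random.uniform(0.85, 0.95)

def cross_val_score(samples, lr, dropout, k=5):
    """k-fold cross-validation skeleton: average the score over k held-out folds."""
    fold_size = len(samples) // k
    scores = []
    for i in range(k):
        held_out = samples[i * fold_size:(i + 1) * fold_size]
        scores.append(evaluate_fold(held_out, lr, dropout))
    return sum(scores) / k

samples = list(range(1124))  # merged-dataset size, as a stand-in for the data
grid = {"lr": [1e-3, 1e-4, 1e-5], "dropout": [0.2, 0.4, 0.6]}
best_lr, best_dropout = max(
    itertools.product(grid["lr"], grid["dropout"]),
    key=lambda pair: cross_val_score(samples, *pair),
)
```

Each grid point gets one averaged cross-validation score, and the pair with the highest score is kept; the surface plot on the next slide visualizes exactly these scores.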


Hyperparameter tuning (cont’d)

Fig. 3: Surface plot of grid search results for learning rate and dropout probability. The deep red color indicates the highest accuracy.


Method evaluation metrics

  • Sensitivity
  • Specificity
  • F1-Score
  • Matthews Correlation Coefficient (MCC)
  • Accuracy
  • Area Under the ROC Curve (AUC)
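All of these metrics except AUC (which needs ranked prediction scores rather than hard labels) follow directly from the binary confusion matrix, as this small sketch shows; the counts passed in at the end are illustrative.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # true positive rate (recall)
    specificity = tn / (tn + fp)                  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, f1, mcc, accuracy

sens, spec, f1, mcc, acc = binary_metrics(tp=40, tn=45, fp=5, fn=10)
# sens = 0.8, spec = 0.9, acc = 0.85
```

MCC is the most informative single number here because, unlike accuracy, it stays low when a model ignores the minority class (mirtrons are outnumbered roughly 7:4 in the merged dataset).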


Results

A comparison between the performance of our model and the existing model is given below:

Metrics        CNN model [6]    Mixed CNN model
Sensitivity        0.871             0.872
Specificity        0.970             0.977
F1-Score           0.916             0.911
MCC                0.845             0.869
Accuracy           0.920             0.941
AUC                0.908             0.916


Results (cont’d)

ROC curve comparison:

Fig.4: A comparison between the ROC curve of our model and the existing model.


Publication

Based on the findings of our work, a paper was accepted and presented at the 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE 2019).


Future scope

  • Taking into account mature miRNAs.
  • Tuning more hyperparameters using k = 10 in cross-validation.
  • Providing a web server for the model.
  • Investigating CNN performance on other biological datasets.


References

1. K. L. S. Ng and S. K. Mishra, “De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures,” Bioinformatics, vol. 23, no. 11, pp. 1321–1330, 2007.

2. P. Jiang, H. Wu, W. Wang, W. Ma, X. Sun, and Z. Lu, “MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features,” Nucleic Acids Research, vol. 35, no. suppl 2, pp. W339–W344, 2007.

3. R. Batuwita and V. Palade, “microPred: effective classification of pre-miRNAs for human miRNA gene prediction,” Bioinformatics, vol. 25, no. 8, pp. 989–995, 2009.


References (cont’d)

4. G. Rorbach, O. Unold, and B. M. Konopka, “Distinguishing mirtrons from canonical miRNAs with data exploration and machine learning methods,” Scientific Reports, vol. 8, no. 1, p. 7560, 2018.

5. S. Park, S. Min, H.-S. Choi, and S. Yoon, “Deep recurrent neural network-based identification of precursor microRNAs,” in Advances in Neural Information Processing Systems, 2017, pp. 2891–2900.

6. X. Zheng, S. Xu, Y. Zhang, and X. Huang, “Nucleotide-level convolutional neural networks for pre-miRNA classification,” Scientific Reports, vol. 9, no. 1, p. 628, 2019.


THANK YOU
