Identification of RNA Pseudouridine Sites using Deep Learning Approaches

Supervised By

Dr. Md. Al Mamun

Professor

Dept. of Computer Science & Engineering

Rajshahi University of Engineering & Technology

Presented By

Abu Zahid Bin Aziz

Roll: 1503047

Dept. of Computer Science & Engineering

Rajshahi University of Engineering & Technology

Outline

  • Introduction
  • Motivation
  • Objectives
  • Literature Review
  • Workflow
  • Dataset collection
  • Data preprocessing
  • Architecture of our CNN model
  • Hyperparameter tuning
  • Results
  • Webserver Implementation
  • Future scope
  • Publication
  • References

Introduction

  • Pseudouridine (Ψ) is the most prevalent RNA modification found in both prokaryotes and eukaryotes.
  • It has been confirmed to occur in rRNA, mRNA, tRNA, and nuclear/nucleolar RNA.
  • Identifying Ψ-sites is important for drug development, gene therapy, and academic research.
  • In this work, we propose a multi-stage convolutional neural network (CNN) for identifying Ψ-sites.

14/02/2021 Pseudouridine Identification by Deep Learning 3

Motivation

  • Previous predictors developed using laboratory techniques had several drawbacks:
    • Costly
    • Time-consuming
    • Required experienced professionals for maintenance

  • These problems motivated us to apply computational methods to this task.

Objectives

  • To find an effective computational method for identifying pseudouridine sites.
  • To investigate CNN performance on biological sequence inputs.
  • To investigate whether a multi-stage CNN model can produce better results than a sequential CNN model.
  • To develop a CNN model that outperforms the existing predictors.

Literature Review

  • PPUS (2015): SVM classifier by Li et al. [1]
  • iRNA-PseU (2016): Improved on the previous classifier, by Chen et al. [2]
  • PseUI (2018): SVM classifier by He et al. [3]
  • iPseU-CNN (2019): CNN classifier by Tahir et al. [4]
  • XG-PseU (2019): Gradient-boosting-based method by Liu et al. [5]
  • iPseU-Layer (2020): Ensemble model by Mu et al. [6]

Workflow

Dataset Collection

  • Data were collected for three species: H. sapiens, S. cerevisiae, and M. musculus.

Species              Benchmark samples   Independent samples   Sequence length (nucleotides)
H. sapiens (HS)      990                 200                   21
S. cerevisiae (SC)   628                 200                   31
M. musculus (MM)     944                 None                  21

Data Preprocessing

  • Our data preprocessing consisted of a single step: binary ‘one-hot’ encoding.
  • We did it in two ways:
    • General ‘one-hot’ encoding
    • Merged-seq ‘one-hot’ encoding
  • General ‘one-hot’ encoding converted each input into an (N, 4) matrix.
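
The general ‘one-hot’ encoding can be illustrated with a minimal sketch. It assumes that N is the sequence length and that the four columns correspond to A, C, G, and U in that order; the slides state neither, and the function name and the sample sequence are hypothetical.

```python
# Assumed column order; the slides do not say which column maps to which base.
ALPHABET = "ACGU"


def one_hot_encode(sequence):
    """Encode an RNA sequence of length N as an N x 4 binary matrix (list of rows)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        row[index[base]] = 1  # raises KeyError for any base outside A, C, G, U
        matrix.append(row)
    return matrix


# Hypothetical 21-nucleotide sample with a uridine at the center (index 10).
sample = "GACUAGCUAAUGCCAUGCAAG"
encoded = one_hot_encode(sample)
print(len(encoded), len(encoded[0]))
```

Each row contains exactly one 1, so a 21-nucleotide H. sapiens sample would become a 21 x 4 matrix and a 31-nucleotide S. cerevisiae sample a 31 x 4 matrix under these assumptions.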

Data Preprocessing (Contd.)

  • Merged-seq ‘one-hot’ encoding converted each input into an (N, 12) matrix.

CNN Architecture(General)

CNN Architecture(Multi-stage)

Hyperparameter Tuning

  • We tuned several hyperparameters.
  • We used k-fold cross-validation (k = 10) and grid search to select their values.

                                                                   Datasets
Hyperparameter        Range of values                              HS_990, MM_944   SC_628
Batch size            [10, 20, 30, 40]                             10               10
No. of epochs         [10, 50, 100, 200]                           100              100
No. of channels       [5, 7, 9, 10, 11]                            11               9
Filter height         [3, 5, 7, 9]                                 9                7
Learning rate         [0.001, 0.0003, 0.0005, 0.00057, 0.0001]     0.0005           0.0001
Dropout probability   [0.4, 0.45, 0.5, 0.55, 0.6]                  0.6              0.4
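
How grid search and 10-fold cross-validation fit together can be sketched as follows. This is an illustration only, not the code used in this work: the function names are hypothetical, only three of the tuned hyperparameters are included to keep the grid small, and `dummy_train_and_score` is a made-up stand-in for training and validating the CNN on one fold.

```python
import itertools
import random

# Ranges copied from the tuning table; a subset of the hyperparameters.
GRID = {
    "batch_size": [10, 20, 30, 40],
    "learning_rate": [0.001, 0.0003, 0.0005, 0.00057, 0.0001],
    "dropout": [0.4, 0.45, 0.5, 0.55, 0.6],
}


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled, non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def grid_search(n_samples, train_and_score, grid=GRID, k=10):
    """Return (best_params, best_mean_score) over every combination in the grid."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [train_and_score(params, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def dummy_train_and_score(params, train_idx, val_idx):
    """Hypothetical placeholder: a real version would train the CNN on train_idx
    and return its validation score on val_idx."""
    return -abs(params["learning_rate"] - 0.0005) - abs(params["dropout"] - 0.6)


best_params, best_score = grid_search(990, dummy_train_and_score)
print(best_params)
```

The full grid grows multiplicatively with each added hyperparameter, and every combination is trained k times, so in practice the search space is usually kept as small as the table above.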

Hyperparameter Tuning(Contd.)

Results

Performance on the independent datasets:

                    S_200                           H_200
Predictor           AC(%)   SN(%)   SP(%)   MCC     AC(%)   SN(%)   SP(%)   MCC
iRNA-PseU [2]       60.00   63.00   57.00   0.20    61.50   58.00   65.00   0.23
PseUI [3]           68.50   65.00   72.00   0.37    65.50   63.00   68.00   0.31
iPseU-CNN [4]       73.50   68.76   77.42   0.47    69.00   77.72   60.81   0.40
iPseU-Layer [6]     72.50   68.00   77.00   0.45    71.00   63.00   79.00   0.43
Ours (General)      75.00   67.00   83.00   0.50    72.50   80.00   65.00   0.44
Ours (Merged-seq)   76.50   80.00   73.00   0.53    74.00   73.00   75.00   0.48

*AC = accuracy, SN = sensitivity, SP = specificity, MCC = Matthews correlation coefficient

Results(Contd.)

Results(Contd.)

Webserver Implementation(Step-1)

Webserver Implementation(Step-2)

Future Scope

  • Extending the study to RNAs of other species.
  • Extending the study to other RNA modifications.
  • Investigating CNN performance on other biological datasets.
  • Implementing other encoding techniques.

Publication

  • Based on the findings of this work, a paper was accepted and presented at the IEEE Region 10 Symposium (TENSYMP) 2020.
  • The paper is available in the IEEE Xplore digital library.

References

[1] Y.-H. Li, G. Zhang, and Q. Cui, “PPUS: a web server to predict PUS-specific pseudouridine sites,” Bioinformatics, vol. 31, no. 20, pp. 3362–3364, 2015.

[2] W. Chen, H. Tang, J. Ye, H. Lin, and K.-C. Chou, “iRNA-PseU: Identifying RNA pseudouridine sites,” Molecular Therapy-Nucleic Acids, vol. 5, p. e332, 2016.

[3] J. He, T. Fang, Z. Zhang, B. Huang, X. Zhu, and Y. Xiong, “PseUI: pseudouridine sites identification based on RNA sequence information,” BMC Bioinformatics, vol. 19, no. 1, p. 306, 2018.

[4] M. Tahir, H. Tayara, and K. T. Chong, “iPseU-CNN: Identifying RNA pseudouridine sites using convolutional neural networks,” Molecular Therapy-Nucleic Acids, vol. 16, pp. 463–470, 2019.

[5] K. Liu, W. Chen, and H. Lin, “XG-PseU: an eXtreme Gradient Boosting based method for identifying pseudouridine sites,” Molecular Genetics and Genomics, vol. 295, no. 1, pp. 13–21, 2020.

[6] Y. Mu, R. Zhang, L. Wang, and X. Liu, “iPseU-Layer: Identifying RNA pseudouridine sites using layered ensemble model,” Interdisciplinary Sciences: Computational Life Sciences, pp. 1–11, 2020.

Thank You