DR MUHAMMAD SHAFIQ

  • Machine Learning + Genome-Wide Analysis
  • Predicting Gene Family Functions (In Silico Approach)

WHY PREDICT GENE FUNCTION?

  • Wet-lab experiments are costly & time-consuming
  • Genome sequencing outpaces functional validation
  • Computational prediction is needed to prioritize genes before experiments

WHAT IS GENOME-WIDE ANALYSIS?

  • Identification of all members of a gene family across the genome
  • Domain & motif analysis
  • Phylogenetic relationships
  • Gene structure & duplication events
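
A minimal sketch of the first step, finding candidate family members by scanning a proteome for a conserved motif. It assumes the WRKY family as an example (WRKY transcription factors carry a conserved WRKYGQK heptapeptide); the regex is a crude stand-in for the Pfam/SMART HMM searches normally used, and the gene IDs and sequences are invented for illustration.

```python
import re

# Illustrative motif only: WRKY transcription factors share a conserved
# WRKYGQK heptapeptide (variants such as WRKYGKK also occur). Real pipelines
# search with Pfam/SMART HMM profiles rather than a regular expression.
WRKY_MOTIF = re.compile(r"WRKYG[QKE]K")


def find_family_members(proteins: dict[str, str], motif: re.Pattern) -> dict[str, list[int]]:
    """Return {protein_id: [0-based motif start positions]} for every protein
    in which the motif occurs at least once."""
    hits: dict[str, list[int]] = {}
    for protein_id, seq in proteins.items():
        positions = [m.start() for m in motif.finditer(seq.upper())]
        if positions:
            hits[protein_id] = positions
    return hits


# Hypothetical toy proteome: IDs and sequences are invented for illustration.
toy_proteome = {
    "gene01": "MSTDWRKYGQKPIKGSPYPR",
    "gene02": "MAAEELLKRSGTTPQ",
    "gene03": "MKDWRKYGKKVVRWRKYGQKE",
}

print(find_family_members(toy_proteome, WRKY_MOTIF))
# {'gene01': [4], 'gene03': [3, 13]}
```

Positions of each hit can then feed the domain/motif and gene-structure steps listed above.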

DATA USED FOR ML MODELS

  • DNA & protein sequences
  • Conserved domains (Pfam/SMART)
  • RNA-Seq expression profiles
  • Promoter cis-regulatory elements
  • Subcellular localization
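
One way these data types can be combined into a single numeric vector per gene, sketched under assumed choices: amino-acid composition for the sequence, 0/1 flags for domains, and raw expression values. The domain names, the vector layout and the example values are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in the sequence (20 values)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def gene_features(protein_seq: str, domains: set[str], domain_vocab: list[str],
                  expression: list[float]) -> list[float]:
    """Concatenate sequence, domain and expression features into one vector.

    Hypothetical layout: 20 composition values, then one 0/1 flag per domain
    in domain_vocab, then the expression values in a fixed condition order.
    Cis-element counts or localization flags could be appended the same way.
    """
    domain_flags = [1.0 if d in domains else 0.0 for d in domain_vocab]
    return aa_composition(protein_seq) + domain_flags + list(expression)


# Invented example: a short peptide, one detected domain, three RNA-Seq values.
vec = gene_features("MKWRKYGQK", {"WRKY"}, ["WRKY", "AP2"], [2.5, 10.0, 0.3])
print(len(vec))  # 25
```

Every gene must use the same vocabulary and condition order, otherwise the vectors are not comparable.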

HOW MACHINE LEARNING HELPS

  • Converts gene features into numerical feature vectors
  • Learns patterns from genes with known functions
  • Predicts functions of uncharacterized genes
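
The learn-then-predict loop can be illustrated with k-nearest neighbours, chosen here only because it is the simplest supervised method to show in a few lines; it is not necessarily the method used in practice. The two features (imagined log2 fold changes under drought and cold) and all labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: list[tuple[list[float], str]], query: list[float], k: int = 3) -> str:
    """Label the query gene with the majority function among its k nearest
    training genes (Euclidean distance on feature vectors)."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical genes with known functions; features are invented
# (log2 fold change under drought, log2 fold change under cold).
known_genes = [
    ([3.1, 0.2], "drought-responsive"),
    ([2.8, -0.1], "drought-responsive"),
    ([3.5, 0.4], "drought-responsive"),
    ([0.1, 2.9], "cold-responsive"),
    ([-0.2, 3.3], "cold-responsive"),
    ([0.3, 2.6], "cold-responsive"),
]

print(knn_predict(known_genes, [3.0, 0.0]))  # drought-responsive
print(knn_predict(known_genes, [0.0, 3.0]))  # cold-responsive
```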

POPULAR ML ALGORITHMS

  • Random Forest
  • Support Vector Machine (SVM)
  • Neural Networks
  • Gradient Boosting
  • Graph-based learning
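
As a concrete instance of one listed algorithm, here is a linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss, written without libraries so every step is visible. In practice a library such as scikit-learn would be used; the two features and the +1/-1 labels (member / non-member of a functional class) are hypothetical.

```python
import random


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: list[list[float]], y: list[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> list[float]:
    """Pegasos-style training of a soft-margin linear SVM.

    X: feature vectors; y: labels in {+1, -1}. A constant 1.0 is appended to
    each vector so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for xi, yi in rng.sample(data, len(data)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * dot(w, xi)
            w = [(1 - eta * lam) * wj for wj in w]  # L2 regularization shrink
            if margin < 1:  # hinge loss active: move toward the low-margin point
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: list[float], x: list[float]) -> int:
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0], [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [2.2, 2.8]), predict(w, [-2.7, -2.1]))  # 1 -1
```

Random Forest, Gradient Boosting and neural networks would consume the same feature vectors; only the training procedure changes.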

CO-EXPRESSION NETWORK APPROACH

  • Genes with similar expression profiles → likely similar function
  • Network clustering + ML
  • Guilt-by-association principle

CROSS-SPECIES PREDICTION

  • Models trained on Arabidopsis/rice
  • Applied to non-model plants
  • Works best for conserved gene families
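
Cross-species transfer relies on features that mean the same thing in every species, such as conserved domains. The sketch below assumes a nearest-centroid classifier over domain presence/absence: it is trained on "model species" genes and applied to genes from another species. Domain names follow Pfam-style naming but are used illustratively, and all genes and labels are hypothetical.

```python
def to_vector(domains: set[str], vocab: list[str]) -> list[float]:
    """Species-independent features: presence/absence of each conserved domain."""
    return [1.0 if d in domains else 0.0 for d in vocab]


def train_centroids(labelled: list[tuple[set[str], str]], vocab: list[str]) -> dict[str, list[float]]:
    """Nearest-centroid training: average feature vector per function label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for domains, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(to_vector(domains, vocab)):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_function(centroids: dict[str, list[float]], domains: set[str], vocab: list[str]) -> str:
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    vec = to_vector(domains, vocab)

    def sq_dist(centroid: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


VOCAB = ["WRKY", "AP2", "NAC", "Pkinase", "LRR"]

# Hypothetical annotated genes from a model species (e.g. Arabidopsis).
model_species_genes = [
    ({"WRKY"}, "transcription factor"),
    ({"AP2"}, "transcription factor"),
    ({"NAC"}, "transcription factor"),
    ({"Pkinase", "LRR"}, "protein kinase"),
    ({"Pkinase"}, "protein kinase"),
]
centroids = train_centroids(model_species_genes, VOCAB)

# Hypothetical genes from a non-model plant, described by detected domains only.
print(predict_function(centroids, {"NAC"}, VOCAB))            # transcription factor
print(predict_function(centroids, {"Pkinase", "LRR"}, VOCAB))  # protein kinase
```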

WHY WET-LAB WORK IS NOT NEEDED INITIALLY

  • In silico predictions reduce the number of experiments
  • Rank top candidate genes
  • Focus wet-lab validation on the most promising genes
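
Ranking candidates for validation can be as simple as sorting genes by a model's predicted score. This sketch assumes the scores are probabilities from some trained classifier; the gene IDs, scores and the optional cutoff are hypothetical.

```python
def rank_candidates(scores: dict[str, float], top_k: int, min_score: float = 0.0) -> list[str]:
    """Return up to top_k gene IDs with score >= min_score, best first.
    Ties are broken alphabetically so the ranking is reproducible."""
    eligible = [gene for gene, s in scores.items() if s >= min_score]
    return sorted(eligible, key=lambda gene: (-scores[gene], gene))[:top_k]


# Hypothetical model outputs: predicted probability that each gene is stress-responsive.
predicted = {"g1": 0.91, "g2": 0.35, "g3": 0.78, "g4": 0.91, "g5": 0.12}
print(rank_candidates(predicted, top_k=3))                 # ['g1', 'g4', 'g3']
print(rank_candidates(predicted, top_k=3, min_score=0.8))  # ['g1', 'g4']
```

Only the short list goes to the wet lab; the remaining genes stay as lower-priority hypotheses.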

APPLICATIONS

  • Stress-responsive gene families
  • Metabolic pathways
  • Transcription factor families
  • Crop improvement

CONCLUSION

  • Genome-wide analysis provides the data
  • Machine learning finds hidden patterns
  • Together they predict gene functions efficiently