Non-canonical start-codons and where to find them
Students:
Valeria Rubinova (BI)
Bogdan Sotnikov (KRSU, BI)
Supervisor:
Olga Bochkareva (ISTA)
Introduction
Most bacterial genes begin with the canonical ATG start codon, but some genes use the non-canonical start codons GTG and TTG. We set out to study how often this happens and which genes prefer non-canonical start codons.
Is there a relationship between the function of a gene and its start codon?
Or maybe bacteria that differ in ecological niches prefer different start codons?
Aim and objectives of the project:
Aims:
studying the occurrence of non-canonical start codons
Objectives:
Pipeline:
Fig. 1 Principal scheme of our workflow
GenBank parsing:
We downloaded whole genome and scaffold assemblies of 34 bacteria (generalists and specialists) from GenBank.
List of bacteria studied in the project:
Generalists: Bacillus pumilus, Bacillus subtilis, Bacillus thuringiensis, Lactococcus lactis, Enterococcus faecalis, Agrobacterium tumefaciens, Klebsiella pneumoniae, Escherichia coli, Acinetobacter baumannii, Stenotrophomonas maltophilia, Pseudomonas aeruginosa, Pseudomonas fluorescens, Pseudomonas putida
Specialists: Salinibacter ruber, Mycoplasma bovis, Corynebacterium pseudotuberculosis, Bifidobacterium bifidum, Staphylococcus haemolyticus, Staphylococcus simulans, Weissella cibaria, Lactobacillus salivarius, Chlamydia trachomatis, Burkholderia cenocepacia, Neisseria gonorrhoeae, Neisseria meningitidis, Histophilus somni, Vibrio anguillarum, Vibrio campbellii, Vibrio cholerae, Enterobacter hormaechei, Xanthomonas campestris, Xylella fastidiosa, Francisella tularensis, Leptospira interrogans
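Once coding sequences have been extracted from the GenBank records, tallying start codons is a small counting step. The following is a minimal sketch, not the project's actual parser: it assumes the CDS nucleotide sequences are already available as plain strings, and the names `classify_start_codon`, `start_codon_counts`, and the toy sequences are hypothetical.

```python
from collections import Counter

CANONICAL = "ATG"
NON_CANONICAL = {"GTG", "TTG"}


def classify_start_codon(cds: str) -> str:
    """Return the first codon of a CDS, or 'other' if it is not ATG, GTG, or TTG."""
    codon = cds[:3].upper()
    if codon == CANONICAL or codon in NON_CANONICAL:
        return codon
    return "other"


def start_codon_counts(cds_sequences):
    """Count how many CDS sequences begin with each start-codon class."""
    return Counter(classify_start_codon(seq) for seq in cds_sequences)


# Toy CDS sequences (hypothetical, not taken from the studied genomes).
toy_cds = ["ATGAAATAA", "GTGCCCTGA", "ttgaaatag", "ATGTTTTAA", "CTGAAATAA"]
counts = start_codon_counts(toy_cds)

# Share of sequences that start with GTG or TTG.
non_canonical_share = (counts["GTG"] + counts["TTG"]) / len(toy_cds)
```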
Assembly filtering:
Next, we divided all bacteria into 3 groups:
The list of group 1 bacteria was passed to the PanACoTA pipeline (PanACoTA prepare), which uses the Mash genetic distance to filter out related assemblies.
Fig. 2 Group 1
Fig. 3 Group 3
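The idea behind distance-based filtering can be illustrated with a simplified greedy sketch. This is not PanACoTA's code: the function `filter_assemblies`, the thresholds, and the toy distance table are assumptions chosen for illustration, and the distances are taken as precomputed rather than calculated by Mash.

```python
def filter_assemblies(names, distance, min_dist, max_dist):
    """Greedily keep an assembly only if its distance to every assembly
    kept so far lies within [min_dist, max_dist]."""
    kept = []
    for name in names:
        if all(min_dist <= distance(name, other) <= max_dist for other in kept):
            kept.append(name)
    return kept


# Hypothetical pairwise distances between four toy assemblies.
toy_distances = {
    frozenset(("A", "B")): 0.00001,  # near-identical pair
    frozenset(("A", "C")): 0.02,
    frozenset(("B", "C")): 0.02,
    frozenset(("A", "D")): 0.3,      # too distant from the rest
    frozenset(("B", "D")): 0.3,
    frozenset(("C", "D")): 0.3,
}


def lookup(a, b):
    """Return the precomputed distance for an unordered pair of assemblies."""
    return toy_distances[frozenset((a, b))]


# Illustrative thresholds; the values used in the project are not stated here.
kept = filter_assemblies(["A", "B", "C", "D"], lookup, min_dist=1e-4, max_dist=0.06)
```

In this toy run, B is dropped as a near-duplicate of A and D is dropped as too distant, leaving A and C.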
Re-annotation and construction of orthologous groups:
We re-annotated the sequences with Prokka to avoid batch effects.
We used Proteinortho to construct the orthologous groups.
Sequence alignment:
We used two aligners: PRANK and MUSCLE.
Complete results were obtained with MUSCLE.
Fig. 4 Fragment of the Vibrio campbellii alignment (screenshot from the AliView tool)
Descriptive statistics:
Fig. 5 Proportion of Escherichia coli orthologous groups with a given start codon in pangenome fractions
Fig. 6 E. coli gene frequency spectrum
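A gene frequency spectrum counts, for each k, how many orthologous groups are present in exactly k assemblies. The following is a minimal sketch of that tally, assuming a presence table keyed by group; the function name and the toy data are hypothetical and do not reproduce the E. coli results.

```python
from collections import Counter


def gene_frequency_spectrum(presence):
    """Map k -> number of orthologous groups found in exactly k assemblies.

    presence: dict of group id -> set of assemblies that contain the group.
    """
    return Counter(len(assemblies) for assemblies in presence.values())


# Toy presence table for three assemblies (hypothetical identifiers).
toy_presence = {
    "og1": {"g1", "g2", "g3"},  # present everywhere
    "og2": {"g1"},              # present in one assembly only
    "og3": {"g1", "g2", "g3"},
    "og4": {"g2", "g3"},
}
spectrum = gene_frequency_spectrum(toy_presence)
```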
Testing for differences between start codons within COGs:
The unit of observation in this case is an assembly with a given start codon in a given COG.
We used the Kruskal-Wallis test to detect differences among the three start-codon groups.
Post-hoc analysis was done using the Mann-Whitney U test with Bonferroni correction.
For each COG and start codon, the start-codon frequency was downsampled.
COG | Start codon | Weighted frequency |
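To make the testing scheme concrete, here is a stdlib-only sketch of the Kruskal-Wallis H statistic for three groups, plus a Bonferroni adjustment for the pairwise follow-up tests. It is an illustration under stated assumptions, not the analysis code: it omits the tie correction, relies on the chi-square approximation with 2 degrees of freedom (valid only for exactly three groups), and uses made-up frequencies. The pairwise p-values passed to `bonferroni` are assumed to come from Mann-Whitney U tests computed elsewhere.

```python
import math


def average_ranks(values):
    """Rank values starting at 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three groups, without tie correction."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # The chi-square survival function with 2 degrees of freedom is exp(-h / 2).
    return h, math.exp(-h / 2)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up weighted frequencies for the ATG, GTG, and TTG groups.
h_stat, p_value = kruskal_wallis_three_groups(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
)
adjusted = bonferroni([0.01, 0.04, 0.5])  # hypothetical pairwise p-values
```

With three start-codon groups there are three pairwise comparisons, so each post-hoc p-value is multiplied by 3.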
Results:
We had 3 hypotheses about the dependence of the distribution of start codons on:
Distribution of start codons
Hypotheses 1 and 3 were not confirmed: we found no association between start-codon representation and either genome size or whether the bacterium is a generalist or a specialist.
Fig. 7 Distribution of non-canonical SCs proportion
In the figure, bacteria are grouped by taxon. Among the species considered in this work, representatives of Pseudomonadota had a lower percentage of genes with non-canonical start codons than representatives of Bacillota.
The y-axis shows the proportion of genes with a start codon of a given type that have a given function. For example, 5% of all genes with the GTG start codon are responsible for cell wall biogenesis.
Fig. 8 Proportion of Vibrio campbellii SCs vs COG function
Distribution of start codons in genes with different COG category:
We analyzed the distribution of genes across COGs within each organism: for each start codon (ATG, GTG, TTG) we took all genes with that start codon and calculated the relative percentage of them that belonged to each COG.
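The relative percentage described above can be sketched as follows. This is a minimal illustration, assuming each gene is reduced to a (start codon, COG category) pair; the function name and the toy gene list are hypothetical, and the single-letter codes stand for COG functional categories (for example, M for cell wall/membrane/envelope biogenesis).

```python
from collections import Counter, defaultdict


def cog_share_by_start_codon(genes):
    """Return {start codon: {COG category: percent of that codon's genes}}.

    genes: iterable of (start_codon, cog_category) pairs for one organism.
    """
    counts = defaultdict(Counter)
    for codon, cog in genes:
        counts[codon][cog] += 1
    shares = {}
    for codon, per_cog in counts.items():
        total = sum(per_cog.values())
        shares[codon] = {cog: 100 * n / total for cog, n in per_cog.items()}
    return shares


# Toy annotation (hypothetical, not taken from Vibrio campbellii).
toy_genes = [
    ("ATG", "M"), ("ATG", "J"), ("ATG", "J"), ("ATG", "K"),
    ("GTG", "M"), ("GTG", "K"),
    ("TTG", "M"),
]
shares = cog_share_by_start_codon(toy_genes)
```

Because the percentages are normalized within each start codon, they sum to 100 per codon, which makes ATG, GTG, and TTG comparable despite very different gene counts.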
In genes responsible for:
the relative percentage of the canonical ATG start codon is greater than that of non-canonical ones.
In genes responsible for:
the relative percentage of the non-canonical start codons GTG or TTG is greater than that of the canonical one.
Future plans:
VL created a function for filtering “bad” alignments
VL filtered assemblies using the PanACoTA pipeline
BS worked on the statistical analysis
The authors contributed equally to the other objectives
Contacts:
GitHub:
Media:
Acknowledgements: