Allele normalisations in the European Variation Archive
The basics
Simple variations
The most basic case, a Single Nucleotide Variation, stays the same:
1 1000 A T → 1 1000-1000 A T
Multiple Nucleotide Variation: coordinates represent the whole change, excluding bases shared at either end of both alleles
1 1000 ATC GGT → 1 1000-1002 ATC GGT
1 1000 ACTC AGCT → 1 1001-1003 CTC GCT
Insertions and deletions
Insertions: coordinates represent the inserted nucleotides
1 1000 A ATC → 1 1001-1002 - TC
1 1000 AG ATC → 1 1001-1002 G TC
Deletions: coordinates represent the deleted nucleotides
1 1000 ATC A → 1 1001-1002 TC -
1 1000 ATC GC → 1 1000-1001 AT G
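The examples above are consistent with a simple trimming rule, sketched here in Python. This is an illustration only, not the EVA implementation: the function name normalise and its signature are hypothetical, and the order of the two trimming steps (shared suffix first, then shared prefix) is an assumption inferred from the examples.

```python
def normalise(start, ref, alt):
    """Trim shared bases and return (start, end, ref, alt).

    Hypothetical helper: name, signature and trimming order are assumptions.
    """
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    # Drop bases shared at the right-hand end; the start position is unchanged.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop bases shared at the left-hand end; each one shifts the start by 1.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    # The range covers the longer of the two remaining alleles.
    end = start + max(len(ref), len(alt)) - 1
    # An allele that was trimmed away completely is written as "-".
    return start, end, ref or "-", alt or "-"


for ref, alt in [("A", "T"), ("ACTC", "AGCT"), ("A", "ATC"), ("ATC", "A"), ("ATC", "GC")]:
    print(1, 1000, ref, alt, "->", 1, *normalise(1000, ref, alt))
```

The end coordinate is taken from the longer remaining allele, which is why both the insertion A → ATC and the deletion ATC → A are reported as 1001-1002.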
Allele alignment
Following dbSNP rules, alleles are generated as left-aligned as possible: the common suffix is removed first, then the common prefix
1 1000 AGTTC AGCC → 1 1000-1003 AGTT AGC
1 1000 AGTT AGC → 1 1002-1003 TT C
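The two lines above read as successive steps of one alignment. A minimal Python sketch that records each intermediate result might look as follows. The names alignment_steps and describe are hypothetical, and the suffix-then-prefix order is inferred from the example rather than taken from dbSNP or EVA code.

```python
def describe(start, ref, alt):
    # Range covers the longer allele; an empty allele is written as "-".
    end = start + max(len(ref), len(alt)) - 1
    return start, end, ref or "-", alt or "-"


def alignment_steps(start, ref, alt):
    """Return the intermediate (start, end, ref, alt) tuples, one per step."""
    steps = []
    # Step 1: count and strip the bases shared at the right-hand end.
    n = 0
    while n < min(len(ref), len(alt)) and ref[-1 - n] == alt[-1 - n]:
        n += 1
    if n:
        ref, alt = ref[:len(ref) - n], alt[:len(alt) - n]
        steps.append(describe(start, ref, alt))
    # Step 2: count and strip the bases shared at the left-hand end,
    # moving the start position forward by the same amount.
    m = 0
    while m < min(len(ref), len(alt)) and ref[m] == alt[m]:
        m += 1
    if m:
        ref, alt, start = ref[m:], alt[m:], start + m
        steps.append(describe(start, ref, alt))
    return steps


for step in alignment_steps(1000, "AGTTC", "AGCC"):
    print(1, *step)
```

Removing the suffix before the prefix keeps the start position as far left as possible: for a hypothetical input 1 1000 AA A, this order yields 1000-1000 A -, whereas trimming the prefix first would shift the variant to 1001.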
Co-located variants
1 1000 TGACGTAACGATT T,TGACGTAACGGTT,TGACGTAATAC
Results in 3 different variants
1 1000-1011 TGACGTAACGAT - → last T removed
1 1010-1010 A G → last TT removed, then the common prefix
1 1008-1012 CGATT TAC → common prefix removed
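Splitting a multi-allelic line can be sketched as normalising each alternate allele independently against the same reference. The Python below is an assumption-laden illustration: split_alternates and trim are hypothetical names, and the comma-separated alternate string mirrors the example above rather than any specific EVA input format.

```python
def trim(start, ref, alt):
    # Remove the shared suffix first, then the shared prefix (assumed order).
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    end = start + max(len(ref), len(alt)) - 1
    return start, end, ref or "-", alt or "-"


def split_alternates(chrom, start, ref, alts):
    """Yield one normalised (chrom, start, end, ref, alt) per alternate allele."""
    return [(chrom,) + trim(start, ref, alt) for alt in alts.split(",")]


variants = split_alternates("1", 1000, "TGACGTAACGATT", "T,TGACGTAACGGTT,TGACGTAATAC")
for variant in variants:
    print(*variant)
```

Because each alternate is trimmed on its own, the three resulting variants end up with different start positions and different reference alleles even though they came from a single input line.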