UK Biobank small insertion-deletion (indel) allele counts for CanVar genes
Methodology
For selected transcripts of all CanVar genes, variants overlapping the coding regions (±25 bp into the introns and UTRs) were extracted from the population VCF (pVCF) file of the final UK Biobank exome data release. Variant coordinates were converted from GRCh38 to GRCh37 using the UCSC LiftOver tool and subsequently left-normalised against the GENCODE GRCh37 reference genome.
Observed allele counts were generated for each variant and stratified according to the top-level ethnicity term of the individuals in which the variants were observed. Counts were generated both for the whole UK Biobank cohort and for female participants only. Position-specific QC was applied so that variants were counted only if total base coverage at that position exceeded 10 reads.
The Ensembl Variant Effect Predictor (VEP) was used to predict functional consequence and to determine the HGVSc and HGVSp descriptions for each variant; it is these annotations that are listed in this spreadsheet.

Exemplar VEP indel nomenclature
Variant type    HGVSc (single position)    HGVSc (multiple positions)
Insertions      c.123_124insACGT           N/A
Deletions       c.123del                   c.123_128del
Duplications    c.123dup                   c.123_128dup
Delins          c.123delinsACGT            c.123_128delinsACGT
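The counting step described in the methodology (stratification by top-level ethnicity term, a female-only subset, and a coverage threshold of more than 10 reads) can be illustrated with a minimal Python sketch. This is not the pipeline used to build this spreadsheet. The record layout (`variant`, `ethnicity`, `sex`, `alt_alleles`, `coverage`), the group labels and the function name are hypothetical, and applying the coverage check to each individual observation is an assumption.

```python
from collections import Counter, defaultdict

# From the text: count a variant only where coverage exceeds 10 reads.
MIN_COVERAGE = 10


def stratified_allele_counts(observations, females_only=False):
    """Sum alternate-allele counts per variant and per ethnicity term.

    `observations` is a hypothetical iterable of dicts with the keys
    variant, ethnicity, sex, alt_alleles (0, 1 or 2) and coverage.
    """
    counts = defaultdict(Counter)
    for obs in observations:
        # Optionally restrict to female participants.
        if females_only and obs["sex"] != "F":
            continue
        # Assumed per-observation coverage check (strictly greater than 10).
        if obs["coverage"] <= MIN_COVERAGE:
            continue
        if obs["alt_alleles"] > 0:
            counts[obs["variant"]][obs["ethnicity"]] += obs["alt_alleles"]
    return {variant: dict(by_group) for variant, by_group in counts.items()}
```

Calling the function once with `females_only=False` and once with `females_only=True` mirrors the two worksheets (whole cohort and female only).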
Usage
This spreadsheet contains the resulting allele counts for any indel variants identified in UKB; frequencies for SNVs, which are not included here, can be found on the respective variant page on the main CanVar website (https://canvaruk.org).
It is recommended to first filter according to gene of interest and then identify the variant of interest using either genomic coordinates or the respective HGVSc nomenclature. Care should be taken to ensure that searches based on HGVSc descriptions of variants match the VEP nomenclature as closely as possible to avoid overlooking variants. Guidance on the precise nomenclature is given in green to the right of this worksheet.
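One common source of mismatch is that older variant descriptions spell out the deleted or duplicated bases (for example c.123delA), whereas the exemplar nomenclature in this document omits them (c.123del). The Python sketch below rewrites such a query into the shorter form before searching. It is a convenience under stated assumptions, not part of VEP or of the CanVar pipeline; the function name `to_vep_style` is hypothetical, and it only handles trailing bases after `del` or `dup`.

```python
import re

# Matches "del" or "dup" followed by explicitly listed bases at the end of
# the description, e.g. "c.123delA". The negative lookahead skips "delins",
# whose inserted bases are part of the description and must be kept.
_TRAILING_BASES = re.compile(r"(del(?!ins)|dup)[ACGTN]+$")


def to_vep_style(hgvsc):
    """Return the HGVSc string without bases listed after del or dup."""
    return _TRAILING_BASES.sub(r"\1", hgvsc.strip())


print(to_vep_style("c.123_128delACGTAC"))  # c.123_128del
print(to_vep_style("c.123delinsACGT"))     # c.123delinsACGT (unchanged)
```

Insertions and delins descriptions pass through unchanged, so the same helper can be applied to every search term before it is compared with the HGVSc column.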
The whole-cohort and female-only counts are listed in two separate worksheets within this document and should be selected as appropriate for the gene of interest.
Please note that indel events deeper than 25 bp into the intron or UTR may be present in UK Biobank but are not listed here, and so their absence from this document does not mean they do not exist in the general population.