Listing Criteria for SNP Inclusion
into the ISOGG Y-DNA Haplogroup Tree - 2019-2020
The entire work is identified by the Version Number and date given on the Main Page. Directions for citing the document are given at the bottom of the Main Page.
Last revision date for this specific page: 20 November 2018
Introduction
These recommendations are to assure that there is a uniform set of criteria for accepting new mutations for inclusion on the ISOGG Y-DNA haplogroup tree.
Because of the abundance of alternatives now available, only single nucleotide polymorphisms (SNPs) are being accepted as new additions; insertions and deletions (indels) are not. In exceptional cases, other variants may be considered for inclusion on a case-by-case basis if they can be clearly demonstrated to have properties equivalent to SNPs, but the burden of proof will be much higher and acceptance is at the discretion of the committee.
Special Coding for Interpreting SNP status
Added SNPs are color coded red and defined as SNPs that have met all of the criteria for inclusion and did not appear on last year's tree.
SNPs under Investigation are color coded gray and are SNPs that have not yet been fully accepted on the tree because additional testing is needed to confirm adequate positive samples and/or correct placement on the tree. While there are no rules for including gray items, those that fail quality guideline 4, 6 or 7 below are not added to the tree.
SNPs found solely from next generation sequencing are colored either black or red and shown in italics; they indicate quality, consistent reads found in Y sequencing. These are not confirmed by Sanger sequencing or other testing and sometimes may not be amenable to either process.
SNP(s) printed in bold in a subclade: The criterion for printing a representative SNP in bold for a subclade is that it has traditionally represented that subgroup or seems the most promising representative.
Identical SNPs are SNPs that have the same Y-position, mutation, and subclade within a haplogroup and were discovered in different labs. They are listed in alphabetical order (not necessarily in the order of discovery) and are separated by "/". Examples: P257/U6, L31/S149.
Mutation names followed by ^ represent ones from next-generation sequencing which do not yet meet quality guidelines for minimum number of reads. They may also represent items from microarray or spectrometry testing whose reliability at a particular site is not yet documented for a particular testing company. Those with ^^ represent mutations that do not meet quality guidelines but may be a helpful identifier. ~ indicates a subgroup whose position on the tree is only approximate.
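
The "/" separator and the ^ and ^^ suffixes described above lend themselves to a small parser. The following is only a minimal sketch: the function name `parse_snp_label`, the `SnpLabel` type and the SNP name Z123 are hypothetical, and the sketch assumes the markers always trail the SNP name. The ~ marker is left out because it attaches to a subgroup rather than to a SNP.

```python
from dataclasses import dataclass


@dataclass
class SnpLabel:
    names: list          # equivalent names discovered in different labs
    low_reads: bool      # trailing ^  : too few reads, or reliability undocumented
    below_quality: bool  # trailing ^^ : does not meet the quality guidelines


def parse_snp_label(label):
    """Split a tree label such as 'P257/U6' or 'Z123^' into names and flags.

    Hypothetical helper: assumes the ^ and ^^ markers trail the SNP name.
    """
    text = label.strip()
    below_quality = text.endswith("^^")
    low_reads = text.endswith("^") and not below_quality
    names = [name for name in text.rstrip("^").split("/") if name]
    return SnpLabel(names, low_reads, below_quality)


print(parse_snp_label("P257/U6").names)  # ['P257', 'U6']
```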
A. General Requirements for SNP Validation
1. Inserting a SNP by Creating a NON-Terminal Branch to the ISOGG Tree
The supporting information provided by the proposer should demonstrate that the new SNP is downstream of an established tree mutation. It must also show that the SNP was tested in individuals from all subgroups on the tree parallel to the new subgroup, and that a man with the new mutation is negative for all the parallel subgroups. There must be men with the new mutation who have divergent results for the subgroups under it.
Example: Suppose that a new subgroup is being added with the name Q18.
Fictional example:
G-L140
    G-L13
    G-L1266
    G-Q18
        G-L1268
One man each from the L1266 and L13 subgroups must be Q18-. Simultaneously one Q18+ man must be L1266- and L13-. In addition, one Q18+ man must be L140+, L1268+, and a second L140+, Q18+ man must be L1268-.
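
The evidence rules for the fictional Q18 example can be written as a short check. This is a sketch under stated assumptions, not an official ISOGG tool: the function name `supports_non_terminal` and the data layout (one dict per tested man, mapping SNP name to "+" or "-") are hypothetical, and an untested SNP is simply absent from the dict and counts as neither positive nor negative.

```python
def supports_non_terminal(men, new, upstream, parallels, downstream):
    """Check the evidence pattern for inserting a non-terminal branch.

    men: list of dicts mapping SNP name -> "+" or "-" (hypothetical layout).
    """
    def pos(man, snp):
        return man.get(snp) == "+"

    def neg(man, snp):
        return man.get(snp) == "-"

    new_positive = [m for m in men if pos(m, new)]
    carriers = [m for m in new_positive if pos(m, upstream)]

    # One man from each parallel subgroup must be negative for the new SNP.
    parallels_lack_new = all(
        any(pos(m, p) and neg(m, new) for m in men) for p in parallels
    )
    # One man with the new SNP must be negative for every parallel subgroup.
    carrier_lacks_parallels = any(
        all(neg(m, p) for p in parallels) for m in new_positive
    )
    # Men with the upstream and new SNPs must diverge on the downstream SNP.
    divergent = (any(pos(m, downstream) for m in carriers)
                 and any(neg(m, downstream) for m in carriers))
    return parallels_lack_new and carrier_lacks_parallels and divergent


men = [
    {"L140": "+", "L13": "+", "Q18": "-"},
    {"L140": "+", "L1266": "+", "Q18": "-"},
    {"L140": "+", "Q18": "+", "L13": "-", "L1266": "-", "L1268": "+"},
    {"L140": "+", "Q18": "+", "L1268": "-"},
]
print(supports_non_terminal(men, "Q18", "L140", ["L13", "L1266"], "L1268"))  # True
```

Dropping the fourth man removes the divergent L1268- result, so the same call then returns False.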
2. Adding a SNP Representing a New Terminal Branch to the ISOGG Tree
The man with the new mutation must also have the mutation of the subgroup immediately upstream on the tree and must lack the mutation of the parallel subgroup or subgroups. The parallel subgroup(s) must in turn lack the new mutation.
Example: Suppose that a new subgroup is being added with the name QQ12.
Fictional example:
G-L5432
    G-P343
    G-QQ12
Then the evidence for QQ12 must show that two men are L5432+, QQ12+. Simultaneously one man from P343 must be QQ12-. Also, one of the QQ12 men must be P343-.
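
The terminal-branch rules for the fictional QQ12 example can be sketched the same way. The function name `supports_terminal` and the dict-per-man layout are hypothetical, and the default of two positive men is taken from the example above rather than from a stated general rule.

```python
def supports_terminal(men, new, upstream, parallels, min_carriers=2):
    """Check the evidence pattern for adding a new terminal branch.

    men: list of dicts mapping SNP name -> "+" or "-" (hypothetical layout).
    min_carriers=2 is an assumption drawn from the QQ12 example.
    """
    # Men who carry both the upstream mutation and the new mutation.
    carriers = [m for m in men
                if m.get(new) == "+" and m.get(upstream) == "+"]
    enough_carriers = len(carriers) >= min_carriers

    # One carrier must be negative for every parallel subgroup.
    carrier_lacks_parallels = any(
        all(m.get(p) == "-" for p in parallels) for m in carriers
    )
    # One man from each parallel subgroup must be negative for the new SNP.
    parallels_lack_new = all(
        any(m.get(p) == "+" and m.get(new) == "-" for m in men)
        for p in parallels
    )
    return enough_carriers and carrier_lacks_parallels and parallels_lack_new


men = [
    {"L5432": "+", "QQ12": "+", "P343": "-"},
    {"L5432": "+", "QQ12": "+"},
    {"L5432": "+", "P343": "+", "QQ12": "-"},
]
print(supports_terminal(men, "QQ12", "L5432", ["P343"]))  # True
```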
3. Missing Information
If there are multiple mutations defining a subgroup and not all of them can be tested, then a decision must be made whether to put those mutations in investigational gray below the confirmed subgroup or instead to place the new subgroup itself in investigational gray due to the lack of information. Also, where information is missing about a proven parallel subgroup mutation, the new mutations can be placed in investigational gray until the missing information is available.
B. Requirements for Specific Type of Testing
Sanger Sequencing
As of this writing, Sanger sequencing is available to the public only at the German company, YSeq. This test compares the results from testing against about 1000 adjacent base pairs in the reference sample. This is a larger comparison segment than the other tests provide. SNP information confirmed by Sanger sequencing appears in normal font on the tree.
Next Generation Sequencing
Next-generation Y sequencing is available for the genealogical community at Full Genomes Corporation, in Family Tree DNA's Big Y Test, at 23mofang or via whole genome sequencing. Additional companies are expected to offer this testing. Next-generation sequencing compares fragments of about 200 adjacent base pairs against the reference sample. If the evidence as to a SNP is based solely on next-generation sequencing, the SNP will appear in italics on the tree.
Microarray Chip-based Genotyping and Spectrometry Testing
Examples of this type of test are the Geno 2.0 Next-Generation test, 23andMe, ancestry.com, Chromo 2.0 and Family Tree DNA's SNP packs or its individual tests. This type of testing targets a selected SNP. Use of SNP evidence based on these methods requires that the SNP name be followed by the ^ symbol unless evidence is available from a sample tested by other means that the relevant sites are providing the same, correct information at that specific laboratory.
C. Quality Guidelines
To include an SNP name on the tree without the ^^ symbol following it -- except as specified -- these criteria must be met:
# When 500 adjacent base pairs are viewed with the mutation site in the center, the same sequence cannot appear at another chromosome site where 95.5% or more of the base pairs are in the same sequence. This applies only to those displayed comparisons where the number of base pairs compared is 500 or almost 500, and not to smaller numbers. The 500 adjacent base pairs can be obtained using the ISOGG Y-Browse (http://ybrowse.y-chromosome.org/gb2/gbrowse/chrY/), and then the percentage is obtained using the BLAT Search at https://genome.ucsc.edu/cgi-bin/hgBlat.
Most of the remaining specifications require the use of a chromosome browser with detailed information about reads, such as the IGV browser from the Broad Institute. Multiple samples, preferably some from the same subgroup, should be used to evaluate the site.
1. The total number of reads for that site in the sample providing the evidence for a new SNP must be at least four. If the number of reads is fewer than four and all the remaining criteria are met, the SNP can be added to the tree with the ^ symbol.
2. The percentage of reads showing the mutation in the candidate sample must be at least 87%. For a terminal SNP with fewer than 10 reads, 100% of the reads must show the mutation. Any reads with a mapping quality score less than 10 can be ignored in meeting the criteria of this paragraph, but these should be subtracted from the total number of reads. In IGV, poor quality reads have a muted color.
3. The mutation site cannot already be listed on the ISOGG tree in three or more locations.
4. The mapping quality score for the reads at the site must average more than 30 when viewing the results from varied samples, and reads with a mapping quality under 30 cannot exceed 10% of the total reads showing the mutation. In IGV, the mapping quality score displays as the cursor passes over each read. Items that do not meet criterion 4 should never be added to the tree.
5. There must be no additional called mutations for that individual immediately adjacent to the site or nearby.
6. The mutation site must not be part of a series of repeated allele sequences. Of particular concern are SNPs within an STR (STR alerts are displayed for relevant locations in ISOGG YBrowse) and in DYZ19 (a 125-base-pair repeat region, which is also displayed in ISOGG YBrowse when these tracks are not suppressed). Items in these locations should never be added to the tree.
7. When viewing the SNP in multiple samples, a significant number of inconclusive reads nearby would be grounds for leaving the site off the tree.
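
The read-count thresholds in criteria 1, 2 and 4 can be combined into one per-site check. This is a rough sketch under several assumptions, not an official procedure: the function name `assess_site`, the `(shows_mutation, mapq)` read layout and the returned labels are hypothetical; reads with mapping quality below 10 are dropped before every count; and criterion 4 is applied to a single sample, whereas the guideline speaks of varied samples. Criteria 3, 5, 6 and 7 need tree or browser context and are not modeled.

```python
from statistics import mean


def assess_site(reads, terminal=True):
    """Classify a candidate site from its reads in one sample.

    reads: list of (shows_mutation, mapq) tuples (hypothetical layout).
    Returns "pass", "^", "^^" or "exclude" (hypothetical labels).
    """
    # Criterion 2: reads with mapping quality < 10 may be ignored,
    # and are then subtracted from the total.
    usable = [(mut, q) for mut, q in reads if q >= 10]
    if not usable:
        return "exclude"  # assumption: no usable reads means no evidence

    mutated = [q for mut, q in usable if mut]

    # Criterion 4: average mapping quality above 30, and at most 10% of
    # the reads showing the mutation may have mapping quality under 30.
    low_quality = sum(1 for q in mutated if q < 30)
    if mean(q for _, q in usable) <= 30 or low_quality > 0.10 * len(mutated):
        return "exclude"

    # Criterion 2: at least 87% of reads show the mutation; a terminal SNP
    # with fewer than 10 reads needs 100%.
    required = 1.0 if (terminal and len(usable) < 10) else 0.87
    if len(mutated) / len(usable) < required:
        return "^^"

    # Criterion 1: fewer than four reads earns the ^ marker.
    return "pass" if len(usable) >= 4 else "^"


print(assess_site([(True, 60)] * 12))  # pass
print(assess_site([(True, 60)] * 3))   # ^
```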
Corrections/Additions made since 1 January 2019:
None
Copyright 2019. International Society of Genetic Genealogy. All Rights Reserved