The project has gained many new members in the past few months, so I am going to go over some of the basics of Y-DNA (paternal lineage) testing. I am going to talk about the differences between Y-SNP and Y-STR tests, the differences between BIG Y and SNP Pack tests, and how SNPs are placed on branches of the Y-DNA paternal tree (Y-Tree).
As of today, the project has:
- Total members: 1,269
- BIG Y: 147
Y-SNPs vs Y-STRs
Before 2014, the rule of thumb for Y-SNPs and Y-STRs was this. Y-SNPs were deep-ancestry markers, good for tracing a lineage thousands of years into the past. Y-STRs, on the other hand, were for recent generations and for finding connections to help with genealogy. BIG Y changed all of that! These definitions no longer hold.
Y-SNPs are best understood as the definitive markers of our paternal tree. Each Y-SNP marks a branch point in the tree. New Y-SNPs arise at random, roughly once every one to four generations along a lineage. Sometimes a single branch carries many, even dozens, of Y-SNPs. This happens when only one man's line survived from that branch, so there are no surviving side branches to separate those Y-SNPs. The most common causes are environmental hardship (e.g. the last ice age), famine, war, and plain bad luck.
Y-SNP Pros
- Definitive branch markers
- Definitive source for dates
- Reliable from about 100 years ago back to the earliest prehistoric humans
Y-SNP Cons
- Smaller database size for matching
- Relatively expensive test ($450 to $575 US)
- Lower-cost SNP Packs are branch-specific and require a branch prediction from Y-STRs
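As a rough illustration of the "one to four generations" rate described above, the sketch below turns the number of Y-SNPs on a single branch into a span of generations. The 30-year generation length is an assumption added only for illustration, and this is back-of-the-envelope arithmetic, not the project's dating method.

```python
from typing import Tuple

YEARS_PER_GENERATION = 30  # assumption for illustration only

def branch_span(n_snps: int, min_gens: int = 1, max_gens: int = 4) -> Tuple[int, int]:
    """Rough (low, high) number of generations a branch with n_snps represents,
    assuming one new Y-SNP every min_gens..max_gens generations."""
    return n_snps * min_gens, n_snps * max_gens

low, high = branch_span(24)  # hypothetical branch carrying two dozen Y-SNPs
print(low, high)  # 24 96
print(low * YEARS_PER_GENERATION, high * YEARS_PER_GENERATION)  # 720 2880
```

A branch with dozens of Y-SNPs therefore stands for a long stretch of time in which only one line survived.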
Y-STRs are best understood as the universal method for detecting potential paternal relationships. They work for very recent generations through the distant historic past. With them, you can infer deeper ancestry connections and predict Y-Tree membership. Did you catch that I said infer? The problem with Y-STRs is that they don't tell you exactly how each match is related. Exact matches can be father and son, but they can also share a common ancestor more than ten generations back. Close matches could be anything from 1st cousins to 15th cousins. At lower testing levels such as the Y-DNA12, Y-DNA25, and Y-DNA37 tests, the connection could be even older. Further, because Y-STRs mutate randomly and fairly often, a marker can change and then change back to its original value over ten generations. These back mutations can confuse and confound research.
That said, Y-STRs have a real advantage because any male can test for the same Y-STR markers and find matches.
Y-STR Pros
- Universal marker set; no advanced knowledge needed
- Large database size (see current statistics)
Y-STR Cons
- Potential for back mutations
- Difficult to distinguish between different branches of a lineage (2nd cousin vs. 7th cousin)
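To make the back-mutation problem concrete, here is a minimal sketch using a simplified step-wise genetic distance (the total difference in repeat counts across shared markers). The marker names are real Y-STRs, but the values are hypothetical, and FamilyTreeDNA's own match scoring may treat some markers differently.

```python
from typing import Dict, List

def genetic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Simplified step-wise distance: total repeat difference over shared markers."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())

def mutation_steps(history: List[int]) -> int:
    """Total single-step changes along one line's history for one marker."""
    return sum(abs(y - x) for x, y in zip(history, history[1:]))

# Hypothetical values for illustration only.
ancestor = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
dys393_history = [13, 14, 13]  # changed, then changed back
descendant = dict(ancestor, DYS393=dys393_history[-1])

print(mutation_steps(dys393_history))          # 2 mutations happened
print(genetic_distance(ancestor, descendant))  # 0 are visible
```

Two real mutations leave no trace in the comparison, which is why Y-STR distances alone can understate how far apart two lines are.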
BIG Y vs SNP Packs
I think the community has heard a good deal about what the BIG Y test is. What often goes unexplained is why it is useful. Here goes. The BIG Y test is the single best test for finding the branch points within a family's genealogy, and it is also the single best way to learn about the ancient history of a specific male lineage. Because BIG Y reads the base (A, C, G, or T) at each of more than 10 million base pairs of the Y chromosome, it finds new markers that define new branches. Further, because roughly the same regions are tested for each sample, the results can be used for date calculations.
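Why does testing about the same regions matter for dating? A Y-SNP clock divides the number of Y-SNPs found by the amount of sequence examined. The sketch below shows that idea in its simplest form. The mutation rate is a hypothetical placeholder rather than a calibrated value, and published age estimates use more careful statistics.

```python
from statistics import mean
from typing import List

def estimate_branch_age(snp_counts: List[int], covered_bp: List[int],
                        rate_per_bp_per_year: float) -> float:
    """Years since a branch point, from the Y-SNPs found below it in each descendant.

    snp_counts[i]: Y-SNPs descendant i carries below the branch point
    covered_bp[i]: base pairs with usable coverage in descendant i's test
    rate_per_bp_per_year: assumed Y-SNP mutation rate (placeholder value below)
    """
    # Normalizing by coverage keeps tests with different coverage comparable.
    per_bp = [n / bp for n, bp in zip(snp_counts, covered_bp)]
    return mean(per_bp) / rate_per_bp_per_year

PLACEHOLDER_RATE = 1e-9  # hypothetical; not a published calibration
age = estimate_branch_age([12, 10, 14], [10_000_000] * 3, PLACEHOLDER_RATE)
print(round(age))  # 1200
```

If two tests covered very different regions, their raw Y-SNP counts would not be comparable, and the division by covered base pairs would be doing a lot of unreliable work.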
BIG Y Pros
- Defines new branches
- Allows Y-SNP based date calculations
BIG Y Cons
- Higher cost ($450 to $575 US)
Family Tree DNA offers SNP Packs of selected Y-SNPs for different branches. They are designed by haplogroup administrators such as the Y-DNA Q-M242 team, using information gained from academic research and BIG Y results. SNP Packs let many more people test for the Y-SNPs discovered through academic and BIG Y testing, and testing more people this way expands the demographic information available.
SNP Pack Pros
- Lower cost (about $119 US)
- Added demographic information
- Split Y-SNPs onto separate branches
SNP Pack Cons
- No new markers found
- Cannot calculate branch ages with results
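SNP Pack results cannot add new branches, but they can place a man on branches that are already known. Here is a minimal sketch of that placement step. It uses the M242, M346, M3, and M378 branches from the tree-building example in this post as a hypothetical tree fragment: start at the root and keep moving into the child branch the man is positive for.

```python
from typing import Dict, List, Optional

# Hypothetical tree fragment: each branch Y-SNP maps to its child branch Y-SNPs.
TREE: Dict[str, List[str]] = {
    "M242": ["M346", "M378"],
    "M346": ["M3"],
    "M3": [],
    "M378": [],
}

def place_on_tree(calls: Dict[str, str], root: str = "M242") -> Optional[str]:
    """Deepest known branch reached by following '+' calls down from root.

    Returns None if the man is not positive for the root Y-SNP. Stops when no
    child is positive (untested counts as not positive) or when results conflict.
    """
    if calls.get(root) != "+":
        return None
    node = root
    while True:
        positive_children = [c for c in TREE.get(node, []) if calls.get(c) == "+"]
        if len(positive_children) != 1:
            return node
        node = positive_children[0]

print(place_on_tree({"M242": "+", "M346": "+", "M3": "-"}))  # M346
```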
Ancestral (Negative) and Derived (Positive) Y-SNP Status
These terms sound much more complicated than they are. When you are ancestral for a Y-SNP, you don't have the variant of interest: you are negative and belong to a different branch. When you are derived for a Y-SNP, you have the variant: you are positive and belong to that branch.
For example, Bob is tested for M242. He has the derived T value. Thus, he is positive for M242.
Bob is also tested for M378. He has the ancestral A value. Thus, he is negative for M378.
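Calling a result positive or negative comes down to comparing the base observed at a Y-SNP with its known ancestral and derived bases. The minimal sketch below uses Bob's results. M242's derived T and M378's ancestral A come from the example. The other base in each pair is a hypothetical placeholder, not a value looked up in a SNP database.

```python
from typing import Dict, Tuple

# (ancestral, derived) bases. M242's derived T and M378's ancestral A come from
# Bob's example; the other base in each pair is a hypothetical placeholder.
ALLELES: Dict[str, Tuple[str, str]] = {
    "M242": ("C", "T"),
    "M378": ("A", "G"),
}

def snp_status(snp: str, observed: str) -> str:
    """Return '+' (derived), '-' (ancestral), or '?' (unexpected base or no-call)."""
    ancestral, derived = ALLELES[snp]
    observed = observed.upper()
    if observed == derived:
        return "+"
    if observed == ancestral:
        return "-"
    return "?"

bob = {"M242": "T", "M378": "A"}
print({snp: snp_status(snp, base) for snp, base in bob.items()})
# {'M242': '+', 'M378': '-'}
```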
Building the Y-Tree
The Y-Tree is built using positive and negative values for Y-SNPs. The table below shows the values for four Q Y-SNPs for six men.
To begin building a tree with these results, one starts with Y-SNPs that are shared between all samples. In this case, all six men are positive (+) for M242.
Next, find the Y-SNP that the largest number of men, but not all of them, are positive for. Four of the men are positive for M346 while two are negative. Thus, M346 is a branch under M242.
Looking at the four men who are positive for M346, two of them are also positive for M3 while two are negative. Thus, M3 is a branch under M346.
This leaves us with the two men who are positive for M242 but who are negative for M346 and M3. They are both positive for M378. Thus, M378 must also be a branch under M242.
This process is repeated as additional Y-SNPs are discovered and added to the Y-Tree. It can be done by hand or automated with software. Because the Q-M242 tree now has around 4,000 Y-SNPs thanks to BIG Y testing, the automated option is often preferred, at least as a first step.
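The manual steps above can be automated. The sketch below shows one simple way to do it. It assumes every man has a "+" or "-" result for every Y-SNP and that the Y-SNPs fit a single consistent tree, with no no-calls and no conflicting results. Because the original table is not reproduced here, the six men and their results are a hypothetical reconstruction from the walk-through. Y-SNPs are taken broadest first, and each one is attached under the most specific earlier Y-SNP whose positive men include all of its own positive men.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical six-man table reconstructed from the walk-through ("+" derived, "-" ancestral).
RESULTS: Dict[str, Dict[str, str]] = {
    "man1": {"M242": "+", "M346": "+", "M3": "+", "M378": "-"},
    "man2": {"M242": "+", "M346": "+", "M3": "+", "M378": "-"},
    "man3": {"M242": "+", "M346": "+", "M3": "-", "M378": "-"},
    "man4": {"M242": "+", "M346": "+", "M3": "-", "M378": "-"},
    "man5": {"M242": "+", "M346": "-", "M3": "-", "M378": "+"},
    "man6": {"M242": "+", "M346": "-", "M3": "-", "M378": "+"},
}

def positive_men(results: Dict[str, Dict[str, str]], snp: str) -> FrozenSet[str]:
    """Men who are derived (+) for the given Y-SNP."""
    return frozenset(man for man, calls in results.items() if calls.get(snp) == "+")

def build_tree(results: Dict[str, Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Map each Y-SNP to its parent Y-SNP (None for the top of the tree)."""
    snps = {snp for calls in results.values() for snp in calls}
    pos = {snp: positive_men(results, snp) for snp in snps}
    # Broadest branches first, mirroring the manual walk-through.
    order = sorted(snps, key=lambda snp: (-len(pos[snp]), snp))
    parent: Dict[str, Optional[str]] = {}
    for i, snp in enumerate(order):
        # Earlier Y-SNPs whose positive men include all of this Y-SNP's positive men.
        ancestors = [p for p in order[:i] if pos[snp] <= pos[p]]
        # The most specific such Y-SNP is the direct parent.
        parent[snp] = min(ancestors, key=lambda p: len(pos[p])) if ancestors else None
    return parent

print(build_tree(RESULTS))
# {'M242': None, 'M346': 'M242', 'M3': 'M346', 'M378': 'M242'}
```

This simple version would chain Y-SNPs that have identical positive men instead of grouping them on the same branch, and it has no handling for missing results. A fuller tool would need both.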
Untested Y-SNPs and No-Calls
Often we have to work with results from men who have not all been tested for the same Y-SNPs. There are many reasons for this. With older testing methods, someone might simply not have tested a specific Y-SNP. With Geno 2.0 and SNP Pack results, some Y-SNPs on the panel may fail to return a result; these are no-calls. For BIG Y results, usable coverage is not guaranteed across exactly the same regions in every test. Thus, most people will have a result for a given Y-SNP, but some will not.
This complicates the process demonstrated above. There are two ways to counter it. You can test the specific missing Y-SNPs where they are needed for clarity. You can also test large numbers of men so that some will have results at each branch point.
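A starting point for the first option is to list, for each Y-SNP, the men who have no usable result. The minimal sketch below assumes results are stored as "+", "-", or None (untested or no-call). The men and results are hypothetical.

```python
from typing import Dict, List, Optional

Call = Optional[str]  # "+", "-", or None for untested / no-call

# Hypothetical results with gaps.
RESULTS: Dict[str, Dict[str, Call]] = {
    "man1": {"M242": "+", "M346": "+", "M3": "+"},
    "man2": {"M242": "+", "M346": "+", "M3": None},  # no-call
    "man3": {"M242": "+", "M346": "+"},              # M3 never tested
}

def missing_results(results: Dict[str, Dict[str, Call]]) -> Dict[str, List[str]]:
    """For each Y-SNP seen in the data, the men with no usable (+/-) result."""
    snps = sorted({snp for calls in results.values() for snp in calls})
    return {
        snp: sorted(man for man, calls in results.items() if calls.get(snp) not in ("+", "-"))
        for snp in snps
    }

print(missing_results(RESULTS))
# {'M242': [], 'M3': ['man2', 'man3'], 'M346': []}
```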
Some data sets, such as the VCF files from BIG Y results or the current Genographic DAR database, do not include negative Y-SNP values. Without them, negative results cannot be distinguished from untested or no-call results. The earlier table then looks like this.
As you can imagine, constructing the Y-Tree from this becomes very hard, if not impossible. For example, with the limited information here for Jim and Jeff's results, it is not possible to tell whether M378 is on the same branch as M346 or on a separate one. This is why the Q-M242 project asks members with BIG Y results to request their BAM files. BAM files contain much more information, including, importantly, negative Y-SNP results.
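The Jim and Jeff problem can be shown in a few lines. The sketch below asks whether one Y-SNP could sit under another branch. That placement is ruled out only if some man positive for the first Y-SNP is known to be negative for the branch. The results are hypothetical and assume Jim and Jeff are the two M378-positive men.

```python
from typing import Dict

# Hypothetical results for the two M378-positive men, with negatives included.
FULL: Dict[str, Dict[str, str]] = {
    "Jim": {"M242": "+", "M346": "-", "M3": "-", "M378": "+"},
    "Jeff": {"M242": "+", "M346": "-", "M3": "-", "M378": "+"},
}

def could_be_under(results: Dict[str, Dict[str, str]], snp: str, branch: str) -> bool:
    """False only if a man positive for `snp` is known negative for `branch`."""
    return all(calls.get(branch) != "-"
               for calls in results.values() if calls.get(snp) == "+")

# Simulate a positives-only source (e.g. a VCF): negative calls simply vanish.
POSITIVES_ONLY = {man: {s: c for s, c in calls.items() if c == "+"}
                  for man, calls in FULL.items()}

print(could_be_under(FULL, "M378", "M346"))            # False: separate branch
print(could_be_under(POSITIVES_ONLY, "M378", "M346"))  # True: can't rule it out
```

With the negatives restored (as BAM files allow), the ambiguity disappears.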
I hope that was helpful. Please post questions and feedback in the comments. Thank you for being part of the project.
Rebekah A. Canada and the Q-M242 Team