I am starting my first big adventure for the Genetic Genealogy Compendium. Four major genetic genealogy test sellers use microarray chips (Genotyping BeadChips) for their products: Ancestry.com, 23andMe, Family Tree DNA, and the National Geographic Genographic Project. These are the tests that match you to recent relatives from all branches of your family tree.
Each company has changed microarray chip versions at least once. However, information on the details of any one chip can be difficult to find.
Thus, my first step is to create a page in the Genetic Genealogy Compendium for each chip version. To test the concept, I have started with the two Ancestry.com chips and the three most recent 23andMe chips.
SNP Density Heatmaps
I am going to start with a set of charts that show the number of SNPs per million base pairs for each chip, using Kitty Cooper's tools to turn them into heatmaps.
The number of SNPs per million base pairs of DNA tells us how densely a chip covers each region. Density matters because low-density areas carry a higher risk of false positives. This is especially true for matches more distant than 1st and 2nd cousins, and for the shorter segments we share with them.
Thus, the SNP density is something we should consider when we look at a DNA segment shared with one or more matches.
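As a rough illustration of the density idea, here is a minimal sketch that counts SNPs in 1 Mb windows and flags sparse windows inside a shared segment. It assumes each chip's SNP list is available as (chromosome, position) pairs from the same genome build; the function names and the `min_snps` cutoff are hypothetical choices for illustration, not Kitty Cooper's tools or any company's method.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # one heatmap cell = 1 million base pairs

def snp_density(snps):
    """Count SNPs in each 1 Mb window.

    snps: iterable of (chromosome, position) pairs, e.g. ("1", 752721).
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chrom, pos in snps:
        counts[(chrom, pos // WINDOW_BP)] += 1
    return counts

def sparse_windows(density, chrom, start, end, min_snps):
    """List the windows overlapping a shared segment that have fewer
    than min_snps SNPs (min_snps is a hypothetical, user-chosen cutoff)."""
    first, last = start // WINDOW_BP, end // WINDOW_BP
    return [w for w in range(first, last + 1)
            if density.get((chrom, w), 0) < min_snps]
```

A segment whose windows show up in `sparse_windows` rests on fewer SNPs there, which is the kind of region where a shared segment deserves a closer look.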
Chip Comparison Heatmaps
Next, I will make the same kind of heatmap for the SNPs that any two chips have in common. This is important for those who use GEDMatch and other matching sites: when two people tested on different chips are compared, they can only be matched on the SNPs that both chips test.
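A minimal sketch of the overlap calculation, assuming each chip is described by (chromosome, position) pairs on the same genome build. Keying on rsID instead would also work if both chips use consistent identifiers. The function names are hypothetical.

```python
from collections import Counter

def shared_snps(chip_a, chip_b):
    """Return the (chromosome, position) pairs tested on both chips."""
    return set(chip_a) & set(chip_b)

def overlap_by_chromosome(chip_a, chip_b):
    """Fraction of chip_a's SNPs on each chromosome that chip_b also tests."""
    shared = shared_snps(chip_a, chip_b)
    totals = Counter(chrom for chrom, _ in set(chip_a))
    common = Counter(chrom for chrom, _ in shared)
    return {chrom: common[chrom] / totals[chrom] for chrom in totals}
```

The shared set is what a cross-chip comparison effectively works with, so binning it into 1 Mb windows gives the comparison heatmap.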
The autosomal chromosomes and the X-Chromosome recombine each generation. Each person gets half of each parent's autosomal DNA. Men get a single, recombined copy of their mother's two X-Chromosomes. Women get an exact copy of their father's X-Chromosome and a single, recombined copy of their mother's two X-Chromosomes.
Recombination is random, but some places on the autosomal and X-Chromosomes are recombination points more often than others. This is why DNA segment length is measured in centiMorgans, which are based on recombination rates, rather than in raw base pair counts.
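To show why base pairs and centiMorgans diverge, here is a minimal sketch that converts a segment's base-pair endpoints into centiMorgans by linear interpolation on a genetic map. It assumes the map has already been loaded as a sorted list of (base-pair position, cumulative cM) points; the map values in the test are made up for illustration, and the function names are hypothetical.

```python
from bisect import bisect_right

def cm_at(genetic_map, pos):
    """Interpolate the genetic position (cM) of a base-pair position.

    genetic_map: list of (bp_position, cumulative_cM), sorted by
    bp_position with no repeated positions. Positions outside the map
    are clamped to the nearest end.
    """
    positions = [bp for bp, _ in genetic_map]
    i = bisect_right(positions, pos)
    if i == 0:
        return genetic_map[0][1]
    if i == len(genetic_map):
        return genetic_map[-1][1]
    (bp0, cm0), (bp1, cm1) = genetic_map[i - 1], genetic_map[i]
    return cm0 + (cm1 - cm0) * (pos - bp0) / (bp1 - bp0)

def segment_cm(genetic_map, start, end):
    """Length of a segment in cM between two base-pair positions."""
    return cm_at(genetic_map, end) - cm_at(genetic_map, start)
```

With a map such as `[(0, 0.0), (1_000_000, 0.5), (2_000_000, 3.5), (3_000_000, 4.0)]`, two segments of exactly 1 Mb measure 0.5 cM and 3.0 cM, because the second one sits in a region with more recombination. Swapping in a different population's map changes the result in the same way.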
Recombination rates can also be population specific. The 1000 Genomes Project has centiMorgan mapping data for twenty of the populations it has sequenced. I plan to create heatmaps for each one.
Allele Frequency Mapping
The SNP markers on the microarray chips that we use almost always have exactly two possible values. The more common one is the major allele and the less common one is the minor allele. Allele frequency matters when evaluating the quality of a shared DNA segment, because there is a greater chance of randomly matching someone (a false positive) in places where you have mostly or entirely the major allele. I am thinking about how to show this in a chromosome map.
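A worked sketch of why low minor allele frequency raises the false-positive risk. It assumes the matching tool counts a SNP as matching when two people share at least one allele (half-identical), that genotypes follow Hardy-Weinberg proportions, and that SNPs are independent. That last assumption ignores linkage disequilibrium, so real chance matches are likelier still. With major allele frequency p and minor allele frequency q, the only non-matching pair is one person homozygous major and the other homozygous minor, so P(match) = 1 - 2p²q². Function names are hypothetical.

```python
import math

def half_match_probability(maf):
    """Chance two unrelated people share at least one allele at one SNP.

    maf: minor allele frequency q; the major allele frequency is p = 1 - q.
    Under Hardy-Weinberg, the mismatch (AA vs aa, either way round)
    has probability 2 * p^2 * q^2.
    """
    q = maf
    p = 1.0 - q
    return 1.0 - 2.0 * p * p * q * q

def random_run_probability(mafs):
    """Chance that a run of SNPs all half-match by chance alone,
    treating each SNP as independent (an optimistic simplification)."""
    return math.prod(half_match_probability(q) for q in mafs)
```

Under these assumptions, a run of 100 SNPs with a minor allele frequency of 0.05 half-matches by chance roughly 64% of the time, while the same run at 0.5 does so only about 1.6 times in a million.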
Which value is major and which one is minor can vary by population, so where possible I will split out frequency by 1000 Genomes population group.
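A minimal sketch of determining the minor allele separately for each population. It assumes genotypes have already been grouped by population as two-letter strings such as "AG", with no-calls removed, and that the SNP is biallelic. The population labels in the test are placeholders, not 1000 Genomes codes, and the function name is hypothetical.

```python
from collections import Counter

def minor_allele(genotypes):
    """Return (minor_allele, minor_allele_frequency) for one SNP
    within one population sample.

    genotypes: iterable of two-letter genotype strings, e.g. "AG".
    If both alleles are equally common, the choice of "minor" is arbitrary.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return None, 0.0  # only one allele seen in this sample
    (_major, n_major), (minor, n_minor) = counts.most_common(2)
    return minor, n_minor / (n_major + n_minor)
```

Running this once per population for the same SNP shows directly when the major and minor alleles swap between groups.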
SNP No-Call Rates
To do everything I would like, I need your help. Some of the information I would like to add requires at least 100 raw data files for each chip. Mapping No-Call rates is one of those things. What is a No-Call? It is a place where the microarray chip fails to return a result for a SNP. Sometimes this is completely random. Sometimes a single SNP marker will have a high No-Call rate. Mapping the frequencies of No-Calls may expose interesting patterns. If you are interested in helping, please contact me.
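A minimal sketch of how per-SNP No-Call rates could be tallied across many raw data files from the same chip version. It assumes a 23andMe-style tab-separated layout (rsID, chromosome, position, genotype, with "#" comment lines and "--" marking a No-Call). Other companies' files use different layouts and No-Call markers, so the parser would need adjusting. The function names and the 5% threshold are hypothetical.

```python
from collections import Counter

NO_CALL = "--"  # assumption: 23andMe-style No-Call marker

def parse_raw_lines(lines):
    """Parse raw data lines into {rsID: genotype}, skipping comments."""
    kit = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")
        kit[rsid] = genotype
    return kit

def no_call_rates(kits):
    """Per-SNP No-Call rate across a list of parsed kits."""
    tested = Counter()
    missing = Counter()
    for kit in kits:
        for rsid, genotype in kit.items():
            tested[rsid] += 1
            if genotype == NO_CALL:
                missing[rsid] += 1
    return {rsid: missing[rsid] / tested[rsid] for rsid in tested}

def high_no_call_snps(rates, threshold=0.05):
    """SNPs whose No-Call rate exceeds a (hypothetical) threshold."""
    return sorted(rsid for rsid, rate in rates.items() if rate > threshold)
```

With only a handful of files every rate is coarse; a sample of 100 or more files per chip is what makes a single marker's elevated rate distinguishable from random No-Calls.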
I would love your opinion on resources that interest you. What am I missing? What would you like to see that you have not found at another website?