This post continues my series on PCA-based ethnic origins: what they are meant for, how they work, and what can make them better. This time I will go over what is wrong with the visual presentations, the maps, that the companies use.
I am going to start by using a map from MyHeritage. All of the major companies have the same problem though. Please do not think I am singling out the good people at MyHeritage. <smile>
It is a beautiful map. However, it does not show the genetic reality. The genetic signature or signatures found in Italy today have had tens of thousands of years to spread. They have been influenced by every wave of migration into Europe starting with the first peoples to travel out-of-Africa long ago.
Yes, there are distinct local signatures, but the individual genetic markers behind them are old.
Microarray chip-based tests, like those sold by Ancestry, 23andMe, and Living DNA, sample our genomes at sites with known variants. Most of those variants are found in at least 5% of the human population around the world. Some are selected because they are found in between 1% and 5% of the human population. That means they have been around for a long time. The variants on a microarray chip date to the first farmers who spread agriculture 10,000 years ago, to the Out-of-Africa travellers 50,000 to 70,000 years ago, and even back to early Stone Age ancestors who predated modern humans.
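To make the two frequency bands above concrete, here is a minimal Python sketch that sorts variants by how common they are. The 5% and 1% cut-offs come from the paragraph above. The band labels, the `frequency_band` function, and the variant names and frequencies are all made up for illustration and do not describe any company's actual chip.

```python
def frequency_band(freq):
    """Place a variant in a band by the share of people who carry it.

    The 5% and 1% cut-offs follow the text; the band labels are
    illustrative names, not an industry standard.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if freq >= 0.05:
        return "common"          # found in at least 5% of people
    if freq >= 0.01:
        return "low-frequency"   # found in 1% to 5% of people
    return "rare"                # below 1%


# Hypothetical variants with invented frequencies.
variants = {"variant_a": 0.32, "variant_b": 0.03, "variant_c": 0.002}

bands = {name: frequency_band(freq) for name, freq in variants.items()}
```

In this toy example only `variant_a` and `variant_b` fall into the bands described above. Something like `variant_c`, at well under 1%, is outside both of them.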
Thus, the genetic reality likely looks much more like this.
This brings us back to the populations from Italy in previous posts.
To fully show the distribution of each signature, it helps to split the signatures onto separate maps.
Each of the signatures is strongest in its own region. However, they all can be found at lower frequencies in at least one other region. This is something that should be clearly explained.
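Splitting the signatures onto separate maps amounts to turning a region-by-signature table on its side. The Python sketch below shows the idea under loud assumptions: the region names, signature names, and frequencies are invented placeholders, not real data from Italy or from any company, and the helper functions are hypothetical.

```python
# Invented placeholder data: share of each signature in each region.
frequencies = {
    "region_1": {"signature_x": 0.60, "signature_y": 0.25, "signature_z": 0.00},
    "region_2": {"signature_x": 0.20, "signature_y": 0.55, "signature_z": 0.10},
    "region_3": {"signature_x": 0.00, "signature_y": 0.15, "signature_z": 0.70},
}


def split_by_signature(region_table):
    """Turn {region: {signature: freq}} into {signature: {region: freq}}.

    Each inner dict is the data for one per-signature map.
    """
    per_signature = {}
    for region, signatures in region_table.items():
        for signature, freq in signatures.items():
            per_signature.setdefault(signature, {})[region] = freq
    return per_signature


def strongest_region(region_freqs):
    """Return the region where a signature is most frequent."""
    return max(region_freqs, key=region_freqs.get)


def other_regions(region_freqs):
    """Return the other regions where the signature still appears."""
    top = strongest_region(region_freqs)
    return sorted(r for r, f in region_freqs.items() if r != top and f > 0)


maps = split_by_signature(frequencies)
summary = {
    signature: (strongest_region(freqs), other_regions(freqs))
    for signature, freqs in maps.items()
}
```

With these invented numbers, every signature peaks in its own region and still shows up in at least one other region, which is the pattern a single blended map tends to hide.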
Currently many of the companies, especially Ancestry, attempt to explain these patterns. However, while plausible, their interpretations have not been checked against genetic models. This unfortunately means these interpretations will likely be proven at least partly wrong in time. Generally, without more robust study, interpretation is an area where less is more.
In my next post, I will look into the misconception that the problems with ethnic percentages, and their fixes, exist because the work is ‘both a science and an art.’ It is science. The issues come from limitations to the science and from misguided representation of the science.
Posts in series
- PCA Based Ethnic Origins – Part 1, How it looks
- PCA Based Ethnic Origins – Part 2, What it was meant for
- PCA Based Ethnic Origins – Part 3, How it is used
- PCA Based Ethnic Origins – Part 4, Fixing the visuals
- PCA Based Ethnic Origins – Part 5, Science & not art
- PCA Based Ethnic Origins – Part 6, Genealogical markers
- PCA Based Ethnic Origins – Part 7, Autosomal SNP Genetics 101