PCA Based Ethnic Origins – Part 4, Fixing the visuals

This post continues my series on PCA-based ethnic origins: what they are meant for, how they work, and what can make them better. This time I will go over what is wrong with the visual presentations, the maps, that the companies use.

I am going to start by using a map from MyHeritage. All of the major companies have the same problem though. Please do not think I am singling out the good people at MyHeritage. <smile>

MyHeritage’s Italian Population Map

It is a beautiful map. However, it does not show the genetic reality. The genetic signature or signatures found in Italy today have had tens of thousands of years to spread. They have been influenced by every wave of migration into Europe starting with the first peoples to travel out-of-Africa long ago.

Yes, there are distinct local signatures, but the individual genetic markers behind them are old.

Microarray chip-based tests like those sold by Ancestry, 23andMe, and Living DNA sample our genomes at known variants. Most of those variants are found in at least 5% of the human population around the world. Some are selected to be found in between 1% and 5% of the human population. That means they have been around for a long time. The variants on a microarray chip date to the first farmers who spread agriculture 10,000 years ago, to the Out-of-Africa travellers 50,000 to 70,000 years ago, and even back to early Stone Age ancestors who predated modern humans.
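To make the frequency bands above concrete, here is a minimal sketch that sorts variants into the 5% and 1% bands. The variant names and frequencies are invented for illustration, and `frequency_band` is a hypothetical helper, not anything a testing company publishes.

```python
# Hypothetical variant names and worldwide frequencies, invented for illustration only.
variants = {
    "variant_A": 0.32,
    "variant_B": 0.07,
    "variant_C": 0.03,
    "variant_D": 0.004,
}


def frequency_band(freq):
    """Place a variant in one of the frequency bands described in the text."""
    if freq >= 0.05:
        return "common (at least 5%)"
    if freq >= 0.01:
        return "low frequency (1% to 5%)"
    return "rare (under 1%)"


# Classify every variant and print one line per variant.
bands = {name: frequency_band(freq) for name, freq in variants.items()}
for name, band in bands.items():
    print(f"{name}: {band}")
```

The point of the sketch is only the thresholds: a variant has to be fairly widespread to land in the first two bands, which is why such variants tend to be old.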

Thus, the genetic reality likely looks much more like this.

The likely spread of ‘Italian’ signature

This brings us back to the populations from Italy in previous posts.

PCA - Italy

To fully show the distribution of each signature, it helps to split the signatures onto separate maps.

North Italy Signature
Central Italy Signature
South Italy Signature
Sardinia Signature

Each of the signatures is strongest in its own region. However, they all can be found at lower frequencies in at least one other region. This is something that should be clearly explained.
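One way to picture the "one map per signature" idea is a small table with one row per signature and one column per region. The sketch below is only that, a sketch: the numbers are placeholders I made up to show the shape of the data, not measured frequencies, and `map_layer` is a hypothetical helper.

```python
# Placeholder values for illustration only. These are NOT real frequencies.
# Outer key: signature. Inner keys: regions where that signature might be found.
signature_frequency = {
    "North Italy": {"North Italy": 0.60, "Central Italy": 0.30, "South Italy": 0.10, "Sardinia": 0.00},
    "Central Italy": {"North Italy": 0.25, "Central Italy": 0.50, "South Italy": 0.25, "Sardinia": 0.00},
    "South Italy": {"North Italy": 0.05, "Central Italy": 0.25, "South Italy": 0.65, "Sardinia": 0.05},
    "Sardinia": {"North Italy": 0.00, "Central Italy": 0.05, "South Italy": 0.05, "Sardinia": 0.90},
}


def map_layer(signature):
    """Return the region where a signature peaks and the other regions where it appears."""
    freqs = signature_frequency[signature]
    home = max(freqs, key=freqs.get)
    elsewhere = sorted(region for region, f in freqs.items() if region != home and f > 0)
    return home, elsewhere


# Build one "layer" per signature, which is what each separate map would draw.
layers = {signature: map_layer(signature) for signature in signature_frequency}
for signature, (home, elsewhere) in layers.items():
    print(f"{signature}: strongest in {home}, also present in {', '.join(elsewhere)}")
```

Each layer carries both pieces of information a map should show: where the signature is strongest, and where else it turns up at lower levels.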

Currently many of the companies, especially Ancestry, attempt to explain these patterns. However, while plausible, their interpretations have not been checked against genetic models. This unfortunately means these interpretations will likely be proven at least partly wrong in time. Generally, without more robust study, interpretation is an area where less is more.

In my next post, I will look into the misconception that the problems with ethnic percentages, and their fixes, exist because it is ‘both a science and an art.’ It is science. The issues come from limitations of the science and from misguided representation of the science.
