PCA Based Ethnic Origins – Part 1, How it looks

A few days ago, I wrote that 23andMe has once again updated their ethnic percentages. It has also been in the news lately that even identical twins can get different results, both between companies and at the same company. So I am going to take a look at how my results compare right now at each of the four major companies and at DNA Land.

I have broken the results from each company down into four broad categories: Northern European, Southern European, Western Eurasian, and Other. The population groupings within those vary a good deal between companies. However, one would expect the larger groupings to be consistent.

They are not.

My Southern European percentages range from 6% at Ancestry to 40% at FTDNA. That is not a small difference that can be put down to different population sets. I, and everyone else, have real, valid ethnic origins based on inherited DNA. If the methodology is solid, there should not be extreme differences.

So, as a review, I will post my 5 gen pedigree chart.

By strict percentages, I am 25% Northern Italian and 12.5% Sicilian, for 37.5% Southern European. Random recombination is expected to pull the actual percentages one way or the other. At the regional level, though, it looks like FTDNA is the only company that comes close.
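The pedigree arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `expected_share` is made up for illustration, and the placement of the ancestors (one grandparent for the 25%, one great-grandparent for the 12.5%) is an assumption that happens to produce those numbers, not something read off the chart. It also ignores recombination, which moves real inherited shares away from these expected values.

```python
from fractions import Fraction

def expected_share(generations_back: int, count: int = 1) -> Fraction:
    """Expected autosomal share from `count` ancestors who sit
    `generations_back` generations above the tested person
    (1 = parent, 2 = grandparent, 3 = great-grandparent)."""
    return count * Fraction(1, 2 ** generations_back)

# Assumed placement, for illustration only.
northern_italian = expected_share(2)   # one grandparent -> 1/4
sicilian = expected_share(3)           # one great-grandparent -> 1/8

southern_european = northern_italian + sicilian
print(float(southern_european * 100))  # 37.5
```

The same function shows why a five-generation chart is a natural cut-off for this kind of bookkeeping: each ancestor in the top row (`generations_back=4`) is expected to contribute only 1/16, or 6.25%.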

Now, I suspect that is because most of the companies jiggle percentages to be a bit more Northern European than Southern European. I cannot prove that, though.

And still, the breakout of FTDNA results below the regional level is not something that can be used without interpretation.

Southeast Europe 6%
Asia Minor 9%
Sephardic 11%
East Europe 14%

One can interpret… A unified Italy is a very recent thing. It is more important to understand, though, that PCA (principal component analysis) with the SNPs on a normal microarray chip amounts to using a method that was meant for something else, combined with very old genetic variants.

Microarray chip based tests like those sold at Ancestry, 23andMe, and Living DNA sample our genomes at known variants. Most of those variants are found in at least 5% of the human population around the world. Some are selected to be found in between 1% and 5% of the human population. That means they have been around for a long time. The variants on a microarray chip date to the first farmers who spread agriculture 10,000 years ago, to the Out-of-Africa travellers 50,000 to 70,000 years ago, and even back to early stone age ancestors who predated modern humans.
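To make "PCA on microarray SNPs" a little more concrete, here is a toy sketch in Python with NumPy. Everything in it is simulated and assumed for illustration: two made-up populations, 1,000 made-up common SNPs with allele frequencies between 5% and 95%, and a plain SVD-based PCA. It is not any company's actual pipeline, and real reference panels and chips are far larger.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 1000

# Common variants only: allele frequencies between 5% and 95%.
freq_a = rng.uniform(0.05, 0.95, size=n_snps)
# Population B drifts away from A by a modest amount at each SNP.
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, size=n_snps), 0.05, 0.95)

# Genotypes are counts of the variant allele: 0, 1, or 2 copies.
geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([geno_a, geno_b]).astype(float)

# PCA: center each SNP column, then take the SVD.
centered = genotypes - genotypes.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :2] * s[:2]              # each person's coordinates on PC1, PC2
explained = s**2 / np.sum(s**2)     # share of variance per component

pc1_a = pcs[:n_per_pop, 0]
pc1_b = pcs[n_per_pop:, 0]
print("PC1 mean, population A:", pc1_a.mean())
print("PC1 mean, population B:", pc1_b.mean())
print("Variance explained by PC1:", explained[0])
```

In this toy setup the two groups land on opposite sides of PC1. Note what the method is actually doing: it finds the directions of greatest variation in shared, common variants. Turning a person's position along those directions into an ethnic percentage is a separate modelling step, and that is where companies can differ.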

The thing is, different people get their ‘best’ results at what looks like a random one of the major companies. Thus, we don’t even have one company that is consistently good or getting it right.

I will talk about that some more, but first I need to write about my dog, Bebe le Strange, and how she got better DNA origins test results than I have.


8 thoughts on “PCA Based Ethnic Origins – Part 1, How it looks”

  1. Thank you for that. Yes I notice the differences in mine too. Whoever is programming the algorithms that make these calculations is the problem since this type of effort is very new and still evolving.
    I don’t take any particular result as 100% accurate and they do post updated results from time to time.
    The best thing is that they all generally hit the same areas for me with none posting something the others do not. I am optimistic they will eventually be more consistent.
    Thanks for sharing

  2. Northern Italy is an interesting region which can be regarded both as Southern and Western and Central European. There have been significant German and French speaking minorities, especially in Aosta Valley, Bolzano…
