PCA Based Ethnic Origins – Part 2, What it was meant for

Principal Component Analysis (PCA) is sometimes used as a method for analyzing populations. It was designed to show the computed similarities, and especially the differences, between and within data sets. The data could come from anything, such as shopping patterns, books read, or favorite TV shows. The key is that some groups will be more alike and others less alike.
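To make the idea concrete, here is a minimal sketch in Python. The shopper data is invented for illustration, and the function names are my own. It finds only the first principal component, using power iteration as a simple stand-in for the full decomposition that a statistics package would perform.

```python
import math

# Invented purchase counts per shopper: [groceries, books, electronics].
# The first three shoppers mostly buy groceries, the last three mostly books.
shoppers = [
    [9, 1, 2], [8, 2, 1], [10, 1, 3],
    [2, 9, 1], [1, 8, 2], [3, 10, 2],
]

def center(rows):
    """Subtract each column's mean so every feature averages zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Approximate the top eigenvector of cov by power iteration.

    Assumes the start vector is not orthogonal to the answer and that
    cov is not all zeros; good enough for a sketch, not for production.
    """
    d = len(cov)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

centered = center(shoppers)
pc1 = first_component(covariance(centered))

# Each shopper's position along the first principal component.
scores = [sum(x * c for x, c in zip(row, pc1)) for row in centered]
for row, score in zip(shoppers, scores):
    print(row, round(score, 2))
```

The grocery shoppers and the book shoppers land on opposite sides of zero along this one axis. A PCA chart is the same thing with a second axis added.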

In population genetics, this usually means the relationships between populations. Here is an example using populations from Italy.

PCA – Italy

The population has been divided into five groups: North Italy, Central Italy, South Italy, Sardinia, and Undefined. In the upper left is a map showing how the chart relates to geography. On the chart, each sample is shown as a dot. The color of each dot matches the source of the sample.

That means that samples marked South Italy come from people living in South Italy. Researchers usually work to get high-quality samples. To do so, they ask where each participant's grandparents were born.

The method and the chart show that people from Central Italy are, in general, halfway between people from North Italy and people from South Italy. Meanwhile, people from Sardinia are clearly distinct from the other three groups.
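The same reading can be done with numbers instead of by eye. The sketch below assumes each sample already has a (PC1, PC2) position and a group label. The coordinates are invented to mimic the pattern described above and are not taken from the chart. The helper names are my own.

```python
from math import dist
from statistics import mean

# Invented (PC1, PC2) positions per group, NOT values from the actual chart.
samples = {
    "North": [(-2.1, 0.3), (-1.8, 0.1), (-2.3, -0.2)],
    "Central": [(0.1, 0.2), (-0.2, -0.1), (0.2, 0.0)],
    "South": [(2.0, 0.1), (1.9, -0.3), (2.2, 0.2)],
    "Sardinia": [(-0.5, 4.1), (-0.3, 3.8), (-0.6, 4.3)],
}

def centroid(points):
    """Average position of a group's dots on the chart."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))

centroids = {name: centroid(points) for name, points in samples.items()}

def centroid_distance(a, b):
    """Straight-line distance between two group centroids."""
    return dist(centroids[a], centroids[b])

# The midpoint between the North and South centroids.
midpoint = tuple((n + s) / 2 for n, s in zip(centroids["North"], centroids["South"]))

print("Central to North/South midpoint:", round(dist(centroids["Central"], midpoint), 2))
for other in ("North", "Central", "South"):
    print("Sardinia to", other + ":", round(centroid_distance("Sardinia", other), 2))
```

With these made-up numbers, the Central centroid sits almost exactly on the midpoint between North and South, while Sardinia is far from all three.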

With the PCA chart, one can quickly see general relationships between the groups. This makes PCA a good method for understanding populations.

Tomorrow, I will write about the reverse use. That is, using a PCA chart to find an individual’s origins.

