Principal Component Analysis (PCA) is a statistical method that condenses many variables into a few summary axes, which makes it well suited to showing how populations cluster. It is used to display the similarities and differences between the samples in a data set. The data could describe almost anything, such as shopping patterns, books read, or favorite TV shows. The key is that some groups will be more alike and others less alike.
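To make the idea concrete, here is a minimal sketch of PCA in Python using NumPy. The data is randomly generated, not real, and the function name `pca_project` and the two groups are my own inventions for illustration. It assumes each row is one sample and each column is one measured feature.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of `data` onto the top principal components."""
    # Center each feature so the components describe variation around the mean.
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of `vt` are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    # Share of the total variance carried by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]

rng = np.random.default_rng(0)
# Two made-up groups of 50 samples with 20 features each.
# The groups differ only in their average value per feature.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 20))
group_b = rng.normal(loc=1.5, scale=1.0, size=(50, 20))
samples = np.vstack([group_a, group_b])

coords, explained = pca_project(samples)

# Distance between the two group averages along the first component.
separation = abs(coords[:50, 0].mean() - coords[50:, 0].mean())
print("share of variance on PC1 and PC2:", explained)
print("group separation on PC1:", separation)
```

Plotting the two columns of `coords` against each other gives the kind of chart discussed in this series: one dot per sample, with similar samples landing near each other.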
In population genetics, this usually means the relationships between populations. Here is an example using populations from Italy.
The population has been divided into five groups: North Italy, Central Italy, South Italy, Sardinia, and Undefined. In the upper left is a map showing how the regions relate to the chart. On the chart, each sample is shown as a dot. The color of each dot matches the source of the sample.
That means that samples marked South Italy come from people living in South Italy. Researchers usually work to get high-quality samples, so they ask where each participant’s grandparents were born.
The chart shows that people from Central Italy generally fall about halfway between people from North Italy and people from South Italy. Meanwhile, people from Sardinia are clearly separated from the other three groups.
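The same kind of reading can be done with numbers instead of by eye. The sketch below is only an illustration on synthetic data, not the Italian samples. The group names "north", "central", "south", and "island" are hypothetical, and I deliberately built "central" as the midpoint of "north" and "south" so there is a known answer to check against.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 30

# Hypothetical group averages (synthetic, not real genetic data).
north_mean = np.zeros(n_features)
south_mean = np.zeros(n_features)
south_mean[:10] = 2.0
central_mean = (north_mean + south_mean) / 2  # built to sit halfway
island_mean = np.zeros(n_features)
island_mean[10:20] = 3.0  # offset in an unrelated direction

means = {
    "north": north_mean,
    "central": central_mean,
    "south": south_mean,
    "island": island_mean,
}

# Draw 40 noisy samples around each group average.
rows, labels = [], []
for name, mean in means.items():
    rows.append(rng.normal(loc=mean, scale=0.5, size=(40, n_features)))
    labels += [name] * 40
data = np.vstack(rows)
labels = np.array(labels)

# PCA via SVD of the centered data, keeping the first two components.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T

# Average chart position (centroid) of each group.
centroids = {name: coords[labels == name].mean(axis=0) for name in means}
midpoint = (centroids["north"] + centroids["south"]) / 2

gap = np.linalg.norm(centroids["central"] - midpoint)
island_distance = np.linalg.norm(centroids["island"] - midpoint)
print("central group's distance from the north/south midpoint:", gap)
print("island group's distance from that midpoint:", island_distance)
```

A small `gap` next to a large `island_distance` is the numeric version of "one group sits between two others while a fourth stands apart". With real samples the groups would not be constructed this way, so the numbers would be messier.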
With the PCA chart, one can quickly see general relationships between the groups. This makes PCA a good method for understanding populations.
Tomorrow, I will write about the reverse use. That is, using a PCA chart to find an individual’s origins.
Posts in series
- PCA Based Ethnic Origins – Part 1, How it looks
- PCA Based Ethnic Origins – Part 2, What it was meant for
- PCA Based Ethnic Origins – Part 3, How it is used
- PCA Based Ethnic Origins – Part 4, Fixing the visuals
- PCA Based Ethnic Origins – Part 5, Science & not art
- PCA Based Ethnic Origins – Part 6, Genealogical markers
- PCA Based Ethnic Origins – Part 7, Autosomal SNP Genetics 101