Research Article: Population structure in Argentina

Date Published: May 1, 2018

Publisher: Public Library of Science

Author(s): Marina Muzzio, Josefina M. B. Motti, Paula B. Paz Sepulveda, Muh-ching Yee, Thomas Cooke, María R. Santos, Virginia Ramallo, Emma L. Alfaro, Jose E. Dipierri, Graciela Bailliet, Claudio M. Bravi, Carlos D. Bustamante, Eimear E. Kenny, Taras K Oleksyk.


We analyzed 391 samples from 12 Argentinian populations from the Center-West, East and North-West regions with the Illumina Human Exome Beadchip v1.0 (HumanExome-12v1-A). We performed Principal Component Analysis to infer patterns of population divergence and migration. We identified proportions and patterns of European, African and Native American ancestry and found a correlation between distance to Buenos Aires and proportion of Native American ancestry: the highest proportions correspond to the northernmost populations, which are also the furthest from the Argentinian capital. Most of the European ancestry is of South European origin, matching historical records, and we see two different Native American components, one that spreads all over Argentina and another that is specifically Andean. The highest percentages of African ancestry were in the Center-West of Argentina, where the old trade routes took slaves from Buenos Aires to Chile and Peru. At the subcontinental level, the sources of this African component are both West Africa and groups influenced by the Bantu expansion, the second slightly higher than the first, unlike North America and the Caribbean, where the main source is West Africa. This is reasonable, considering that a large proportion of the ships arriving in the Southern Hemisphere came from Mozambique, Loango and Angola.

Partial Text

One of the most important applications of admixture research is to reduce bias in association studies, since it has long been known that underlying genetic structure can produce a high percentage of false positive results due to differences in the genetic composition of cases and controls [1–6]. This bias occurs when the frequency of the disease under study varies between populations, so that the probability of selecting affected individuals from specific subpopulations increases; thus, any allele with a higher frequency in the overrepresented population will show an association with the phenotype [7–9].
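The stratification bias described above can be illustrated with a small worked calculation. The sketch below is hypothetical: the subpopulation sizes, prevalences and allele frequencies are invented for illustration and are not taken from this study. It assumes the allele has no effect on disease within either subpopulation, and computes the allele frequency that would nevertheless be observed in pooled cases and pooled controls.

```python
def pooled_allele_frequencies(subpops):
    """Return (case_freq, control_freq) for an allele in a pooled sample.

    Each subpopulation is a dict with hypothetical keys:
      size        - number of individuals
      prevalence  - fraction affected by the disease
      allele_freq - frequency of the allele (same in cases and controls,
                    i.e. no true association inside the subpopulation)
    """
    cases = [p["size"] * p["prevalence"] for p in subpops]
    controls = [p["size"] * (1 - p["prevalence"]) for p in subpops]
    freqs = [p["allele_freq"] for p in subpops]

    # Weighted averages: each subpopulation contributes in proportion
    # to how many cases (or controls) it supplies to the pooled sample.
    case_freq = sum(c * f for c, f in zip(cases, freqs)) / sum(cases)
    control_freq = sum(c * f for c, f in zip(controls, freqs)) / sum(controls)
    return case_freq, control_freq


# Invented example values, for illustration only.
subpopulations = [
    {"size": 1000, "prevalence": 0.20, "allele_freq": 0.8},
    {"size": 1000, "prevalence": 0.05, "allele_freq": 0.2},
]
case_freq, control_freq = pooled_allele_frequencies(subpopulations)
```

With these made-up numbers the pooled cases carry the allele at a frequency of 0.68, against roughly 0.47 in the pooled controls, even though the allele is unrelated to the disease within each subpopulation. The difference arises only because the high-prevalence subpopulation is overrepresented among cases.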

Fig 2 shows the Principal Component Analysis (PCA) results. First, we used the YRI and LWK as African references, the IBS and TSI as European references and the PEL as a Latin American reference that has samples with high Native American ancestry (Fig 2A). The x axis, PC1, separates African samples from the rest, while the y axis, PC2, places the European samples and the samples with the highest Native American ancestry at opposite extremes, with the remaining admixed samples distributed between them. Then, we removed the African and TSI samples and all admixed individuals with over 3% African ancestry, and added the MXL and CLM samples that met this criterion, so that we could better understand differences in the Native American component of these populations (Fig 2B). PC1 (x axis) shows the spectrum of admixture, from full European ancestry near -0.6 to the Native American extreme near 0.8. PC2 (y axis) shows that the samples with high Native American ancestry from the North West segregate from MXL and overlap with PEL, while those from the North East do not group with PEL and overlap with MXL. The overlap of Argentinians and Colombians with Mexicans is due to the low resolution of our SNP set; CLM and MXL segregate from each other in PC space when denser sets are assayed, as shown by Gravel et al. [28].
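To make the PCA step concrete, the following is a minimal sketch of a genotype PCA in Python with NumPy. It is not the authors' pipeline, whose software and settings are not described in this excerpt. The function names `genotype_pca` and `simulate_genotypes`, the per-SNP standardization, and the simulated data (two source groups plus a 50/50 admixed group) are all assumptions made for illustration.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    Returns an (n_samples, n_components) array of PC coordinates.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]                # drop monomorphic SNPs
    g = (g - g.mean(axis=0)) / g.std(axis=0)   # standardize each SNP
    # SVD of the standardized matrix; columns of u scaled by the
    # singular values give the sample coordinates on each PC.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_genotypes(rng, n_per_group=40, n_snps=500):
    """Simulate two source groups and one admixed group (hypothetical data)."""
    freq_a = rng.uniform(0.05, 0.95, n_snps)
    freq_b = rng.uniform(0.05, 0.95, n_snps)
    freq_mix = 0.5 * (freq_a + freq_b)         # 50/50 admixture
    groups = [
        rng.binomial(2, f, size=(n_per_group, n_snps))
        for f in (freq_a, freq_b, freq_mix)
    ]
    labels = np.repeat(["A", "B", "admixed"], n_per_group)
    return np.vstack(groups), labels


rng = np.random.default_rng(0)
genotypes, labels = simulate_genotypes(rng)
pcs = genotype_pca(genotypes)

# Mean PC1 coordinate of each simulated group.
pc1_means = {name: pcs[labels == name, 0].mean() for name in ("A", "B", "admixed")}
```

In this toy setting the two source groups fall at opposite ends of PC1 and the admixed group lies between them, which mirrors the kind of gradient described for the admixed samples in Fig 2. Real analyses typically also prune SNPs in linkage disequilibrium and handle missing genotypes, which this sketch omits.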