Thursday, September 4, 2014
This is an update of a supervised ADMIXTURE analysis that I ran earlier this year looking at ANE levels throughout Asia, the results of which I posted at my other blog (see here). Anyone wanna make a map?
That's not to say that people like Iosif Lazaridis, Nick Patterson and David Reich don't know what they're doing. Clearly they do, but at the fine scale there's usually room for improvement no matter who you are.
For instance, in Table S14.9 of their paper they list the Basques (in fact, French Basques) as 11.4% ANE, which sounds reasonable, although perhaps a little too high considering they admit that this population can be modeled as 0% ANE. On the other hand, they estimate the "North Spanish" to be 16.3% ANE.
Now, this reference set is actually from the 1000 Genomes project, where it's listed as Spaniards from Pais Vasco (i.e. Basque Country). Essentially, what this means is that these are Basques from Spain. So why would Basques from France carry only 11.4% ANE, and Basques from Spain a whopping 16.3%? Not only that, but according to Lazaridis et al., these "North Spanish" can also be modeled as 0% ANE.
Obviously, something's not quite right there. Indeed, in my spreadsheet, the very same French Basques are listed as 7.4% ANE, while the Pais Vasco Spaniards as just over 8%. Call me crazy, and many do, but I think these results actually make good sense.
By the way, I made ten synthetic samples from the ANE allele frequencies from this test, and remarkably, in all of the analyses I've run so far they behaved very much like MA-1 or Mal'ta boy, the main ANE proxy. Below, for example, is a Principal Component Analysis (PCA) of West Eurasia featuring these individuals. The result is very similar to those I obtained with Mal'ta boy (see here and here).
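For anyone curious how synthetic individuals can be built from a set of allele frequencies, here's a rough sketch. It is not necessarily the procedure used for the samples in this post. It simply assumes that each diploid genotype is two independent draws at the component's allele frequency, and the frequencies in the example are made up for illustration.

```python
import random


def synthetic_genotypes(allele_freqs, n_samples=10, seed=0):
    """Draw diploid genotypes (0, 1 or 2 copies of the allele) per SNP.

    Assumption: the two allele copies at each SNP are independent draws
    with probability equal to the component's allele frequency.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        genotype = [
            int(rng.random() < p) + int(rng.random() < p)
            for p in allele_freqs
        ]
        samples.append(genotype)
    return samples


# Hypothetical allele frequencies for five SNPs (not real ANE values).
ane_freqs = [0.0, 0.25, 0.5, 0.9, 1.0]
samples = synthetic_genotypes(ane_freqs, n_samples=10)
```

Individuals simulated this way carry none of the real linkage between neighbouring markers, which doesn't matter much for frequency-based methods like ADMIXTURE and PCA.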
The synthetic ANE samples are available here. Feel free to play around with them, and if you do, please let me know what you discover.
As some regular visitors already know, I'm currently designing a new test for GEDmatch that will include various ancient components like ANE. Unfortunately, it might be a while before it's ready, simply because I want it to be as accurate as possible.
Eurogenes ANE K7
Corded Ware Culture linked to the spread of ANE across Europe
Wednesday, August 13, 2014
I'd say this open access paper at Science Direct is the most detailed work on European stature ever. The conclusion is that male height in Europe is mostly determined by nutrition and genetics, which isn't really earth shattering. But the authors also point out that Y-chromosome haplogroup I-M170 shows a strong correlation with the highest average stature on the continent, and speculate that the link between the two might be Upper Paleolithic hunter-gatherer ancestry:
The average height of 45 national samples used in our study was 178.3 cm (median 178.5 cm). The average of 42 European countries was 178.3 cm (median 178.4 cm). When weighted by population size, the average height of a young European male can be estimated at 177.6 cm. The geographical comparison of European samples (Fig. 1) shows that above average stature (178+ cm) is typical for Northern/Central Europe and the Western Balkans (the area of the Dinaric Alps). This agrees with observations of 20th century anthropologists (Coon, 1939; Lundman 1977). At present, the tallest nation in Europe (and also in the world) are the Dutch (average male height 183.8 cm), followed by Montenegrins (183.2 cm) and possibly Bosnians (182.5 cm) (Table 1). In contrast with these high values, the shortest men in Europe can be found in Turkey (173.6 cm), Portugal (173.9 cm), Cyprus (174.6 cm) and in economically underdeveloped nations of the Balkans and former Soviet Union (mainly Albania, Moldova, and the Caucasian republics).
The trend of increasing height has already stopped in Norway, Denmark, the Netherlands, Slovakia and Germany. In Norway, military statistics date its cessation to late 1980s.
In contrast, the fastest pace of the height increase (≥1 cm/decade) can be observed in Ireland, Portugal, Spain, Latvia, Belarus, Poland, Bosnia and Herzegovina, Croatia, Greece, Turkey and at least in the southern parts of Italy.
Although the documented differences in male stature in European nations can largely be explained by nutrition and other exogenous factors, it is remarkable that the picture in Fig. 1 strikingly resembles the distribution of Y haplogroup I-M170 (Fig. 10a). Apart from a regional anomaly in Sardinia (sub-branch I2a1a-M26), this male genetic lineage has two frequency peaks, from which one is located in Scandinavia and northern Germany (I1-M253 and I2a2-M436), and the second one in the Dinaric Alps in Bosnia and Herzegovina (I2a1b-M423). In other words, these are exactly the regions that are characterized by unusual tallness. The correlation between the frequency of I-M170 and male height in 43 European countries (including USA) is indeed highly statistically significant (r = 0.65; p < 0.001) (Fig. 11a, Table 4). Furthermore, frequencies of Paleolithic Y haplogroups in Northeastern Europe are improbably low, being distorted by the genetic drift of N1c-M46, a paternal marker of Ugrofinian hunter-gatherers. After the exclusion of N1c-M46 from the genetic profile of the Baltic states and Finland, the r-value would further slightly rise to 0.67 (p < 0.001). These relationships strongly suggest that extraordinary predispositions for tallness were already present in the Upper Paleolithic groups that had once brought this lineage from the Near East to Europe.
Grasgruber et al., The role of nutrition and genetics as key determinants of the positive height trend, Economics & Human Biology, available online 7 August 2014, DOI: 10.1016/j.ehb.2014.07.002
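The r-values quoted from the paper are Pearson correlation coefficients between haplogroup frequency and average male height. As a minimal sketch of how such a coefficient is calculated, the snippet below uses made-up placeholder numbers, not the paper's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only (not taken from Grasgruber et al.).
i_m170_freq = [5.0, 10.0, 20.0, 35.0, 50.0]          # % of men carrying I-M170
male_height_cm = [174.0, 176.5, 177.0, 180.5, 182.0]  # average male height
r = pearson_r(i_m170_freq, male_height_cm)
```

Keep in mind that a coefficient like this only measures how well a straight line fits the country-level averages. It says nothing on its own about why the two variables move together.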
Tuesday, July 29, 2014
Apparently, this 15,000-year-old genome from Central Siberia, AG-2, is heavily contaminated with modern DNA (see section SI 5.2.3. in Raghavan et al. 2013). However, apart from MA-1, it's the only Ancient North Eurasian (ANE) sample available right now, so I thought I'd take a closer look at it.
The shared drift statistics using f3(Mbuti;AG-2,Test) do suggest contamination from a present-day Eastern European source, with, for instance, Ukrainians from Lviv showing an unexpectedly strong signal (third on the list below just behind Pima Indians). This makes sense since AG-2 was probably mainly handled by Slavic-speaking Soviet archaeologists and museum staff.
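For readers unfamiliar with outgroup f3 statistics, here's a simplified sketch of the idea behind f3(Mbuti;AG-2,Test). At each SNP the outgroup frequency is subtracted from the frequencies of the two other populations, the differences are multiplied, and the products are averaged across SNPs. The more drift the two populations share since splitting from the outgroup, the higher the value. This bare-bones version leaves out the sample-size corrections and standard errors that dedicated software reports, and the frequencies in the example are hypothetical.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Simplified f3(outgroup; A, B): mean of (a - o) * (b - o) over SNPs.

    Inputs are per-SNP allele frequencies in the same SNP order.
    No finite-sample bias correction is applied (a deliberate simplification).
    """
    if not (len(outgroup) == len(pop_a) == len(pop_b)) or not outgroup:
        raise ValueError("need three non-empty frequency lists of equal length")
    products = [(a - o) * (b - o) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at three SNPs (for illustration only).
mbuti = [0.1, 0.5, 0.9]
ag2 = [0.3, 0.5, 0.5]
test_close = [0.3, 0.6, 0.5]   # shares AG-2's shifts away from the outgroup
test_far = [0.1, 0.5, 0.9]     # identical to the outgroup, so no shared drift

f3_close = outgroup_f3(mbuti, ag2, test_close)
f3_far = outgroup_f3(mbuti, ag2, test_far)
```

Ranking the Test populations by this value gives the kind of list discussed here, with the populations sharing the most drift with AG-2 at the top.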
Eurogenes K15 results for AG-2
4 Ancestors Oracle results based on the K15 ancestry proportions suggest that AG-2 might simply be a more westerly ANE sample than MA-1, perhaps with some European forager ancestry. Below are a few examples of the best population approximations; note the strong showing by StoraFörvar11, a Mesolithic genome from near Gotland, Sweden. The full list can be seen here.
1 Brahmin_UP+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.364493
2 Burusho+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.411899
3 MA-1+MA-1+StoraFörvar11+Tatar @ 8.427561
4 Kshatriya+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.437549
5 Gujarati+North_Amerindian+StoraFörvar11+StoraFörvar11 @ 8.45127
However, I was only able to use around 13K SNPs that overlapped with my dataset for all of the tests here. So perhaps these markers were much less affected by contamination than the rest? In any case, here are three Principal Component Analyses (PCA) to finish things off. Again, AG-2 basically looks like the genome of a late ANE survivor with a solid contribution from indigenous European foragers. Hopefully this can be confirmed or debunked in the near future with a much higher quality sequence of its genome.
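Each Oracle fit listed above pairs four reference populations with a distance, where a lower number means a closer match to the sample's ancestry proportions. Here's a rough sketch of how that kind of search can work. Note that the exact metric used by the Oracle isn't described here, so this version just assumes a plain Euclidean distance between the sample and the average of the four references. The reference names and proportions are hypothetical.

```python
from itertools import combinations_with_replacement
from math import dist


def best_four_way_fits(sample, references, top=3):
    """Rank four-population mixtures by distance to a sample.

    Assumption: each of the four "ancestors" contributes 25%, and fit is
    the Euclidean distance between the sample's proportions and the mean
    of the four reference profiles. The same population may be used twice.
    """
    fits = []
    for combo in combinations_with_replacement(sorted(references), 4):
        mix = [
            sum(references[name][k] for name in combo) / 4
            for k in range(len(sample))
        ]
        fits.append((dist(sample, mix), combo))
    fits.sort()
    return fits[:top]


# Hypothetical three-component profiles in percent (not real K15 values).
references = {
    "PopA": [100.0, 0.0, 0.0],
    "PopB": [0.0, 100.0, 0.0],
    "PopC": [0.0, 0.0, 100.0],
}
sample = [50.0, 50.0, 0.0]
fits = best_four_way_fits(sample, references)
```

A brute-force search like this gets slow as the number of reference populations grows, but it's fine for a few dozen.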
Update 20/08/2014: In the above analysis I used variants from the 1st extraction AG-2 bam file. To try and get more markers I have now also processed the apparently lower quality supernatant bam. Merging the two files has given me just over 30K SNPs to play with, and I think the extra markers have made a positive difference. Below are the updated results, which I'd say appear more accurate because they're much more similar to those of MA-1 (see here and here).
Revised Eurogenes K15 results for AG-2
The PCA plots based on the new set of markers look almost identical to those above, so I won't bother posting them. By the way, I updated the Eurogenes ancient genomes datasheet with the revised AG-2 K15 results (see here).
Analysis of Mesolithic Swedish forager StoraFörvar11
Wednesday, March 26, 2014
There's been a lot of horseshit published over the years about Y-chromosome haplogroup R1a, which just happens to be my haplogroup. That includes academic papers in journals like PLoS ONE and Nature. My advice is, take all of that stuff with a very large pinch of salt and just look here for updates.
Indeed, a new paper on the phylogeography of R1a appeared at the Nature website today: Underhill et al. 2014. It's actually a much better effort than anything else on the topic at academic level thus far, but certainly not without issues.
For instance, the authors failed to include two well known and very important R1a subclades in their analysis: the Northwest European-specific R1a-CTS4385 and the East and Central European-specific R1a-Z280. As a result, the former is lumped with R1a-M417* and the latter with R1a-Z282*. In fact, Z280 is shown to be above Z282 in the topology of R1a-M420 (see Figure 1 here), which is plain wrong. These are major oversights and mean that this study is not a very useful resource as far as the phylogeography of European R1a is concerned.
But the paper does show a couple of interesting things. For instance, the maps below offer the best illustration to date of the dichotomy between the European-specific R1a-Z282 and Asian-specific R1a-Z93.
However, these are very closely related subclades, sharing the Z645 mutation (unfortunately not mentioned in the paper), and both reaching high frequencies among Indo-European speakers. It's therefore plausible that groups carrying these markers expanded to the west and east from a zone between their current hotspots, possibly the Volga-Ural region, rather recently.
Indeed, these migrations had to have happened after 4800-6800 YBP, which is the age of R1a-M417 reported by Underhill et al., and backed up by estimates from genetic genealogists using, among other things, complete R1a sequences (see here). In other words, the rapid expansions of R1a-Z282 and R1a-Z93 appear to have taken place from more or less the same region during the generally accepted early Indo-European timeframe, making them excellent candidates for paternal markers of the early Indo-European dispersals.
At the same time, the paucity of R1a-Z93 and derived lineages in Europe, including Eastern Europe, suggests that historic migrations originating in East and Central Asia, like those of the early Turks, had a negligible effect on the paternal ancestry of modern Europeans. This shows very clearly on the PCA in Figure 4 (see here).
Underhill et al., The phylogenetic and geographic structure of Y-chromosome haplogroup R1a, European Journal of Human Genetics, advance online publication, 26 March 2014; doi:10.1038/ejhg.2014.50
R1a-Z93 from Bronze Age Mongolia
Afghan Hindu Kush: a genetic sink
Saturday, March 15, 2014
The recent Wilde et al. paper on the ancient DNA of Eastern European steppe nomads included mitochondrial DNA (mtDNA) data for just over 60 of the studied individuals. Below is a Principal Component Analysis (PCA) featuring these samples, marked collectively as KGU, alongside the dataset from last year's Brandt et al. study on the genetic origins of Central Europeans.
Note that KGU falls closest to the Bernburg (BEC) and Unetice (UC) samples from Neolithic and Bronze Age eastern Germany, respectively. This is probably because all of these groups have similar levels of mtDNA haplogroups U5a and H. Moreover, UC is thought to be an Indo-European archaeological culture with origins in Eastern Europe. On the other hand, Brandt et al. hypothesized that BEC might have been of Scandinavian origin.
The Central European metapopulation (CEM) is composed of present-day individuals from Austria, Germany, Poland and the Czech Republic. Its position on the PCA plot suggests to me that modern Central Europeans are largely derived from Kurgan nomads, Bell Beakers from Iberia (BBC), and remnants of Neolithic farmers from the Near East, at least in terms of maternal ancestry.
In other words, I'd say the result correlates well with the findings of Brandt et al., who posited that long-range migrations from eastern and western Europe into the heart of the continent, particularly during the late Neolithic, played an important role in the formation of the modern Central European mtDNA gene pool.
Citations and credits...
Thanks to Eurogenes Project member PL16 for the PCA
Wilde et al., Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y, PNAS, Published online before print on March 10, 2014, DOI: 10.1073/pnas.1316513111
Guido Brandt, Wolfgang Haak et al., Ancient DNA Reveals Key Stages in the Formation of Central European Mitochondrial Genetic Diversity, Science 11 October 2013: Vol. 342 no. 6155 pp. 257-261 DOI: 10.1126/science.1241844
Extreme positive selection for light skin, hair and eyes on the Pontic-Caspian steppe...or not
Sunday, February 23, 2014
The Estonian Biocentre has a new genotype dataset available from the recently released "Khazar" preprint (see here). The samples include Poles from Estonia, so I ran a PCA to see whether there was a clear difference between them and their ethnic kin from Poland in terms of genome-wide genetic structure. This doesn't appear to be the case, except for a few individuals who probably have significant Estonian and/or northwest Russian ancestry (the several northernmost and easternmost Polish_Estonian samples on the plots below). It's an interesting result, considering that, as far as I know, most Estonian Poles are not of recent Polish origin, but have roots in the East Baltic dating back to the Polish-ruled Duchy of Livonia of the 1600s. Please note, the plots were rotated and stretched horizontally to fit with geography.
Behar, Doron M.; Metspalu, Mait; Baran, Yael; Kopelman, Naama M.; Yunusbayev, Bayazit; Gladstein, Ariella; Tzur, Shay; Sahakyan, Hovhannes; Bahmanimehr, Ardeshir; Yepiskoposyan, Levon; Tambets, Kristiina; Khusnutdinova, Elza K.; Kusniarevich, Aljona; Balanovsky, Oleg; Balanovsky, Elena; Kovacevic, Lejla; Marjanovic, Damir; Mihailov, Evelin; Kouvatsi, Anastasia; Triantaphyllidis, Costas; King, Roy J.; Semino, Ornella; Torroni, Antonio; Hammer, Michael F.; Metspalu, Ene; Skorecki, Karl; Rosset, Saharon; Halperin, Eran; Villems, Richard; and Rosenberg, Noah A., No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews (2013). Human Biology Open Access Pre-Prints. Paper 41.
Friday, February 14, 2014
We've turned out French-like. Sacrebleu!
Source: A genetic atlas of human admixture history
Mapping the history of human admixture (paper + website)