Wednesday, October 31, 2012

First official attempt to divide R1a1 into multiple subclades since the discovery of R-M458

Unfortunately, this paper has already become outdated since being submitted for peer review at the AJPA, largely thanks to work by R1a hobbyists (see here). For instance, the authors claim that the overlap zone between R-Z280 and R-Z93 is Inner and Central Asia. In fact, these two subclades overlap in Europe, which is where most of the ancestral R-Z93 lineages have been located to date. Hopefully a major paper on R1a is on the way that will clear this up at the academic level, because the European overlap is a strong hint that R-Z93 might have expanded deep into Asia from Europe.

Since the discovery of R1a1-M458, this is the first scientific attempt to divide haplogroup R1a1-M198 into multiple SNP-based sub-haplogroups. We have genotyped 217 R1a1-M198 samples from seven different population groups at M458, as well as the Z280 and Z93 SNPs recently identified from the “1000 Genomes Project”.

The two additional binary markers present an effective tool because now more than 98% of the samples analyzed assign to one of the three sub-haplogroups. R1a1-M458 and R1a1-Z280 were typical for the Hungarian population groups, whereas R1a1-Z93 was typical for Malaysian Indians and the Hungarian Roma. Inner and Central Asia is an overlap zone for the R1a1-Z280 and R1a1-Z93 lineages. This pattern implies that an early differentiation zone of R1a1-M198 conceivably occurred somewhere within the Eurasian Steppes or the Middle East and Caucasus region as they lie between South Asia and Eastern Europe. The detection of the Z93 paternal genetic imprint in the Hungarian Roma gene pool is consistent with South Asian ancestry and amends the view that H1a-M82 is their only discernible paternal lineage of Indian heritage.


Previous publications have pointed out that regions of highest haplogroup frequencies do not always indicate the territory of origin (Cinnioglu et al., 2004) and high STR diversity may not be exclusively an indicator of in-situ diversification but could also be the consequence of repeated gene flow from different sources (Zerjal et al., 2002; Sharma et al., 2009).

Pamjav, H., Fehér, T., Németh, E. and Pádár, Z. (2012), Brief communication: New Y-chromosome binary markers improve phylogenetic resolution within haplogroup R1a1. Am. J. Phys. Anthropol.. doi: 10.1002/ajpa.22167

Thursday, July 19, 2012

Long IBD gives clues to migrations across Europe from the Iron Age to the present - take 2

That Ralph and Coop study on intra-European IBD sharing I blogged about recently (see here) is now out as a preprint. It's a really nice read for those interested in European population genetics, and shows things from a perspective rarely offered in other studies. Here are a few quotes that caught my eye:

Work from uniparentally inherited markers (mtDNA and Y chromosomes) has improved our understanding of human demographic history (e.g. Soares et al., 2010). However, interpretation of these markers is difficult since they only record a single lineage of each individual (the maternal and paternal lineages, respectively), rather than the entire distribution of ancestors. Genome-wide genotyping and sequencing datasets have the potential to provide a much richer picture of human history, as we can learn simultaneously about the diversity of ancestors that contributed to each individual’s genome.


In this paper, we analyze those rare long chunks of genome that are shared between pairs of individuals due to inheritance from recent common ancestors, to obtain a detailed view of the geographic structure of recent relatedness. To determine the time scale of these relationships, we develop methodology that uses the lengths of shared genomic segments to infer the distribution of the ages of these recent common ancestors. We find that even geographically distant Europeans share ubiquitous common ancestry even within the past 1,000 years, and show that common ancestry from the past 3,000 years is a result of both local migration and large-scale historical events.
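The link between block length and ancestor age can be illustrated with a common back-of-envelope approximation. This is not the authors' actual inference method. It assumes that a block inherited by two people from a common ancestor g generations back has survived about 2g meioses, so its expected length is roughly 100/(2g) centimorgans. The function names and the example values below are hypothetical.

```python
def expected_block_cm(generations: float) -> float:
    """Rough expected IBD block length (in cM) for a common ancestor
    `generations` back, assuming 2 * generations meioses separate the pair."""
    return 100.0 / (2.0 * generations)


def rough_age_generations(block_cm: float) -> float:
    """Invert the approximation: a crude age (in generations) for a block
    of the given length. Real blocks vary widely around this value."""
    return 100.0 / (2.0 * block_cm)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    for g in (10, 25, 50, 100):
        print(f"{g:>4} generations -> about {expected_block_cm(g):.2f} cM")
    print(f"a 2 cM block -> about {rough_age_generations(2.0):.0f} generations")
```

Under this approximation, longer blocks point to more recent common ancestors and shorter blocks to older ones, which is why block lengths carry information about timing.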


In contrast, within samples from the UK and nearby regions we see negative correlation between numbers of blocks shared with Irish and numbers of blocks shared with Germans. From our data, we do not know if this substructure is also geographically arranged within the UK. However, an obvious explanation of this pattern is that individuals within the UK differ in the extent of their recent Irish ancestry, and that individuals with less Irish ancestry have a larger portion of their recent ancestry shared with Germans. This suggests that there is variation across the UK (perhaps a geographic gradient) in terms of the amount of Celtic versus Germanic ancestry.

Individuals usually share the highest number of IBD blocks with others from the same population, but with some exceptions. For example, individuals in the UK share more IBD blocks on average, and hence more close genetic ancestors, with individuals from Ireland than with other individuals from the UK, and Germans share similarly more with Polish than with other Germans. In figure 3 we depict the geography of rates of IBD sharing between populations, i.e. the average number of IBD blocks shared by a randomly chosen pair of individuals. Above, maps show the IBD rate relative to certain chosen populations (maps, above), and below, all pairwise sharing rates are plotted against the geographic distance separating the populations. It is evident that geographic proximity is a major determinant of IBD sharing (and hence recent relatedness), with the rate of pairwise IBD decreasing relatively smoothly as the geographic separation of the pair of populations increases.


The fact that most people alive today in Europe share nearly the same set of (European, and possibly world-wide) ancestors from only 1,000 years ago seems to contradict the signals of long term, albeit subtle, population genetic structure within Europe (e.g. Novembre et al., 2008; Lao et al., 2008). These two facts can be reconciled by the fact that even though the distribution of ancestors (as cartooned in Figure 1B) has spread to cover the continent, there remain differences in degree of relatedness of modern individuals to these ancestral individuals. For example, someone in Spain may be related to an ancestor in the Iberian peninsula through perhaps 1000 different routes back through the pedigree, but to an ancestor in the Baltic region by only 10 different routes, so that the probability that this Spanish individual inherited genetic material from the Iberian ancestor is 100 times higher. This allows the amount of genetic material shared by pairs of extant individuals to vary even if the set of ancestors is constant.
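The arithmetic in the quote above can be checked with a small sketch. It assumes that each route through the pedigree passes on genetic material independently and with the same small probability. That assumption and the per-route value are illustrative only and are not taken from the paper.

```python
def prob_inherit(routes: int, p_route: float) -> float:
    """Chance that at least one of `routes` independent pedigree paths
    transmits genetic material, each with probability `p_route`."""
    return 1.0 - (1.0 - p_route) ** routes


# Hypothetical per-route transmission probability, chosen only for illustration.
p_route = 1e-6

iberian = prob_inherit(1000, p_route)  # ancestor reached by 1000 routes
baltic = prob_inherit(10, p_route)     # ancestor reached by 10 routes
ratio = iberian / baltic

print(f"Iberian ancestor: {iberian:.3e}")
print(f"Baltic ancestor:  {baltic:.3e}")
print(f"ratio:            {ratio:.2f}")
```

When the per-route probability is small, the chance of inheriting anything scales almost linearly with the number of routes, so the ratio comes out just under 100, matching the figure in the quote.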


One of the striking patterns we see is the relatively high level of sharing of IBD between pairs of individuals across eastern Europe, as high or higher than that observed within other, much smaller populations. Furthermore, the numbers of short (older) IBD blocks shared between different populations is constant regardless of the geographic distance separating the two, as shown in figure 3. This is consistent with these individuals having a comparatively large proportion of ancestry drawn from a relatively small population that expanded over a large geographic area, ancestry which we date to 1,000-2,000 years ago (see figures 4, 5, and S8).


This evidence is consistent with the idea that these populations derive a substantial proportion of their ancestry from various groups that expanded during the "migration period" from the fourth through ninth centuries (Davies, 2010). This period begins with the Huns moving into eastern Europe towards the end of the fourth century, establishing an empire including modern-day Hungary and Romania; and continues in the fifth century as various Germanic groups moved into and ruled much of the western Roman empire. The Slavic populations expanded beginning in the sixth century, probably from somewhere in the area between the Baltic, Black, and Adriatic seas (Barford, 2001).

The only point I'll argue about with the authors is their suggestion that the high IBD sharing across Central and Eastern Europe might be of Hunnic origin. The Huns probably facilitated migrations across Europe, via their military and political activities, but I doubt they inundated Europe with their own IBD.

The eastern IBD most likely derives from near the Baltic, because its spread shows a very strong correlation with the "North European" autosomal component seen in my project and elsewhere. This component reaches very high frequencies in Balto-Slavs, including individuals from eastern Poland, Ukraine and Belarus, where most scholars put the Slavic homeland.

It also matches closely the geographic peaks of two Y-DNA haplogroups commonly found in Slavs today - R1a-Z283 and I2. So I think it's pretty clear that it's mainly a signal of the early Slavic dispersals.


Peter Ralph, Graham Coop, The geography of recent genetic ancestry across Europe, Populations and Evolution, arXiv:1207.3815v2 [q-bio.PE]

Monday, June 25, 2012

Long IBD gives clues to migrations across Europe from the Iron Age to the present (aka. SMBE 2012 abstracts)

The Society for Molecular Biology and Evolution (SMBE) is holding its annual conference this week, and has released a PDF of abstracts of the presentations at the meeting. Most of these presentations are yet to be published as articles in journals, but after a bit of Googling, I think I located one of them online. Luckily, it just happens to be the one I’m most interested in…

Long IBD in Europeans and recent population history

Peter Ralph, Graham Coop
UC Davis, Davis, CA, USA

Numbers of common ancestors shared at various points in time across populations can tell us about recent demography, migration, and population movements. These rates of shared ancestry over tens of generations can be inferred from genomic data, thereby dramatically increasing our ability to infer population history much more recent than was previously possible with population genetic techniques. We have analyzed patterns of IBD in a dataset of thousands of Europeans from across the continent, which provide a window into recent European geographic structure and migration.

Unfortunately, the link doesn’t include much data, but has lots of impressive graphics. I’ve put together a small selection of these, focusing on…surprise, surprise…Poland. Basically, the larger the circle, the more Identity-by-descent (IBD) shared:

I think it’s very clear from the results that the Polish sample shares a lot of fairly recent IBD with many groups from across Europe, and especially those from north and east of the Alps. Most of these segments were certainly spread by various Indo-European groups, including the Slavs.

The authors have attempted to estimate the ages of the admixtures, and divided the results into three periods. The outcomes for Poland appear very accurate based on what we know from history and archaeology, although keep in mind that East Slavic individuals are missing from this part of the analysis. I’ve also included the graphics for Italy (IT) and Iberia (Iber), for comparison. The results for these two Southern European regions look much more conservative, and I suspect that’s due to their larger effective population sizes, plus the Alps and Pyrenees acting as strong barriers to gene flow from the north.

At the 0-540 ya period, Poles don’t share much with anyone except with each other and Germans. This makes sense, considering, for instance, the heavy migration of Poles from regions under Prussian occupation to the German industrial areas of the Ruhr and Saxony. These people were quickly Germanised and absorbed by the locals. Today, only their Polish sounding names and diluted genes remain.

I think the 555-1500 ya graphic very clearly shows the effects of the Slavic expansion, probably at least partly from the territory of modern Poland. I suspect the same expansion can also be seen on the 1515-2353 ya graphic. But here we can also likely see the effects of several other major population movements, including migrations of the Celts and Germanics. In any case, looking at all those large “Slavic” bubbles in the Balkans, I’m reminded of this quote from Procopius.

Illyria and all of Thrace, that is, from the Ionian Gulf to the suburbs of Constantinople, including Greece and the Chersonese, were overrun by the Huns, Slavs and Antes, almost every year, from the time when Justinian took over the Roman Empire; and intolerable things they did to the inhabitants. For in each of these incursions, I should say, more than two hundred thousand Romans were slain or enslaved, so that all this country became a desert like that of Scythia.

Eventually, the Slavs stopped raiding the Balkans and settled there permanently. Many became subjects of the Roman Empire.

It’d be fascinating if an IBD analysis like this was carried out on an expanded dataset, including many more samples from Northern and Eastern Europe, as well as West and Central Asia. We know there were movements of people from Europe deep into Asia during the metal ages, and learning more about these events could help us unravel the origins of such enigmatic groups as the early Indo-Europeans.

Actually, there’s another abstract in that SMBE selection, and this one deals with Identical by State (IBS) tracts in Europeans. It claims there’s been “no significant gene flow between Europeans and Asians within the past few hundred generations”. That sounds like a reasonable statement, but only in the context of the 1000 Genomes samples these scientists compared, which I assume included Europeans vs. South and East Asians. So like I say, what we really need is a study of IBD or IBS, or both, that looks at a wider variety of groups from West and Central Asia, because that’s where most of the relatively recent mixing took place.

Reconstructing demographic histories from long tracts of DNA sequence identity

Kelley Harris1, Rasmus Nielsen1,2

1UC Berkeley, Berkeley, CA, USA, 2University of Copenhagen, Copenhagen, Denmark

There has been recent excitement and debate about the details of human demographic history, involving gene flow that has occurred between populations as well as the extent and timing of bottlenecks and periods of population growth. Much of the debate concerns the timing of past admixture events; for example, whether Neanderthals exchanged genetic material with the ancestors of non-Africans before or after they left Africa. Here, we present a method for using sequence data to jointly estimate the timing and magnitude of past genetic exchanges, along with population divergence times and changes in effective population size. To achieve this, we look at the length distribution of regions that are shared identical by state (IBS) and maximize an analytic composite likelihood that we derive from the sequentially Markov coalescent (SMC). Recent gene flow between populations leaves behind long tracts of identity by descent (IBD), and these tracts give our method its power by influencing the distribution of shared IBS tracts. However, since IBS tracts are directly observable, we do not need to infer the precise locations of IBD tracts. In this way, we can accurately estimate admixture times for relatively ancient events where admixture mapping is not possible, and in simulated data we show excellent power to characterize admixture pulses that occurred 100 to several hundred generations ago. When we study the IBS tracts shared between and within the populations sequenced by the 1000 Genomes consortium, we find evidence that there was no significant gene flow between Europeans and Asians within the past few hundred generations. It also looks unlikely that the Yorubans of Nigeria interbred with Europeans or Asians in a population-specific way, though there may have been admixture between Africans and an ancestral non-African population.

See also...

Long IBD gives clues to migrations across Europe from the Iron Age to the present - take 2

Wednesday, June 20, 2012

First direct evidence of genetic continuity in West and Central Poland from the Iron Age to the present

I've just been sent a fascinating thesis on the mtDNA of Iron Age and Medieval samples from Poland. It suggests direct genetic continuity between Iron Age samples belonging to the Przeworsk and Wielbark Cultures, of what is now West and Central Poland, and present-day Poles. Here's the English summary, and a map of the sites under study:

For many years the origin of the Slavs has been the subject-matter in archaeology, anthropology, history, linguistics and recently also modern human population genetics. By now there is no unambiguous answer to a question where, when and in what way the Slavs originated. For the purposes of this dissertation, the analysis of ancient human mitochondrial DNA was applied. The ancient DNA was isolated from 72 specimens which came from Iron-Age and medieval graveyards from the area of current Poland. Ancient mtDNA was extracted from two teeth from each individual and reproducible sequence results were obtained for 20 medieval and 23 Iron-Age specimens. On the basis of HVR I mtDNA mutation motifs and coding region SNPs each specimen was assigned to a mitochondrial haplogroup. The obtained results were used together with other ancient and modern populations to analyse shared haplotypes and population genetic distances illustrated by multidimensional scaling plots (MDS). The differences on genetic level and quite high genetic distances (FST) between medieval and Iron-Age populations as well as significant number of shared informative haplotypes with Belarus, Ukraine and Bulgaria may evidence genetic discontinuity between medieval and Iron Ages. From the other side, the highest number of shared informative haplotypes between Iron-Age and extant Polish population as well as the presence of subhaplogroup N1a1a2, can confirm that some genetic lines show continuity at least from Iron Age or even Neolithic in the areas of present day Poland. The results obtained in this work are considered to be the first ancient contribution in genetic history of the Slavs.

Below is an MDS from the thesis, based on data corrected for the effects of potential relatives in the Iron Age sample. I don't think it's a particularly useful way of judging the intra-European affinity of the two ancient Polish groups, mostly because the samples are small, and contemporary North, Central and East Europeans don't differ very much in terms of mtDNA. Nevertheless, we can see that both the Iron Age (Okres Rzymski) and Medieval (Sredniowiecze) samples fall within the range of modern European mtDNA diversity. On the other hand, the German Neolithic LBK sample (Neolit LBK Niemcy) clearly does not, because it's sitting at the far right of the plot, away from the main European cluster. This dichotomy between the genetic structure of the LBK farmers and modern Europeans has been demonstrated in previous studies, but the reasons for it are still a mystery.

Interestingly, modern Poles are closer to an Iron Age sample from Denmark (Okres Zelaza Dania) than to the Polish Iron Age set. However, as per the summary above, the author also compared the frequencies of the most informative haplotypes among the modern and ancient samples, and found that extant Poles are the closest group to the Polish Iron Age remains, followed by Balts, Swedes and Baltic Finns. Below is a table showing those results.

According to the author, these matches might hint at Baltic, Germanic and Finno-Ugric influences in the Polish Iron Age population. Perhaps, but in my opinion, they're simply in line with geography, and reflect the general North European character of maternal lineages shared by populations from around the Baltic, both today and during the Iron Age.

The results for the Medieval Polish sample are more intriguing, because they're somewhat out of whack with geography. Its best matching modern groups are Belorussians, Ukrainians and Bulgarians. This might suggest that, during the early middle ages, the territory of present day Poland experienced an influx of groups from what are now Belarus and Ukraine, who then melted into the gene pool of the natives of Polish Iron Age descent. However, conversely, it might mean that Belorussians, Ukrainians and Bulgarians descend in large part from fairly specific medieval groups from the area of modern Poland.

In any case, whether present day Polish territory saw some migrations from the immediate east during the Medieval period or not, this preliminary look at ancient Polish mtDNA suggests long-standing genetic continuity in the region. What it clearly doesn't show is a complete, or almost complete, population replacement in the areas between the Oder and Bug rivers during the migration period.

Indeed, the thesis results put into doubt past notions that the Przeworsk and Wielbark cultures were of Germanic origin.

The (mtDNA) haplogroup missing from both the Iron Age and medieval samples from the territory of modern Poland was haplogroup I. In contemporary Slavic populations, this haplogroup is found at levels ranging from 1.2% in Bulgarians to 4.8% in Slovaks. It was also recorded at high levels in ancient remains from Denmark. It showed a frequency of 12.5% in an Iron Age sample, and 13.8% in a medieval sample. Melchior et al. 2008 suggest that haplogroup I might have been more common in Denmark and Northern Europe during that period. Therefore, the lack of this haplogroup in ancient DNA from the territory of modern Poland, might mean that the Przeworsk and Wielbark cultures should not be identified with Germanic populations.

I'm sure more ancient DNA studies are on the way looking at the origins of Slavs and Poles. Indeed, if the Y-chromosomes of Przeworsk and Wielbark remains are successfully tested, I won't be surprised if they look fairly typical of modern Poles, with a decent representation of R1a1a-M458, which is the most common Y-chromosome haplogroup in Poland today.

Anna Juras, Etnogeneza Słowian w świetle badań kopalnego DNA, Praca doktorska wykonana w Zakładzie Biologii Ewolucyjnej Człowieka Instytutu Antropologii UAM w Poznaniu pod kierunkiem Prof. dr hab. Janusza Piontka

Friday, April 27, 2012

Prehistoric Scandinavians genetically most similar to present-day Poles

Scientists from Uppsala University have managed to extract genome-wide markers from the early Neolithic remains of three hunter-gatherers and one farmer from southern Sweden. They only pulled a few thousand SNPs from each sample, but that was enough to successfully compare the ancient remains to modern Europeans. The results of their study, published in Science Magazine today, reveal that Poles top the allele sharing list with the hunter-gatherers. Interestingly, Poles also show higher allele-sharing with the farmer than Swedes do, but not as high as Cypriots and Greeks. The figure below illustrates this clearly.

Now, if we look at the ADMIXTURE analysis from the study, it suggests that the farmer was also very similar to Sardinians, albeit with more North European admixture.

So I think it's pretty clear we're dealing here with an individual of mostly deep East Mediterranean origin, whose ancestors made their way from West Asia to Western Europe, probably via maritime routes, and settled islands like Sardinia in the process. They possibly moved into what is now Sweden via Western and Central Europe, but then again, they might have gone straight from the Mediterranean to Scandinavia by boat.

But why is it that Poles show higher similarity to these Neolithic Scandinavians than Swedes do? Firstly, it's important to realize that the differences aren't that great. Note, for instance, that Swedes are the second most similar population to the hunter-gatherers after Poles. However, clearly, the data suggests that there had to be other population movements into Scandinavia after the late Neolithic. These also likely affected Poland, but to a lesser degree.

No one yet knows what these were exactly, but if I had to guess, I'd say the Bell Beaker folk of the Copper Age represented one of the major waves (see figure below from "Europe during the third millennium BC and Bell Beaker culture phenomenon: peopling history through dental non-metric traits study" by Jocelyne Desideri). Also, another factor might be that the hunter-gatherers tested by Skoglund et al. belonged to the Pitted Ware culture, which arrived in Scandinavia from the Eastern Baltic.

Anyway, I'm absolutely delighted with the results from this study. The reason is that they correlate very closely with the experiments I've been running with ADMIXTURE, aimed at untangling the story of the peopling of Europe. Note, for instance, the close correlation between the STRUCTURE plot above, and the results from my Hunter-Gatherer vs. Farmer analysis (see here). All you have to do is add up the blue and purple components from the STRUCTURE graph, and you'll basically get my "Baltic hunter-gatherer" cluster. Also, the orange component is very similar to my "Mediterranean farmer" cluster.

If Skoglund et al. had access to more prehistoric samples, then it's likely these would create their own clusters. That's because the four Neolithic individuals they tested, especially the hunter-gatherers, seem to fall outside the range of modern European genetic variation, like on some of the PCAs below. The appearance of ancient clusters wouldn't invalidate the current results, because such clusters would no doubt show a close relationship to those created by modern samples. However, I’m pretty sure they'd give us a better idea of how much hunter-gatherer ancestry survives in modern Europeans, because they wouldn't be affected by such factors as genetic drift since the Neolithic. So that’s something to look forward to in the future.


Skoglund et al., Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe, Science 27 April 2012: Vol. 336 no. 6080 pp. 466-469 DOI: 10.1126/science.1216304

Wednesday, March 7, 2012

Northwest Eurasians + Southwest Eurasians + Mesolithic survivors = modern Europeans

Update 23/12/2013: Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans


For a long time, it was generally accepted that Europeans were direct descendants of Palaeolithic settlers of the continent, with some Middle Eastern ancestry in the Mediterranean regions, courtesy of Neolithic farmers. However, in the last few years, largely thanks to ancient DNA, it dawned on most people that such a scenario was unrealistic. It now seems that Europe was populated after the Ice Age in a big way, by multiple waves of migrants from almost all directions, but especially from the southeast.

Getting to grips with the finer details of the peopling of Europe is going to be a difficult and painstaking process, and will require ancient DNA technology that probably isn’t even available at the moment. However, the mystery about the basic origins and genetic structure of Europeans was solved for me this week, after I completed a series of ADMIXTURE runs focusing on West Eurasia (see K=10, K=11, K=12, and K=13). The map below, produced by one of my project members, summarizes very nicely the most pertinent information from those runs (thanks FR7!). It shows the relative spread of three key genetic clusters, from the K=13, in a wide range of populations from Europe, North Africa, and West, Central and South Asia (i.e. the data represents the nature of West Eurasian alleles in the sampled groups, with only three clusters considered). The yellow cluster is best described as Mediterranean or Southwest Eurasian, while the cyan and magenta, which are sister clades, as Northwest Eurasian.

Thus, it appears as if modern Europeans are made up of two major Neolithic groups, which are related, but at some point became distinct enough to leave persistent signals of that split. They spread into different parts of Western Asia before moving into Europe. The Southwest Eurasians, possibly from the southern Levant, dominated the Mediterranean Basin, including North Africa, Southern Europe, and the Arabian Peninsula. I’m pretty sure that Otzi the Iceman is the best known representative of the ancient Southwest Eurasians (see here).

The Northwest Eurasians might have originated in the northern Levant, but that’s a pure guess. In fact, judging by the map above, their influence isn’t particularly strong in that part of the world today, and only becomes noticeable several hundred kilometers to the north and east, in the North Caucasus and Iran respectively. However, the northern Levant is actually dominated by a fourth West Eurasian cluster, tagged by me as "Caucasus" in the K=13 run, and not shown on the map above. Various calculations show that this can also be assigned to the Northwest Eurasian group, except that it seems to have split from the other Northwest Eurasian components at an early stage (see comments section here).

After their initial spread, it appears as if the Northwest Eurasians absorbed varying amounts of native Mesolithic groups in their newly acquired territories west, north and east of the Levant. This is strongly suggested by the aforementioned ancient DNA results, at least as far as Europe is concerned. They also mixed heavily with Southwest Eurasians in Europe and nearby. That’s why, for instance, you’ll never find an Irishman who clusters closer genetically to an Indian than to other Europeans. However, even a basic analysis of their DNA, like my own ADMIXTURE runs, shows that a large subset of their genomes comes from the same, relatively recent, “Northwest Eurasian” source.

We can follow the same logic when talking about the differentiation between modern descendants of Southwest Eurasians. For instance, those in Iberia have significant admixture from Northwest Eurasians, while those in North Africa carry appreciable amounts of West and East African influence.

I’m convinced that the scenario of the peopling of Europe outlined above, by two basic stocks of migrants from Neolithic West Asia, is the only plausible one, because the signals from the data are too strong to argue against it. I’m sure you’ll be seeing the same story told by scientists over the next few years in peer reviewed papers. They’ll probably come up with different monikers for the Southwest and Northwest Eurasians, but the general concepts will be the same.

However, that was the easy part. The hard part is linking the myriad of movements of these Southwest and Northwest Eurasians with archaeological and linguistic groups. Perhaps the earliest Southwest Eurasians into Europe were Afro-Asiatic speakers? To be honest, I have no idea, because that’s not an area I’ve studied closely. But I would say that it’s almost certain that the proto-Indo-Europeans were of Northwest Eurasian stock. It’s an obvious conclusion, due to the trivial to non-existent amounts of Southwest Eurasian influence in regions associated with the early Indo-Europeans, like Eastern Europe and Central Asia.

Perhaps the simplest and most diplomatic thing to do for the time being, would be to associate the entire Northwest Eurasian group with an early (Neolithic) spread of Indo-European languages from somewhere on the border between West Asia and Europe? I know that would work for a lot of people, specifically those who’d like to see an Indo-European urheimat in Asia, as opposed to Europe. But it wouldn’t work for me, especially not after taking a closer look at that map above.

As already mentioned, the Northwest Eurasians can be reliably split into two clusters, marked on the map in cyan and magenta. I call the cyan cluster North Atlantic, because it peaks among the Irish and other Atlantic fringe groups, and the magenta Baltic, because it shows the highest frequencies among Lithuanians and nearby populations. The story suggested by the map is pretty awesome, with the Baltic cluster seemingly exploding from somewhere in the middle of the Northwest Eurasian range, and pushing its close relatives to the peripheries of that range. Thus, under such a dramatic model, the North Atlantic is essentially the remnant of the pre-Baltic Northwest Eurasians, and appears to have found refuge in Western and Northwestern Europe, in the valleys of the Caucasus Mountains, and in South Asia.

Indeed, there seems to be a correlation between the highest relative frequencies of the North Atlantic and regions that are still home to non-Indo-European speakers, or were known to have been home to such groups in historic times. For instance, France has the Basques, while the British Isles had the Picts, who are hypothesized to be of non-Indo-European stock. Note also the native, non-Indo-European speakers in the Caucasus, like the Chechens, who show extreme relative frequencies of the North Atlantic component. Moreover, at the south-eastern end of the Northwest Eurasian range, in India, there are still many groups of Dravidian speakers.

Below are two maps that isolate the relative frequencies of the North Atlantic (cyan) and Baltic (magenta) components, versus each other and the Southwest Eurasian cluster, to better show the hole in the distribution of the North Atlantic. To be sure, the North Atlantic cluster can be broken down further, but only with a more comprehensive sampling strategy, especially of Northern and Western Europe.

That’s my take on what the data is showing. Other explanations are possible, but I don’t really know what they might be. I should also mention that the potentially proto-Indo-European Baltic cluster shows a remarkable correlation with the spread of Y-chromosome haplogroup R1a, and ancient DNA rich in this haplogroup from supposed early Indo-Europeans. For more info on that, see the links below:

Best of 2008: Corded Ware DNA from Germany

Ancient Siberians carrying R1a1 had light eyes

Ancient Siberians carrying R1a1 had light eyes - take 2

Bronze Age Tarim Basin "Caucasoids" carried R1a1 (and European mtDNA lineages too)

European admixture among ancient East Asians (aka. two-rooted canines carried by early Indo-Europeans to China)

Monday, January 23, 2012

Eurogenes' North Euro clusters - phase 2, final results

This is a continuation of my ChromoPainter analysis of Europeans from north of the Pyrenees, Alps and Balkans (see here). To obtain the most accurate results possible on my laptop, I increased the burn-ins and iterations in fineSTRUCTURE to 500K each (5 hour run in all, which is all I'm willing to put this machine through). The end product looks very similar to my initial analysis, in which I explored the data at 200K burn-ins and iterations. What I think this shows is that the results are robust, and I doubt they'd change much even after a couple of days of running fineSTRUCTURE.
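One way to put a number on how similar two runs are is to check, for every pair of samples, whether both runs agree on putting them in the same cluster or in different clusters (the Rand index). The sketch below is only an illustration of that idea, not something fineSTRUCTURE produces itself; the function name and the toy cluster assignments are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair counts as an agreement if both runs put the two samples in the
    same cluster, or both runs put them in different clusters. Cluster
    names do not need to match between runs.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must classify the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical assignments for six samples from a shorter and a longer run.
run_short = ["pop2", "pop2", "pop8", "pop8", "pop17", "pop17"]
run_long = ["a", "a", "b", "b", "c", "b"]

score = rand_index(run_short, run_long)
print(f"Rand index: {score:.2f}")
```

A value close to 1 would mean that the longer run barely reshuffled the samples, while a much lower value would suggest that the shorter run had not yet settled.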

Indeed, as mentioned in my previous blog entry, this appears to be the most detailed and accurate cluster analysis of this part of Europe produced anywhere to date. There are 21 clusters in all, with at least 20 looking like strong signals of genetic substructures across North, West, Central and East Europe (see spreadsheet for individual classifications). They include:

pop0 - West Finnish1: This is a pair of reference individuals, most likely from Western Finland, judging by their PCA and ADMIXTURE results. They are either from the same community, or have a very similar mix of very specific ancestries.

pop1 - Erzya + Moksha: This includes all of the Erzya and Moksha in the project, plus a Russian with recent Erzya ancestry. It's closely related to ethnic Russian clusters that stretch from Northwest Russia to near the Volga, and also to the Estonian cluster.

pop2 - South/Central Finnish: This is the largest Finnish cluster, and that's probably more than just the result of sampling bias. I would say that the greater part of the Finnish population would belong to this type of cluster, which occupies regions of highest population density within the country.

pop3 - Fenno-Scandian: This cluster includes a Northern Swede, a Swede with probable recent Finnish ancestry, and Finns with probable recent Swedish influence. I have a feeling that Finland Swedes and Aland Islanders would also be placed here more often than not.

pop4 - Northwest Russian/Southeast Finnish: Although this cluster includes only two individuals, it's definitely much more than just the result of two relatively closely related samples being in the same run. I'd hazard a guess that Northwest Russians with, say, significant Ingrian ancestry, would land here, and so would Finns with recent Russian ancestry.

pop5 - West Finnish2: Based on PCA and ADMIXTURE results, most of these Finns likely come from Western Finland, probably from places like Southern Ostrobothnia. They possibly also have some Swedish influence.

pop6 - West German: This cluster is based on individuals from Western and Northwestern Germany. It also includes a Dutchman, Austrian and people of mixed origin, like a Dane with French and German ancestry, and Americans with British, German, Scandinavian and/or Polish ancestry. In other words, this is where Northwestern Europe meets Central Europe.

pop7 - Vologda Russian: Most of the Vologda Russians from the HGDP land here, so this appears to be a local cluster. Judging from its phylogeny, it looks like a mix of North Slavic, Baltic and Finnic influences.

pop8 - East Finnish: All the project and reference Finns with substantial ancestry from new settlement areas of Eastern Finland appear in this cluster. No wonder then, that this is the cluster with the highest chunk count in this analysis.

pop9 - Estonian: This is a mixed cluster, including individuals from Estonia, and, as far as I know, Russians with substantial ancestry from near Estonia. As mentioned above, it's closely related to the Erzya + Moksha, Northwest Russian and Vologda clusters. However, it's clearly much more western than any of these clusters (for instance, see the PCA below), which suggests Germanic influence in its makeup.

pop10 - Cornish: Almost all of my Cornish samples from the 1000 Genomes Project feature in this very local cluster, which shows the highest chunk count among the Western European samples. The overall results suggest a lack of outbreeding in recent times.

pop11 - French/Belgian: Interestingly, this cluster includes the bulk of the French samples, a French Canadian, and two Belgians. On the other hand, the most northerly French are placed in the more cosmopolitan Northwest European cluster (see below).

pop12 - Lithuanian: All of the more or less pure Lithuanians fall in this cluster. Those that don't are a reference sample from Behar et al. 2009, who always appears very Belorussian-like in other analyses, and here sits in the East Slavic cluster, and a project member with recent German ancestry (LIT3). The Western European influence carried by the latter pushes him into the Polish/West Ukrainian cluster, despite not having any documented Polish or Ukrainian ancestry.

pop13 - Northwest Russian: This cluster appears to be made up of Russians who have more Finnic, and/or perhaps Eastern Baltic, ancestry than the individuals in the East Slavic cluster. In other words, it's more northerly, less westerly, and more closely related to the Finnic-speaking Erzya, Moksha and Estonians.

pop14 - Irish + West British: Most Irish individuals fall in this cluster, as well as British samples from Western Scotland and Wales. It's tempting to correlate this cluster with Celtic genetic ancestry in the Isles.

pop15 - South/West Scandinavian: This is basically a Norwegian and Southern Swedish cluster. It also features Swedes from other parts of the country who most likely have some German, Walloon and/or French influence.

pop16 - East German: This cluster includes individuals with significant or even overwhelming Germanic ancestry, but also with very clear Western Slavic input. One of the individuals here is of mixed Polish, German and Swedish ancestry, which pretty much sums up the character of this cluster in a modern context. The presence of two Hungarians from Behar et al. 2009 isn't surprising, because Hungary was settled by both Germanic and Western Slavic groups from the early Middle Ages until modern times.

pop17 - Northwest European: I had reasonable hopes of breaking up this large cluster into a couple of units at least. However, that did not happen, and I don't think it will unless I obtain more samples from the relevant areas of Europe, like Holland and specific parts of the UK. I think the main reason this cluster failed to budge was because of its cosmopolitan nature. In other words, the samples here include some of the most outbred in the analysis, and this, coupled with the fact that they carry very similar ancestral components, means that fineSTRUCTURE doesn't have anything to latch onto to create divisions.

pop18 - East Scandinavian: This could also be called a Swedish cluster. It's almost entirely made up of Swedes, usually from Eastern or Southeastern Sweden, and/or occasionally with recent Finnish influence.

pop19 - Polish/West Ukrainian: The vast majority of the Poles fall in this cluster, and about half of the Ukrainians from Yunusbayev et al. 2011. Most of these Ukrainians appear to be from the Lviv district in the west, and some might even have fairly recent Polish and/or German ancestry. In fact, I would say the latter is a good bet for UkrLv240Y, who shows large Western European segments on several chromosomes.

pop20 - East Slavic: All of the Belorussians cluster here, and so do Russians from near Belorussia and Ukraine, and almost half of the Ukrainians from Yunusbayev et al. 2011 (those who show more easterly genetic characteristics). An individual of mixed Polish and Lithuanian ancestry also makes an appearance here, suggesting that one of the main factors differentiating this cluster from the Polish/West Ukrainian group is a higher level of Baltic admixture in the former.

pop21 - East Central European: This cluster is based on most of the Hungarians in my dataset, but it also includes a number of Western and Southern Slavs, often with significant German ancestry. Not surprisingly, this cluster shows very high affinity with both the East German and Polish/West Ukrainian clusters.

Let's now move on to some graphics. Below, in order of appearance, are the following: raw data coancestry matrix, showing the placement of individual samples; aggregate coancestry matrix, showing the populations (or clusters) described above; pairwise coincidence matrix, which is useful for spotting very recent ancestral ties; a PCA plot of the 21 clusters. More detailed ChromoPainter/fineSTRUCTURE PCAs of Western Europe can be found at this link.
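For readers wondering how an aggregate matrix relates to the raw one, here is a minimal sketch of the idea: average the individual-level chunk counts over every donor/recipient pair within each pair of clusters. This is an assumption about how such a summary can be built, not a description of fineSTRUCTURE's exact procedure; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def aggregate_coancestry(chunks, labels):
    """Average an individual-level coancestry matrix by cluster.

    chunks[i][j] is taken to be the chunk count individual i receives from
    individual j. Self-copying is ignored, so the diagonal is zeroed and
    self-pairs are left out of the within-cluster averages.
    """
    chunks = np.array(chunks, dtype=float)  # copy, so the input is untouched
    np.fill_diagonal(chunks, 0.0)
    labels = list(labels)
    clusters = sorted(set(labels))
    # member[i, k] is 1.0 if individual i belongs to cluster k
    member = np.array([[lab == c for c in clusters] for lab in labels], dtype=float)
    sums = member.T @ chunks @ member
    sizes = member.sum(axis=0)
    pairs = np.outer(sizes, sizes) - np.diag(sizes)  # drop self-pairs
    with np.errstate(divide="ignore", invalid="ignore"):
        means = np.where(pairs > 0, sums / pairs, np.nan)
    return clusters, means

# Hypothetical chunk counts for four individuals in two clusters.
toy_chunks = [
    [0, 10, 2, 3],
    [12, 0, 1, 2],
    [2, 3, 0, 20],
    [1, 2, 18, 0],
]
toy_labels = ["A", "A", "B", "B"]

clusters, means = aggregate_coancestry(toy_chunks, toy_labels)
print(clusters)
print(means)
```

A cluster with a single member has no within-cluster pairs, so its diagonal entry comes out as NaN rather than a misleading zero.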

Finally, those of you who wish to run your own experiments with the ChromoPainter datasheets from this analysis can download them here. Please note, the sheets don't reveal any raw or traits/disease data.

Saturday, January 14, 2012

Eurogenes' North Euro clusters - phase 1, exploring the data

I have some preliminary results from a new intra-North Euro cluster analysis, using a cutting-edge tool called ChromoPainter. More than 400 samples and 270K SNPs were tested, in linkage mode, and then the output processed in fineSTRUCTURE at 200K burn-ins and iterations. Like I say, the results should be treated as preliminary, but they already look better than any other cluster analysis I've ever seen dealing with Europe north of the Alps, Pyrenees and Balkans. The algorithm identified 21 clusters, with most located in Eastern and Northeastern Europe (see spreadsheet for details). Below are two plots showing how the clusters relate to each other via a tree diagram and heat maps – the first shows an aggregate view, and the second the individual samples.

It's interesting that the Baltic Finns seem to create clusters at the drop of a hat, but they also share more chunks, and longer chunks, than any other group. Indeed, all of the Finnish clusters are closely related, and many of the individuals, especially from East Finland, even look like distant relatives on the heat map (note the ultra-hot, blue squares). On the other hand, the large Northwestern European cluster, featuring samples from across the UK, as well as from several nearby countries, is holding firm, and might be tough to break up in this analysis.

I have some theories about the reasons for the obvious genetic homogeneity and diversity in Western Europe, and these include the effects of the Black Death. It decimated many populations in the western half of the continent, thus encouraging migrations into emptied areas, and eventually leading to more open, mobile societies. It's an interesting subject, and I might write much more on it in the future. Meantime, here's a PCA plot from the ChromoPainter chunk counts data. Note the large distances spanned by groups from Northern and Eastern Europe, and the tight bundle of samples from the west, mostly from the UK, Ireland, France and the Low Countries. Interestingly, and perhaps counter-intuitively, it's the closely related Finns who take up most of the space on the plot.

The first component picked up by this PCA appears to be an Atlantic one. It peaks among the Cornish samples, but shows similar levels in all the British, Irish, French, Dutch and Belgians (post-Black Death mobility?). If we are to assume that I identified the component correctly, then it appears as if the East Finns, Vologda Russians, Erzya from the Middle Volga, and Lithuanians are the least “Atlantic” samples in this analysis. These groups, especially the East Finns, also happen to act like relative genetic isolates in many of my experiments (such as ADMIXTURE and MDS analyses). Thus, it seems they've been sheltered from significant gene flow from outside in recent times, including from the west, like German emigration to East Central Europe and Scandinavian influence in Western and Southwestern Finland.
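For anyone who wants to try something similar on the chunk counts, a PCA of this kind can be sketched as below: centre each column of the matrix and take a singular value decomposition. This is a generic PCA and only an approximation of the idea; the normalisation used by the ChromoPainter/fineSTRUCTURE tools may well differ, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def pca_scores(chunk_counts, n_components=2):
    """Project samples onto the top principal components of a count matrix.

    Rows are samples and columns are donors. Returns the per-sample scores
    and the share of total variance captured by each returned component.
    """
    x = np.array(chunk_counts, dtype=float)
    x -= x.mean(axis=0)  # centre each donor column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Hypothetical chunk counts for four samples.
toy_counts = [
    [0, 30, 5, 4],
    [28, 0, 6, 5],
    [5, 4, 0, 40],
    [6, 5, 38, 0],
]

scores, explained = pca_scores(toy_counts)
print(scores)
print(explained)
```

The first column of the scores plays the role of the first component discussed above, and the explained-variance shares give a rough sense of how much of the overall structure each axis carries.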