Trees within trees: the Bundle Model


Reposting here an illustration I whipped up a few days before Christmas for a debate on the validity of the tree model in linguistics, held in an article draft session by fellow historical linguists and linguistics bloggers Guillaume Jacques and Johann-Mattis List. They argue against recent papers by Alexandre François and Siva Kalyan, who have proposed “freeing” historical linguistics from the tree model and moving to an updated wave-model-esque approach they call “historical glottometry”.

I will not cover the debate here in detail, especially as the comments have been made publicly available by now (see also the link above thru to Jacques’ blog for some set-up details and further links). One major observation that I think does emerge, however, is that there are multiple different senses in which we can speak of the “splitting” of languages, and how the relationships between languages should be represented therefore often depends on the level of analysis.

My diagram above says nothing directly about linguistics, and is simply an abstract interleaving of two disparate tree structures: a macro-level, represented by branch distances; and a micro-level, represented by the graph topology. If you look closely, you can also see that there are indeed two micro-trees in the graph, unconnected to each other. (They likely would join paths sometime further down in history, had I continued drawing.)

There are 12 leaf nodes in this “double-tree”, which we may call A, B, C, …, L. Depending on which level of analysis we are looking at, there are two possible taxonomies generated by the two tree structures:

  • a “macro classification”:
    • [[A, [B, [C, [D, E]]]], [F, [G, H]], [[[I, J], K], L]]
  • a “micro classification”:
    • {{A, {{B, C}, D}}, {{E, {{F, G}, H}}, I}}
    • {{J, K}, L}

There are not many subgroups that occur in both structures! The only one is the triplet {F, G, H}… and even its internal subgrouping diverges ([F, [G, H]] versus {{F, G}, H}). There is moreover an interesting chronological complication with the splitting of this group: the micro-level branching occurs in its entirety substantially earlier than the macro-level branching.
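The claim that {F, G, H} is the only shared subgroup can be checked mechanically. Below is a minimal Python sketch, assuming the bracket notation above is read as plain nested groupings (square and curly brackets treated alike) and counting only non-trivial subgroups, i.e. excluding single leaves and the full set of twelve. The helper names `parse` and `clades` are my own, hypothetical ones.

```python
import re

# The two classifications above, copied verbatim as strings.
MACRO = "[[A, [B, [C, [D, E]]]], [F, [G, H]], [[[I, J], K], L]]"
MICRO = ["{{A, {{B, C}, D}}, {{E, {{F, G}, H}}, I}}", "{{J, K}, L}"]


def parse(text):
    """Turn a bracketed string into nested tuples of leaf names."""
    tokens = re.findall(r"[\[\]{}]|[^\s,\[\]{}]+", text)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ("[", "{"):
            children = []
            while tokens[pos] not in ("]", "}"):
                children.append(node())
            pos += 1  # consume the closing bracket
            return tuple(children)
        return tok  # a leaf such as "A"

    return node()


def clades(tree):
    """Return the leaf set of every internal node (every subgroup) in the tree."""
    found = []

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        leaves = frozenset().union(*(walk(child) for child in t))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


ALL_LEAVES = frozenset("ABCDEFGHIJKL")
macro_clades = set(clades(parse(MACRO)))
micro_clades = set().union(*(clades(parse(s)) for s in MICRO))

# Keep only subgroups found in both trees, ignoring the trivial full set.
shared = {c for c in macro_clades & micro_clades if 1 < len(c) < len(ALL_LEAVES)}
print(sorted("".join(sorted(c)) for c in shared))  # ['FGH']
```

The same two sets also show the divergence inside the triplet: the macro tree contains {G, H} but not {F, G}, and the micro tree the reverse.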

In principle, it would also be possible to nest yet a third tree, of arbitrary structure, deeper inside the picture, so that upon zooming in, the graph representing microstructure again resolves into a set of unconnected nanostructures, branching and turning in tandem. And so on, ad libitum: then fit an additional picostructure inside the nanostructure, or perhaps use the current macro-division as a base for a megastructure with yet another geometry entirely. (Moving from two dimensions to three or more would be required if we wanted to fit in “non-contiguous” subgroups such as {A, C} or {E, F, J}.)

My approach here is also but one of various possibilities for “mixing” trees together. It does have one interesting constraint: in all cases, a macro-branching between two leaves takes place later than their micro-branching, or at most at the same time (e.g. E | F). But we could also imagine e.g. a single three-dimensional tree whose 2D projections in a number of different directions each form a new tree of a different shape. In this case, branchings visible e.g. in the XZ-plane could equally well be earlier or later than the corresponding branchings visible in the YZ-plane.
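The timing constraint can be stated pairwise: for any two leaves, the macro-split must be no older than the micro-split. A minimal Python sketch, assuming trees annotated with hypothetical split ages (arbitrary time units before the present, larger meaning older), and treating leaves in unconnected micro-trees as having an infinitely old micro-split, since in the drawing they never join. The four-leaf trees and the function names are invented for illustration.

```python
import math


def split_ages(tree):
    """Map each pair of leaves to the age of the node where they split.

    Internal nodes are tuples (age, child, child, ...); leaves are strings.
    """
    ages = {}

    def walk(t):
        if isinstance(t, str):
            return [t]
        age, *children = t
        leaf_lists = [walk(child) for child in children]
        # Every pair drawn from two different children splits at this node.
        for i, left in enumerate(leaf_lists):
            for right in leaf_lists[i + 1:]:
                for a in left:
                    for b in right:
                        ages[frozenset((a, b))] = age
        return [leaf for leaves in leaf_lists for leaf in leaves]

    walk(tree)
    return ages


def violations(macro, micro_trees):
    """Leaf pairs whose macro-split is older than their micro-split."""
    micro_ages = {}
    for tree in micro_trees:
        micro_ages.update(split_ages(tree))
    bad = []
    for pair, macro_age in split_ages(macro).items():
        # Leaves in separate micro-trees never join within the picture.
        micro_age = micro_ages.get(pair, math.inf)
        if macro_age > micro_age:
            bad.append(tuple(sorted(pair)))
    return sorted(bad)


# Hypothetical example: micro ((A, B), (C, D)), macro ((A, C), (B, D)).
MICRO_TREES = [(10, (6, "A", "B"), (8, "C", "D"))]
OK_MACRO = (4, (2, "A", "C"), (3, "B", "D"))
BAD_MACRO = (7, (2, "A", "C"), (3, "B", "D"))

print(violations(OK_MACRO, MICRO_TREES))   # []
print(violations(BAD_MACRO, MICRO_TREES))  # [('A', 'B')]
```

Note that both macro trees have a different topology from the micro tree, yet the first one satisfies the constraint: it restricts timing, not shape.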

If we imagined the above tree to indicate language relationships, perhaps linguist fieldworkers’ initial instinct would be to group the 12 varieties as 4 languages, according to the macro-structure:

  1. {A}, clearly a variety of its own;
  2. {B, C, D, E} as a set of “closely related” varieties;
  3. {F, G, H} as a more diffuse dialect continuum;
  4. {I, J, K, L} as an intermediate case.

But at some point, a closer look into the dialect diversification of these varieties might indicate e.g. that the features separating A from B-E include some traits that go quite far back, already before the B-E / F-H split. Other troubling isoglosses might also surface, where A thru I shared one value and J thru L another, and where we were nevertheless unable to show that the latter, “more closely related” varieties truly have innovated, and not the diverse remainder. At some point “language 2” might end up renamed a “dialect continuum” or a “linkage”, while the “more diffuse” language 3 might firmly retain its clade status. Whether “language 4” would also end up analyzed as a linkage is less obvious. Perhaps linguists would still hang on to analysing at least the split that distinguishes A-D from E-I as multiple unconnected events (one for E, one for F-H, one for I?)

Commenters in the session soon pointed out that my illustration reminds them of the concept of incomplete lineage sorting (ILS) from evolutionary biology. This is, roughly speaking (and any readers with more evobio under their belt than I have, feel free to correct me if this is inexact), the phenomenon that while speciation takes a parent species’ entire gene pool with it, some diversity may later end up being lost in daughter species. If a species S with two alleles of a gene G splits into two daughter species, and allele G₁ eventually survives only in daughter S₁ while allele G₂ survives only in daughter S₂, we might end up wrongly concluding that the distinct alleles only developed in the daughter species. Moreover, if this kind of situation takes place a couple of times, a gene may further seem to have split into alleles in the “wrong” order, compared to the actual family tree of the species.

This is however not quite the same phenomenon that I am attempting to point at.

The exact linguistic counterpart of ILS is levelling: if we reconstruct a morphophonological alternation pattern in a proto-language, let’s say *a ~ *b, it will be possible for descendants to analogically eliminate one or the other alternant, and to end up with unvarying *a or unvarying *b. I have many opinions on levelling (most of them critical of reconstructing alternation from non-alternating reflexes, or of projecting attested alternation patterns deeper than necessary)… but that would be too large a tangent to go on right now. Suffice it to note that yes, levelling indeed also creates counter-tree-like isogloss configurations.

We could also define “lexical levelling”, brought about by the loss of inherited vocabulary. Mechanistically, this might look like a different phenomenon from morphological levelling, [1] but in terms of isogloss patterning, it often ends up looking exactly the same. An ancient proto-word might survive only in one group of descendant languages (and end up looking like an innovation particular to it); or it might be lost in a few descendants quite early on (and end up making the other descendants look like a subgroup defined by the introduction of this word); or it might survive in a ragtag assortment of not especially closely related descendants (and make it very clear that the occurrence or non-occurrence of a given word is not a strong genetic signal).
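The three outcomes described above can be told apart by checking whether the retaining languages, or the losing ones, happen to form a subgroup of the family tree. A minimal Python sketch, assuming a hypothetical six-language tree and treating any set of languages descending from a single node as a subgroup. The function name `classify_retention` and the tree are my own illustrations.

```python
def subgroups(tree):
    """Return (every leaf set descending from one node, the full leaf set)."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            leaves = frozenset([t])
        else:
            leaves = frozenset().union(*(walk(child) for child in t))
        found.add(leaves)
        return leaves

    everyone = walk(tree)
    return found, everyone


def classify_retention(tree, retained):
    """Describe how the survival pattern of one proto-word looks on the tree."""
    groups, everyone = subgroups(tree)
    retained = frozenset(retained)
    lost = everyone - retained
    if not retained or not lost:
        return "uninformative: kept or lost everywhere"
    if retained in groups:
        return "kept only in a subgroup: looks like that subgroup's innovation"
    if lost in groups:
        return "lost only in a subgroup: the rest look like a subgroup"
    return "scattered: weak genetic signal"


# Hypothetical family tree of six languages.
FAMILY = (("A", ("B", "C")), (("D", "E"), "F"))

print(classify_retention(FAMILY, {"D", "E"}))            # survives in one group
print(classify_retention(FAMILY, {"A", "B", "C", "F"}))  # lost early in D and E
print(classify_retention(FAMILY, {"A", "D"}))            # ragtag survivors
```

All three verdicts here come from nothing but loss of an inherited word; the first two are exactly the misleading cases, since on this evidence alone they look the same as a genuine shared innovation.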

There is however a key difference between lineage sorting and my meta-trees. The “proto-variation” I’m trying to indicate by this meta-tree is not internal to a language variety. It is instead built from variation between the idiolects (topolects, etc.) that a given language is composed of.

Genes are obviously different entities from species, and likewise allomorphs (words) are different entities from languages, so it’s not a huge surprise that their family trees might not match each other, or perhaps not even resemble each other. Two seemingly unrelated genes could turn out to be related once you look a couple billion instead of just a couple million years back. It is hard to tell how common the same might be for seemingly unrelated words, given that our knowledge of linguistic history remains far shallower than our knowledge of evolutionary history… but even if we assumed that no such cases exist at all (which is, by the way, demonstrably untrue), loaning still often enough suffices to generate completely opaque doublets such as wool and flannel, or atoll and esoteric.

Language contrasts, dialect contrasts and idiolect contrasts meanwhile are only quantitative variations of one and the same thing: linguistic variation between speakers. And yet we can also sketch a situation where a “language split” ends up taking place along different fault lines than an earlier “dialect split” did.

This observation is by no means my own invention. For example, my Helsinki colleague J. Häkkinen calls this phenomenon “boundary shift” in a paper published a few years ago. [2] The particular example he refers to (certain divergences in vowel history in the common West Uralic era) has by now been explained otherwise, [3] but other candidates could easily be located as well. A few that spring to mind within western Uralic would be the numerous isoglosses connecting Votic with the Eastern Finnic (Savonian-Ingrian-Karelian-Veps) language group, e.g. the innovative 1st and 2nd person plural pronouns *möö, *töö, [4] rather than with Estonian, generally considered the closest relative of Votic; or the treatment of initial *d₂- in Samic, where Southern and partly Ume Sami show a development to *θ- > /h-/, but most languages show instead a development to /t-/, which happens to be also found in Finnic. [5] It is likely that many such conflicting isoglosses simply represent secondary contacts, much after the initial separation of the language groups, or even independent developments altogether, but I indeed see no reason to assume that they must all be somehow secondary. Many examples could well have taken root already during the initial dialect divergence of the involved language groups.

We know from dialectology and sociolinguistics that linguistic innovations almost always have a “width”. Instead of taking place in a single isolated variety, with inheritance from there to a set of descendants, they rather spread across some number of related-but-distinct varieties. (This is a point that François and Kalyan justly stress in their papers, if with different terminology.) A boundary shift is, then, nothing more than a change in how far exactly isoglosses coming in from a given direction end up spreading. The conventional usage of “language area” or “language contact” mainly comes up when new innovations extend wider than older ones did, and we often speak of dialect area X extending some influence to dialect area Y. But the opposite is possible as well: if new innovations “shrink” — they stop reaching a particular group of varieties — then not only does this lead to these varieties “splitting away” as a relict area from an earlier group of related varieties: it also leads to their earlier sibling varieties now “changing course” to instead align with some other adjacent “cousin” varieties.

This is the phenomenon that I attempt to capture by the various bunched right-angle turns in my opening graphic. For example, the split between “language 1” and “language 2” involves three micro-lineages (B-C, D and E) turning away in unison from the micro-lineage of variety A, even though the micro-lineage of E had already split away from that of A-D much earlier, and the split between A and B-D is also already well in effect. There is therefore a boundary shift here: the macro-lineage formed by A, B-D and E is broken, and only the latter two continue on together (B-D now moreover split into B-C and D). After this, new innovations again continue to accrue across the macro-lineage for a while, as represented by the linear “branch” section.

This situation does not amount to a “unitary protolanguage”, since the three lineages are, in fact, already micro-separate. An attempt at reconstructing a unitary Proto-BCDE would have to reach much deeper than this period to be able to unify also the deepest micro-divergences.

But, just about equally importantly — a single unified Proto-BCDE regardless exists, if way back there (in this case it is, in fact, simultaneously also the proto-variety behind everything from A to I). “Boundary-shrinking” in this sense can thus only operate on closely related varieties; and it can only decrease the similarity of some varieties from their earlier siblings. It is not capable of leading to the “convergence” of unrelated languages. Whatever macro-group ends up being formed by some separate lineages is not in any way converging: it is merely maintaining its pre-existing divergences at a given level, while language varieties outside the group are free to diverge further off. (Of course other processes, such as loss of archaic vocabulary, can well lead to actual linguistic convergence.)

The distinction I draw here between micro-lineages and macro-lineages, however, also has a different, readily applicable interpretation in linguistics: genealogy vs. typology. We find no problem in stating something to the effect that Finnish and Turkish are agglutinative vowel harmony languages, while Livonian and German are fusional vowel-reduction languages: this is taken as nothing more than a relatively superficial system of classification, separate from the “true”, i.e. genetic classification (according to which Finnish and Livonian are both Finnic, while Turkish and German are not even Uralic). But regardless, just as (proto-)languages can split into multiple descendants, language areals can similarly over time split into multiple typologies. Starting from a single point far enough back in time, we should again be able to trace a tree of diverging typologies, which is also again 1) likely to diverge in structure from any genealogical tree, and 2) likely to have all of its splits located later than the corresponding genealogical splits.

Typological divergences definitely also often involve boundary shifts of their own. If Livonian at some point in its history has taken a turn towards fusional typology, then it also has to have taken a turn away from agglutinating typology, and this quite well amounts to boundary shrinking of the “(core) Finnic macro-lineage of agglutinative typology”. Or, inversely: the relatively clean agglutinative morphology of common Finnic, still preserved in e.g. standard Finnish and Karelian, has in many later descendants been muddled by various processes of apocope and syncope: such is the case at least in Livonian, Estonian, Southwestern Finnish, Veps, and partly Ludic; more recently also in some dialects of Ingrian and Votic. This has the effect of turning inherited polysyllabic vocalic stems into “thematic stems”, arguably a step towards a more fusional typology (and at least in Livonian and Estonian, this has been a basic building block for many other innovations in morphology). Regardless, looking from the perspective of early dialect divisions in the Proto-Finnic era, the varieties involved are just about a scattershot. [6]

There also seems to be a deeper similarity here to dialect diversification, not only in the resulting tree structures, but also in the actual details of linguistic change. “Genetic macrostructural”, or “linkage-defining”, wide-spreading innovations indeed have various features in common with “typological” wide-spreading ones:

  • They may ignore the microstructure of the dialect continuum;
  • They may spread in phases, taking root in different micro-lineages at different times;
  • Where independent, they may spread also over each other, forming patchwork-like rather than concentric isogloss patterns;
  • They may end up being reversed, if a counterinnovation arises;
    (I’m thinking here principally about “isomorphic” sound changes that only affect the phonetic realization of a phoneme or a phoneme sequence, not its relation to the rest of the phonology; innovations in syntax may be applicable as well)
  • And finally, they can take the leap to “fully areal”, and spread also to “unrelated”, or at least not at all closely related language varieties.

Due to the lack of a clear distinction between the linguistic innovations that count as “macro” and those that count as “micro”, François & Kalyan have suggested, roughly, that we should treat them all as equally genetic. But I would claim that the opposite approach is just as possible: since there is also no clear distinction between innovations that count as “macro” and innovations that count as “typological”, perhaps we should treat them all as equally non-genetic.

So how do we reconcile these two extremes? A trivial solution would be to claim that no genetic relatedness between language varieties exists, but this obviously gets us into other conceptual problems quite fast (not to mention the troubling echoes of Marrism). Another option might be to instead deny the idea that we can speak of “the” genealogy of a language. Whenever many different and contradictory tree structures emerge, it may be worth checking if we could consider each of them to represent the descent of a different thing. A language’s nominal syntax does not have to have the same exact (areal or dialectological) origin as its vowel inventory, which does not have to have the same origin as its verb morphology, which does not have to have the same origin as its metalworking vocabulary; and perhaps it is a mistake to think that we can pick out the “One True Tree” from among the histories of these various subsystems.

But a third option yet, which I am growing increasingly fond of, would be to first grant that, yes, all usually recognized linguistic innovations are more or less “typological” or “areal” — but to then seek a deeper level yet that we could use as the rooting for the genetic origin of a language variety. My current contender for such a level is local continuity, forming what I call the bundle model.

In the absence of dialect levelling events (the introduction of expansive acrolects through e.g. migrations, mass media, or standardized schooling), a topolect specific to a given location has primarily descended from the earlier topolect of that same village, as far back as language-level continuity gets us. A fundamental division of language varieties into topolects is also relatively unambiguous: just about any speaker either lives, or doesn’t live, in a particular village. No especially coherent division into topolects smaller than a village is possible either (at least as long as we’re talking about settled, non-urbanized, agricultural societies). [7]

A given linguistic innovation that forms an isogloss somewhere across a dialect continuum is, then, not what actually splits two topolects apart. Its existence is merely evidence that the topolects on either side of the isogloss had already split from each other at the time. A primary splitting event instead corresponds either to the foundation of a new settlement altogether, or to the introduction of a novel language variety to a pre-existing settlement (no matter if as L2 or L1).

There is admittedly the complication that topolect monogeny is not ensured. Any new settlement could gain its speaker base from more than one pre-existing settlement; and the resulting new topolect can quite possibly end up taking on a mixture of its parents’ traits, instead of starting off as essentially a copy of one of its parents.

As for secondary splitting events, i.e. the actual language diversification, these could be instead said to form “bundles” of local micro-lineages: a category which includes as subtypes all three of “language areas”; “linkages” of related languages; and “subgroups” defined by common features. The differences between the three are, in the bundle model, considered differences in degree, not kind, with no sharp boundaries between them. However, it seems to be necessary to note that there are at least two gradual transitions here: half-a-continent-spanning language areas are still clearly different from local linkages, which in turn are also clearly different from small, tight bundles of topolects.

Also, amusingly enough, not only is it possible for a bundle to comprise language varieties of differing genetic backgrounds — it is also possible for a genetic group of languages to fail to be identified by a corresponding feature bundle. I expect many large-scale subfamilies to be indeed genetic subgroups, in addition to their unambiguous bundle status. But within any one such subfamily, it is easily possible for various smaller genetic groups to have formed, and then split up again, fast enough that no actual linguistic markers managed to establish themselves as characterizing the entire group (and only it).
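The relation between genetic groups and bundles can be made concrete with a small data structure. A minimal Python sketch, assuming a hypothetical genealogy in which each settlement has exactly one founding parent (so it ignores the mixed foundations mentioned above), and treating a settlement together with everything founded from it as a genetic group. The settlements V0 to V6 and the innovations x, y, z are invented for illustration.

```python
from collections import defaultdict

# Hypothetical settlement genealogy: settlement -> the settlement it was founded from.
FOUNDED_FROM = {
    "V0": None,
    "V1": "V0", "V2": "V0",
    "V3": "V1", "V4": "V1",
    "V5": "V2", "V6": "V2",
}

# Hypothetical innovations -> the topolects each one eventually reached.
BUNDLES = {
    "x": {"V0", "V1", "V2", "V3", "V4", "V5", "V6"},  # family-wide
    "y": {"V1", "V3", "V4"},                          # matches the V1 group
    "z": {"V4", "V5"},                                # crosses two groups
}


def genetic_groups(founded_from):
    """Map each settlement to itself plus everything founded from it."""
    children = defaultdict(list)
    for place, parent in founded_from.items():
        if parent is not None:
            children[parent].append(place)

    def descendants(place):
        group = {place}
        for child in children[place]:
            group |= descendants(child)
        return frozenset(group)

    return {place: descendants(place) for place in founded_from}


def compare(founded_from, bundles):
    """Return (bundles that are not genetic groups, groups that no bundle marks)."""
    groups = genetic_groups(founded_from)
    group_sets = set(groups.values())
    bundle_sets = {frozenset(b) for b in bundles.values()}
    areal_bundles = sorted(n for n, b in bundles.items() if frozenset(b) not in group_sets)
    unmarked_groups = sorted(
        p for p, g in groups.items() if len(g) > 1 and g not in bundle_sets
    )
    return areal_bundles, unmarked_groups


print(compare(FOUNDED_FROM, BUNDLES))  # (['z'], ['V2'])
```

Both situations from the paragraph above show up: bundle z joins topolects of different genetic backgrounds, while the genuine genetic group founded from V2 is marked by no bundle of its own.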

What would be different for “secure” subfamilies (and “primary” language families) is moreover not their speed of formation. I would equally well expect that e.g. the main local-continuity genetic groups of Finnic had already split from each other before the vast majority of the innovations that today characterize the Finnic subfamily (no matter whether one current primary branch would amount to half the Finnic language area, or to a single backwoods town somewhere in southern Estonia). It is the extinction of other early connecting varieties that allows me to be relatively sure that, yes, there was once a common genetic ancestor of the Finnic languages that was also not the genetic ancestor of e.g. any of the modern-day Samic languages. This common genetic ancestor could very well still predate various innovations that did spread to both the Finnic and Samic languages, putting it well within Proto-Uralic times, and thus looking distinctively non-Finnic. If we look for biological parallels, this “common genetic ancestor” thus functions most like the identical ancestors point.

By contrast, reconstructible Proto-Finnic, whether we define this loosely by the last innovation common to all the languages (e.g. in phonology, the best candidate is *š > *h), or more strictly by the last innovation that is not predated by any innovation particular to a smaller set of varieties (in phonology I’d suggest for this something like the raising *aa > *oo, *ää > *ee), instead functions as the mere last common ancestor of the “population” of Finnic language varieties. In practice, this would mean something like the last language variety whose distinguishing linguistic characteristics were eventually taken up by all other Finnic varieties known to us (either with or without allowing for the survival of additional earlier characteristics).
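The loose and strict definitions can be stated as a single scan over a relative chronology. A minimal Python sketch, assuming the chronology is a simple linear order (real relative chronologies are usually only partial orders) and using hypothetical innovations i1 to i5 spread across hypothetical varieties A to E. For Finnic, the paragraph above would place the loose cutoff at *š > *h and the strict one at *aa > *oo, *ää > *ee.

```python
from typing import NamedTuple


class Innovation(NamedTuple):
    name: str
    varieties: frozenset  # the varieties the innovation eventually reached


def proto_cutoffs(chronology, all_varieties):
    """Return (loose, strict) proto-language cutoffs.

    loose:  the last innovation shared by every variety;
    strict: the last innovation shared by every variety that is not
            preceded by any innovation confined to a smaller set.
    `chronology` must be ordered from oldest to youngest.
    """
    everyone = frozenset(all_varieties)
    loose = strict = None
    partial_seen = False
    for innovation in chronology:
        if innovation.varieties >= everyone:
            loose = innovation.name
            if not partial_seen:
                strict = innovation.name
        else:
            partial_seen = True
    return loose, strict


# Hypothetical relative chronology, oldest first.
CHRONOLOGY = [
    Innovation("i1", frozenset("ABCDE")),
    Innovation("i2", frozenset("ABCDE")),
    Innovation("i3", frozenset("AB")),     # first innovation with a narrower spread
    Innovation("i4", frozenset("ABCDE")),  # still reaches every variety
    Innovation("i5", frozenset("CDE")),
]

print(proto_cutoffs(CHRONOLOGY, "ABCDE"))  # ('i4', 'i2')
```

When the two cutoffs differ, as here, the stretch between them is exactly the period where micro-divergence has already begun while family-wide innovations still go through.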

The bundle model also seems to have the benefit that we could make much closer use of archeology in determining when various micro-lineages originally split from each other. If a cultural wave that we identify as Finnic reaches Southwestern Finland already in 500 BCE, then very well, let us assume that the deepest distinctions between individual western Finnish dialects could have already taken root at the time (and not at whatever time distinctions first start turning up in phonology, or morphology, or vocabulary). After this, we expect to see the foundation of new Finnic-speaking settlements in quick gradual succession, followed by the slower bundling of linguistic innovations (and possibly isoglosses) on top. But just as dialectologists and “linkageists” have long observed, there is no reason to a priori expect these later innovations to form a clear nested tree-like structure.

I have thus ended up agreeing partly with both the Jacques-List and the François-Kalyan camps. As per the latter, yes, we should stop trying to force our analyses of linguistic innovations into a tree shape by default; but per the former, no, this does not mean that we should up-end the concept of “genetic relatedness” entirely, and start applying it also to what are obviously areal units joined only by relatively late innovations (and though I’ve barely even touched the topic in this discussion, also: no, F & K’s “historical glottometry” is not an especially illuminating way of demonstrating the historical development of language groups).

For closing, I present here another imaginary diagram, this time more heavily un-tree-like (highly dialect-continuumish), and with some specific features of the bundle model illustrated. — For credit, this is again not completely original work. My key convention of presenting isoglosses as horizontal lines connecting multiple varieties is inspired, foremost, by earlier articles by Sammallahti and Viitso. [8]

  • Solid lines indicate micro-lineages, just as before;
  • Wide-angle turns indicate spreading events;
  • Small-angle turns (mostly) indicate boundary shrinking events;
  • Dashed lines indicate (some) isoglosses, bundling micro-lineages together;
  • Dead ends in T indicate language replacement events;
  • Dead ends in X indicate abandoned settlements.


I leave it to you to explore the picture further, e.g. to figure out how many processes that I have discussed above you can find illustrated.

[1] They also do share some important mechanistic similarities. If we treat morphophonology as lexicalized rather than surface phonological — then “alternating stem variants” will be nothing more than lexically separate words altogether; and “morphological levelling” amounts to the loss of such “transparently suppletive” words from a paradigm. This is often showcased by morphophonological alternants that lose their original function, but remain in some specialized one.
— A simple example might be Finnish syöpä ‘cancer’. Originally this is simply the active present participle of syö- ‘to eat’; however, it has been ousted from this function by a newer form syövä ‘eating’. Here -vä is the most regular front-vocalic APP ending, analogically drafted in from much more common bisyllabic verb roots (e.g. elä-vä ‘living’, tietä-vä ‘knowing’, käänty-vä ‘turning’, pese-vä ‘washing’), where it is phonologically regular (due to lenition *p > *b > v between unstressed vowels). Hence, the history here involves three steps: 1) the semantic enrichment syöpä ‘eating’ > ‘eating; cancer’; 2) the introduction of the more regular form syövä into the paradigm of ‘to eat’; 3) the loss of the form syöpä ‘eating’.
[2] Häkkinen, Jaakko (2012): “After the protolanguage: Invisible convergence, fake divergence and boundary shift”. — Finnisch-Ugrische Forschungen 61: 7–28.
[3] The Erzya dialects in question seem to agree with Samic in suggesting (West) Uralic *we- in a couple of words, in contrast to forms suggesting *(w)o- in the other Mordvinic varieties. This though turns out to be merely a part of a more general late conditional sound change *u- > /vi-/ in these dialects; see Ante Aikio’s article in SUSA 95: 42.
[4] Discussed in some detail by Terho Itkonen (1983): “Välikatsaus suomen kielen juuriin”. — Virittäjä 2/87: 214–217.
[5] An example taken from the isogloss map of Finno-Ugric by Tiit-Rein Viitso (2000): “Finnic Affinity”. — Congressus Nonus Internationalis Fenno-Ugristarum I: Orationes plenariae & Orationes publicae: 153–178.
[6] This actually goes further yet. Even “Estonian” and “Finnish” have long been known to be basically typological groupings formed in this fashion, both comprising multiple different genetic micro-lineages, some of which are not especially close in origin. Very roughly, if a Finnic variety is fully consonant-gradating, relatively archaic in its morphology otherwise, mostly nonpalatalizing and lexically Swedicized, it is “Finnish”; if it is consonant-gradating, fully syncopating and apocopating, and lexically Germanized, it is “Estonian”. Relaxing the definitions a bit might also allow us to call Karelian, Ingrian and Votic “typologically Finnish”, versus Livonian “typologically Estonian”. Constructing a definition of “typologically Veps” as a third areal is left as an exercise for the reader.
[7] A slightly modified model, allowing for “locations” to be territories rather than settlements, as well as for more fluid transitions and exchanges between tribal units, would seem to be required for nomadic and certain hunter-gatherer societies. This might also provide some degree of explanation for, and new tools for addressing, the difficulties in reconstructing the linguistic pre-history of areas characterized by heavy diffusion between “unrelated” or not closely related languages, such as Australia and Central Asia. I do not think I am quite reviving the punctuated-equilibrium paradigm of linguistic history here, which likewise denies the possibility of figuring out clear tree-like linguistic histories for mobile societies… but discussing the distinctions between that model and mine would be too much to chew on right now.
[8] See e.g. Sammallahti, Pekka (1977): “Suomalaisten esihistorian kysymyksiä”. — Virittäjä 2/81: 119–136.
– Viitso, Tiit-Rein (1999): “On Classifying the Selkup Dialects”. — Europa et Sibiria. Veröffentlichungen der Societas Uralo-Altaica 51: 441–451.

Posted in Methodology
16 comments on “Trees within trees: the Bundle Model”
  1. Kathryn Spence says:

    I think this is exactly the model I’ve been entertaining for a while in my head, though I’d been thinking of it as a more realistic compromise of the tree model and the wave model.

    • Kathryn Spence says:

      To clarify, the model I conceived of would replace languages with bundles of dialects – which may themselves split into more dialects. These bundles could remain together, sharing innovations, or split up into smaller bundles. Two bundles, related or not, could additionally converge to share innovations, though I don’t think I allowed for a dialect to ‘jump’ bundles. Now, I think that my dialects directly correspond to your micro-lineages, and the bundles constituting languages correspond to your macro-lineages. If this comparison is as accurate as I suspect, it’s an excellent way to interpret the model, and I’d be very interested to see this applied to real-world cases where the relevant data is available.

  2. David Marjanović says:

    Commentors in the session soon pointed out that my illustration reminds them of the concept of incomplete lineage sorting (ILS) from evolutionary biology.

    I was going to say: “Incomplete lineage sorting! Yesss.” :-)

    Your explanation of it is correct (for some meanings of “speciation” – but all is precisely correct if you use the unambiguous term cladogenesis).

    More later, it’s too late at night; but generally I like this post very much.

    • David Marjanović says:

      This situation does not amount to an “unitary protolanguage”, since the three lineages are, in fact, already micro-separate. An attempt at reconstructing a unitary Proto-BCDE would have to reach much deeper than this period to be able to unify also the deepest micro-divergences.

      But, just about equally importantly — a single unified Proto-BCDE regardless exists, if way back there

      Slavic and West Germanic come to mind as examples of such situations where micro-divergences began noticeably earlier than macro-divergences:

      reconstructible Proto-Finnic, no matter if we define this loosely by the last innovation common to all the languages (e.g. in phonology, the best candidate is *š > *h), or more strictly by the last innovation that is not predated by any innovation particular to a smaller set of varieties (in phonology I’d suggest for this something like the raising *aa > *oo, *ää > *ee)

      Last innovation common to all Slavic languages: the Great Slavic Vowel Shift, particularly *ō > *u and *a > *o. (That’s not the Proto-Balto-Slavic *ō, but the new one from PBS *au and *eu and from loans like Saloniki > Solun.) Last innovation not preceded by an innovation with more limited spread: umlaut, which happened before the GSVS and before at least one palatalization among other things.

      Last innovation common to all West Germanic languages: if we don’t count umlaut, then perhaps syncope and/or anaptyxis, both of which must have postdated the High German consonant shift. Last innovation not preceded by an innovation with more limited spread: perhaps *ðw, *zw > *ww, which happened before the common WGmc. shift of *ð > *d in all positions; this shift is preceded by “Bahder’s law” (devoicing of fricatives, including *ð, before sonorants in a large but apparently unidentifiable part of High German).

      • j. says:

        Really, some amount of this goes on in just about all large language subfamilies I’ve ever looked into in more than passing detail.

        • North Germanic: the original East Norse / West Norse division is almost completely opaque by now, in favor of a much starker Insular / Continental division;
        • Italic: too many details are not available, but at least *θ > f is a common change postdating some individual innovations;
        • Greek: Mycenaean already shows ‹o› and not ‹a› for some syllabic resonants, predicting later dialect developments;
        • Iranian: the Saka group has *ćw > *ś already before the pan-Iranian *ć > *c > s; this and Balochi seem to have been never affected by *Pʰ > *F;
        • Samic: aside from my example with *d₂- > /t-/ (which strikes me as relatively likely to be an independent development), Southern Sami in particular has a substantially different treatment of a couple of inputs to the Great Sami Vowel Shift;
        • Samoyedic: Nganasan micro-diverges early on in its vocalism, but continues to share other innovations with Nenets-Enets, including the famous epenthesis of /ŋ-/.

        • David Marjanović says:

          On Iranian, do you know the recent paper that argues for a Central Iranian branch as opposed to the traditional East / West division? I don’t have access to it. And is *sw > *ś imaginable?

          • j. says:

            If you mean the one by Agnes Korn just last November, unfortunately I have not seen it either. Some kind of a Central Iranian branch is not her invention though, and a division based on the reflexes of *ćw (Persid *s / Central *sp / Sakan *ś) is already suggested e.g. in Gernot Windfuhr’s dialectology chapter in Routledge’s The Iranian Languages (2009). You’d get a slightly different picture with other isoglosses however (e.g. *c > θ and *dz > *ð > d make a narrower Persid isogloss, not shared by languages like Kurdish), and Korn seems to be likewise attempting to define a more limited Central Iranian group.

            *sw > *ś seems conceivable, but the route would have to be roundabout. [ʂ] or [ʃ] would make a reasonable first step, but since this does not merge with *š [ʃ] from RUKI or /ʂ/ from things like *ćr, you’d need to instead assume something odder like *[sʷ] > *[ʃʷ] > [ɕʷ] > [ɕ].

            On this topic, there’s also a 2005 paper by Xavier Tremblay arguing against “Iranian” as a coherent concept at all, which I’d also be interested in seeing, but it’s published in some obtusely inaccessible conference proceedings volume…

            • David Marjanović says:

              Thanks! And yes, that’s the one I meant.

              it’s published in some obtusely inaccessible conference proceedings volume…

              I don’t really understand why people do that.

              • j. says:

                Not inaccessible through obscurity though: it’s more that it’s a giant brick with a cost well in triple digits. I suspect it’s well available enough in universities with more lively IE departments. (At Helsinki we do nominally have a Master’s programme in IE studies, but it has no faculty, no regularly scheduled courses, and barely any grad students even…)

          • Rémy Viredaz says:

            David Marjanović: “Is *sw > *ś imaginable?”
            A change *sw > *św took place in some Northern Dardic languages.
            The same change can be supposed as a first step to the reflex śś of Khotanese and š of Wakhi. It is an areal feature.

  3. David Marjanović says:

    I’ve read all the comments now! All I have to add is that John Cowan (he’s everywhere, isn’t he? :-) ) is wrong when he insists that “species” and their phylogeny are what biologists are really interested in, and wrong when he implies that species are by default defined by the ability to interbreed. In lots of papers, biologists have been happy to use individuals as their “operational taxonomic units”/”terminal taxa” and to either use the resulting tree to delimit species or to just not care about species; and the Biological Species Concept, which defines species by the ability to have fertile offspring, is much less popular among biologists these days than several other species concepts (…each of which describes different entities and unfortunately slaps the same label on them).

  4. M. says:

    We find no problem in stating something to the effect that Finnish and Turkish are agglutinative vowel harmony languages, while Livonian and German are fusional vowel-reduction languages

    As far as I can tell, “agglutinativity” is a property of individual morphemes, not of languages as a whole. It may be possible to call a language “agglutinative” in a statistical sense (i.e. it has a greater percentage of agglutinative morphemes than a given benchmark), but to my knowledge there is no evidence for saying that agglutinativity is some kind of “organizing principle” of languages that precludes them from having any fusional morphology.

    For example, despite the usual characterization of Finnish as “agglutinative”, the past passive/impersonal suffix -(t)tiin cannot be cleanly parsed into morphemes meaning “past tense” and “impersonal”: to my knowledge, no one has explained the vocalism or the geminate -tt- without recourse to irregular deletion/analogy.

    Given this, I would say that not only is there no basis for using agglutinativity as a basis for language relationship between two languages — it’s also dubious whether agglutinativity qualifies as a criterion of *similarity* in the first place.

    • M. says:

      “for language relationship between two languages” -> “for historical relationship between two languages”

  5. […] with J.-M. List presents my position on the subject; see also the debate it sparked and this post by the Uralicist Juho Pystynen). It is precisely the rigorous application […]

  6. […] Pystynen also presents, in a post written a few years ago, a clear illustration of the notion of incomplete lineage sorting in […]
