A note on the Mitian Argument

An article that caught my attention tonight: Mikael Parkvall (2008), Which parts of language are the most stable?, Sprachtypologie und Universalienforschung 61/3.

The main thrust of the paper is to define a statistical measure of the “arealness” or “geneticness” of a particular linguistic feature. This can be accomplished with fairly elementary calculations, given a large dataset (the author uses, not especially surprisingly, WALS). Typologists will likely find the exercise illustrative, both in its general array of eyeball-able results, and in demonstrating how even the simplest bit of math can go a long way. [1]

One result stands out to me: among the features found the most strongly genetic, at #3 stands “M-T pronouns” — i.e. the likes of Uralic *minä, *tinä, and their suggested distant relatives in Indo-European, Yukaghir, Turkic, Mongolic, etc. (families that, taken together, form a subset of the Nostratic macrofamily hypothesis known as “Mitian”). Parkvall does not fail to notice this result either.

This may still require a number of caveats. WALS does not pack a very large number of etymological data sets, and is more geared towards features that can instead illuminate areal patterns. And, perhaps as a warning, the #1 most genetic feature on the list turns out to be “presence of phonemic clicks”.

As people who dabble in linguistic classification most probably know, click consonants have traditionally been held as a defining marker of an alleged “Khoisan” language family of southern Africa, first proposed by notorious “lumper” Joe Greenberg. However, putting together more conventional evidence for this grouping has over the years proven near-impossible, and these days conservative analyses instead seem to have settled on distinguishing some 3-4 separate families (the larger units with some acceptance being Khoe, Tuu, and Ju-ǂHoan) in place of unified Khoisan.

(An additional point, if you look closely at the math behind the stats, is that the highly genetic assessment of clicks gets a slice of its homogeneity score not just from the high homogeneity of the “Khoisan” families in their presence of clicks; but also from the complete homogeneity of all non-African language families in their absence of clicks. This argument can be expected to equally apply to any other trait that is truly a single-family or single-geographical-area idiosyncrasy, rather than one found sporadically around the world.)
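
To make the parenthetical point concrete, here is a minimal sketch in Python. It does not reproduce Parkvall's actual formula; the measure (the share of languages in each family that carry the family's majority value, averaged over families) is a simplified stand-in of my own, and the family names and data are invented toy values. It only illustrates how a trait confined to a couple of families collects most of its score from families where it is uniformly absent.

```python
from collections import Counter


def family_homogeneity(values):
    """Share of languages in one family that carry the family's most common value."""
    counts = Counter(values)
    return max(counts.values()) / len(values)


def mean_homogeneity(families):
    """Unweighted mean of per-family homogeneity (a hypothetical, simplified measure)."""
    scores = [family_homogeneity(vals) for vals in families.values()]
    return sum(scores) / len(scores)


# Invented toy data: True = feature present in a language, False = absent.
# "rare" mimics a trait confined to two families (think clicks).
rare = {
    "A": [True, True, True, True],
    "B": [True, True, True, False],
    "C": [False, False, False, False],
    "D": [False, False, False, False],
    "E": [False, False, False, False],
    "F": [False, False, False, False],
}

# "sporadic" mimics a trait scattered across most families.
sporadic = {
    "A": [True, False, True, False],
    "B": [True, True, False, False],
    "C": [False, False, True, False],
    "D": [False, False, False, False],
    "E": [True, False, False, False],
    "F": [False, True, False, True],
}

# The same measure restricted to families where the trait occurs at all.
rare_attested = {name: vals for name, vals in rare.items() if any(vals)}

print("rare, all families:     ", mean_homogeneity(rare))
print("rare, attested families:", mean_homogeneity(rare_attested))
print("sporadic, all families: ", mean_homogeneity(sporadic))
```

In this toy setup, four of the six families contribute a perfect score to the rare trait simply by lacking it, so its overall figure comes out higher than the figure computed over the two families that actually have it.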

Regardless, we see “Mitianness” still squarely beating out various common tell-tale signs of established-family genetic relatedness, such as the presence of ejectives; sex-based noun gender systems; or polysynthesis.

At some point in the future, once we have an “etymological WALS” at our disposal, it would moreover be interesting to repeat this experiment with a few other lexical variables. E.g. how do numerals or body parts stack up against pronouns in genetic classification? What are the stablest kinship terms? How good a job does the Swadesh list really do? Are there any interesting surprises to be found in words for abstract concepts? Do old and universal enough cultural concepts (think “pottery”, “hunting technology”) behave as if they were core vocabulary? Etc, etc, time will tell.

[1] Of course, something like 90% of the time, “the simplest bit of maths” seems to be all that we have yet in linguistics. This is surely great news for people who are not professionals, but who want to follow linguistics arguments along from home; or for the career plans of people like myself, who know enough undergrad-level maths to craft a couple other elementary mathematical tools for testing this or that hypothesis, if necessary. On the other hand, it is a less than promising sign about the overall quantitative reliability of our field in general, so far…

14 comments on “A note on the Mitian Argument”
  1. David Marjanović says:

    It seems that comments with more than one link land in the spam filter. Trying again in two parts:

    these days conservative analyses instead seem to have settled on distinguishing some 3-4 separate families (the larger units with some acceptance being Khoe, Tuu, and Ju-ǂHoan) in place of unified Khoisan.

    This less conservative reconstruction does present evidence for a distant genetic relationship of all these languages, including Sandawe.

    • j. says:

      It seems that comments with more than one link land in the spam filter.

      Moderation queue, not the spam folder, but as you wish.

    • j. says:

      Of course, this is Starostin, known to “present evidence” for distant genetic relationships between pretty much everything (“Borean” correspondences between “Nostratic” and “Sino-Caucasian”, etc.) A hunt for long-range deep lexical connections is not going to help much for showing that Khoisan constitutes a clade. At minimum, one would need to add other potentially related language groups as control groups (Nilo-Saharan, Niger-Congo, any of their alleged components that don’t show clear enough evidence of belonging together with the rest of the family), as well as to account for how similar root structure makes it much easier to identify potential inner-Khoisan than other potential cognates.

      It is most likely that the Khoisan languages are somehow related to each other at a deep enough time level, but that goes for literally everything; already anthropology strongly implies that e.g. a “Proto-Exo-African” language once existed.

      • David Marjanović says:

        Starostin does not, in fact, accept Nilo-Saharan, only at most something like its eastern half.

        Thanks for letting me know about the moderation queue; next time I’ll just wait. :-)

        • j. says:

          Yes, I’m aware. “The eastern half” is more like “the eastern 80%” though in terms of the number of languages, so this does not really strike me as a rejection of the hypothesis entirely (though since he does reject the affinity of the Saharan languages, it might be best renamed something like “Macro-Sudanic”.)

  2. David Marjanović says:

    The basic vocabulary of Hadza/Hatsa, though, is (surprise, surprise!) Afro-Asiatic – the words with clicks probably belong to a thick fat substrate, much like in Dahalo.

    E.g. how do numerals or body parts stack against pronouns in genetic classification? What are the stablest kinship terms?

    I wonder if there’s even a single answer to such questions. Some contact situations lead to rampant borrowing of numerals for some or all purposes (as in the Maghreb or the Philippines), others don’t (witness the remarkable stability of IE numerals). Body parts are stable until they are attacked by a wave of dysphemisms; hence the shockingly low continuity between Latin and Romance in this respect, which even extends into German (Kopf from Latin cuppa “bowl”).

    • j. says:

      I wonder if there’s even a single answer to such questions.

      In either case we’re going to need an accurate first-order picture before it will be smart to start positing fine-tuned sociolinguistic correction terms. Would Kopf demonstrate the possibility of a general conditional instability of body part terms, a relative instability of ‘head’ in general, or something in-between?

      • M. says:

        On the evidence of the various IE branches, words for “head” don’t seem especially stable: the Germanic and Latin terms for “head” point to *kaput, Baltic and Slavic to *galwa, Celtic to *kWennom, etc.

        Likewise, IE has many branch-specific words for “hand”; *gHes(r)- may have been the original IE term, but this word is absent (at least in the meaning “hand”) in all the European branches of IE except Greek and possibly Albanian.

        More stable meanings seem to be “foot”, “eye”, “ear”, “nose”, “heart” and perhaps a few others, although for each of these meanings there is at least one IE branch that has innovated its own term.

        • j. says:

          Meanwhile in Uralic we see e.g. ‘arm, hand’ (replaced only in Samoyedic) being much more stable than ‘leg, foot’ (the most widespread root *jalka survives as such in Samic + Finnic + Mari, with adverb ‘on foot’ reflexes in Mordvinic + Hungarian), and no evidence of a PU root for ‘nose’ at all (a root *närə has this meaning in Mari & Permic, but most likely originally meant simply ‘point, hump’). ‘Head’ has three competing PU roots. ‘Eye’ and ‘heart’ are 100% stable; at the subgroup level also ‘liver’, though most Sami varieties have switched over to a Scandinavian loanword.

          This is already surely enough to show that we would need a wider statistical survey to draw whatever conclusions could be drawn about the relative historical stability of body part terms.

          I actually have a pipe dream about some day starting a project similar to Zompist’s Numbers in >5000 Languages, except about body part terminology…

          • David Marjanović says:

            Another question is what to count as a body part. Internal organs without cultural significance (heart) may be more commonly encountered as food items, as in Italian fegato, French foie “liver” from Latin iecur ficatum, “liver with figs” – and that’s cultural vocabulary, at least to some degree.

            • M says:

              The heart isn’t an entirely “internal” organ, though, in the sense that you can feel it beating from the outside.

  3. Daniel N. says:

    I did some very basic, amateurish research, and the Swadesh list doesn’t perform that well. For example, “fire” is a part of “basic” vocabulary, but it has various words in IE, and Romance did a semantic shift FIREPLACE > FIRE.

    Even better, quite closely related dialects of South Slavic (e.g. different dialects in Croatia) have different words for fire: ‘oganj’/’ogenj’ (obviously inherited) vs. (standard) ‘vatra’ – which is borrowed from Romanian/Balkan Romance which borrowed it from (Old) Albanian, where it means “fireplace” – the same shift again…

    I’d say the best preserved terms have specific semantics and specific (stable) phonetics. E.g. “nose”, or “three” (Croatian/Serbian: nos, tri).

    • j. says:

      ‘Fire’ in Indo-European is a kind of an anomaly, probably because of two competing original terms with subtly differing semantics (although this is maybe not the whole story: ‘water’ has a similar setup, but there it’s much clearer that one of these behaves as the “main” root.)

      I don’t think stable phonetics are in any way related to other kinds of stability: e.g. some of the most etymologically and semantically stable Uralic roots are phonetically highly variable (particularly huge messes of this kind are ‘heart’, ‘night’, ‘marrow’ and ‘metal’). It sure makes it easier to identify cognates in the first place, though.

      But yes, the Swadesh List is a bit too much of a blunt instrument really, and one way forward from it would be to put together phylum-specific lists of best-retained vocabulary. These would probably be more culturally dependent as a rule: e.g. the Indo-European list would end up unusually rich in numerals, as well as in animal husbandry terms (‘cow’, ‘sheep’, ‘horse’).

