Will Someone Please Reconstruct Proto-Kurdish Already

Some things about comparative linguistics you might just take for granted in your own little corner of a particular language family, until you start looking at how they do things in others. In Uralic studies, we’ve known for 200+ years, and put into explicit practice since 150+ years ago, that progress requires documenting unwritten language varieties (just comparing literary Hungarian / Finnish / Estonian / Sami runs out of steam fast [1]). For 120+ years, even, that it’s additionally good practice to get detailed interdialectal comparison of such languages started sooner rather than later, not just rely on one well-known doculect.

The big dog of our Eurasian linguistic region, Indo-European studies, has of course an enviable access to a good bunch of attested Old Indians, Old Church Slavonics and Old High Germans, which are a lot more directly comparable with each other. But you’d think the field would have somewhere during the 20th century understood at least that, yes, newer-attested languages will have contributions to make to the overall picture too. Remember e.g. how Nuristani, a little bunch of languages up in the mountains of Afghanistan, turned out to have the key evidence for affricate reflexes of *ḱ *ǵʰ *ǵ in Proto-Indo-Iranian, preserved several millennia longer than in Avestan or Sanskrit?

Where Slavistics, Baltistics, Germanistics, Armenistics, Romanistics have all still gotten their general comparative programmes rolling pretty well, Indo-Iranian keeps being a rock that drags behind pretty badly. Considering extra-scientific causes, this is not a giant surprise, and is clearly in some amount thanks to these other sub-fields’ status as National Sciences in the various nation-states of Europe. Still, this would not have to be the case; it’s not like Celtistics has been left in the dust. Comparative linguistics also seems like something with sufficiently little direct political valence that it should be doable enough even e.g. under Iran’s current theocratic administration, let alone by the sizable and somewhat intellectual-leaning Iranian diaspora(s). Indian fans of the Out-of-India theory also demonstrate an existing if unorganized interest in linguistic history.

But indeed. Indo-Iranian is not just any random branch of Indo-European; it is today the largest branch (e.g. Glottolog counts 319 varieties, out of 581 Indo-European varieties altogether), and also the only one to preserve all of its known main branches since antiquity. Reflects more branches today than in history, really: already Nuristani is nowhere to be seen until the 19th century. By contrast, in Europe East Germanic, West Baltic, Continental Celtic, Aeolian Greek etc. are long gone. If anywhere in IE, it is in Indo-Iranian that we should expect to be able to reach quite deep time depths by collecting data from modern varieties and applying comparative reconstruction efforts as usual. Yet this generally seems to have not been done, and approximations derived mainly from Sanskrit and Avestan end up making do as Proto-Indic, Proto-Iranian, and the main fodder for Proto-Indo-Iranian.

By now there is clear evidence that this is insufficient. One informative case from recent years is Martin Kümmel’s observation that “secondary” word-initial h- in several Iranian varieties (at least Khotanese and many western Iranian varieties including Middle to New Persian) actually seems to be a retention of PIE laryngeals (especially *h₂)! This may not have been completely out of the blue. Laryngeal hiatus in Vedic (*aHa > *a.a > ā in some cases still parsing as two syllables) has been known since the early decades of laryngeal theory, and Cheung’s Etymological Dictionary of the Iranian Verb from 2007 takes an extremely cautionary approach of projecting all PIE laryngeals into Proto-Iranian, including an implausible-looking contrast between this *H and secondary Iranian *h < *s (and implausible-looking clusters like *Hhauš- ‘to dry out’). [2] Regardless we do see that it is incorrect methodology to treat any divergences from attested Old Iranian as innovations, and that this will fail to connect archaisms in marginal new Indo-Iranian varieties back with the wider programme of Indo-European reconstruction. The same has been very patiently explained by Kümmel too, in a 2016 paper “Is ancient old and modern new? Fallacies of attestation and reconstruction”.

I’ve picked Kurdish here as a semi-random example of a modern Iranian language group that probably deserves closer investigation in this fashion, though its western peripheral location might indeed make it a more likely location for archaisms than smaller languages more fully encircled by Persian. It quite clearly shares at least the propensity of retaining *h₂. Even just looking over the lexicon of standard Kurmanji as listed at Wikipedia readily turns up cases like hêk ‘egg’ << PII *Hāwyam < PIE *h₂ōwyom; hirç ‘bear’ << PII *Hr̥ćšas < PIE *h₂r̥tḱos (~ Middle Persian xāyag, xirs). However also cases like hesp ‘horse’, where some kind of “aspiration throwback” could be considered (*asp- > *esʰp > hesp).

The outlines of Kurdish historical phonology are known, of course. Relatively detailed discussion is readily found in sources like Asarian & Livshits (1994), or at Iranica Online. What seems to be missing from these accounts, however, is any real integration of variation among the Kurdish “dialects” (by now widely thought to comprise at least 2–3 languages). They also spend much effort on lamenting difficulties in telling what might be native Kurdish words and what loanwords from Persian or Zazaki or some other neighboring Iranian variety; same as in many other studies on individual western Iranian varieties. But we — at least e.g. us Uralicists — know quite well that attention paid to dialectology is often able to resolve such issues! Maybe some Kurdish variety would turn out to display a form different from the others that would then need to be considered the native one; or to display a different loanword substitution, pointing in favor of relatively recent loaning, whether from Persian or not. Dialect differences could also help with relative chronology, in telling late areal changes (and across Iranian these are many) apart from what really are early Proto-Kurdish innovations. The retained laryngeals, too, are noted by Kümmel to not be entirely systematic. Conceivably it could be the case that e.g. Kurdish only gets them through Persian, at some older or newer date. Or inversely, maybe Kurdish might be in its native vocabulary more systematic about this than Persian is. No way to tell before looking.
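The first step of that kind of dialect comparison can be put as a minimal sketch in Python. Everything in it is a placeholder: "V1" to "V5" are hypothetical varieties and "A" / "B" stand in for two competing shapes of one word, not real Kurdish data. It only flags which varieties diverge from the majority; deciding whether the odd one out is the native form or a differently adapted loan still takes actual etymological work.

```python
from collections import Counter


def divergent_varieties(forms_by_variety):
    """Return the most common form and the varieties that show something else."""
    counts = Counter(forms_by_variety.values())
    majority_form, _ = counts.most_common(1)[0]
    outliers = {
        variety: form
        for variety, form in forms_by_variety.items()
        if form != majority_form
    }
    return majority_form, outliers


# Hypothetical placeholder data: one concept, five varieties, two word shapes.
example = {"V1": "A", "V2": "A", "V3": "A", "V4": "B", "V5": "A"}
majority, outliers = divergent_varieties(example)
print(majority, outliers)
```

With real data one would run this per concept over the full lexicon and then look at where the outliers cluster geographically.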

Let me be clear here on the proposal. A reconstruction of e.g. Proto-Kurdish should not rely on just some handful of already available descriptions / dictionaries (though I’m sure their comparison, too, would already add up to several results), nor aim just for identifying phonological variation. The goal of such a project should be primarily lexicogeographic: to have a detailed enough dialectological picture to be able to see the directions of vocabulary spread, to tell local innovations apart from local archaisms. In Uralic studies, when putting together an understanding of, or at least the data for understanding, Proto-Samic, Proto-Mari, Proto-Permic, Proto-Mansi, Proto-Khanty, Proto-Selkup, etc., we have routinely based this on low double digit numbers of varieties, each documented at least to low quadruple digits of vocabulary. And these are all smallish language groups, spoken by some tens or hundreds of thousands of people. The Kurdish languages have tens of millions of speakers altogether. Even if extensive fieldwork in Kurdistan were to look too dangerous or politically complex right now, already connecting with the diaspora communities worldwide should be easily able to provide data on some dozens of varieties.

I do not pretend that this would be a small or quick task (it is clearly beyond what I or anyone could accomplish as just one unattached researcher), but it seems like a very doable task, and likely fruitful, not just for the circles of Kurdish studies or Iranian studies, but for Indo-European studies altogether. And by no means is this a gigantic endeavor either. This could be all done in under a decade by one research group, if there was first of all the will for it to happen (be funded and prioritized).

Closing up this plea, let me also suggest one other hypothesis that could be up to something. In existing overviews, Kurdish is reported to “sometimes” show PIr. *x > kʰ, e.g. *xara- > /kʰer/ ‘mule’. The facts that this development (1) fails to be regular and (2) seems to be a regression (alleged PIr. free-standing *x is < PII *kʰ) should already suggest that it is perhaps an archaism rather than an innovation. The same might go for /tʰ/ from “PIr. *θ” < PII *tʰ, reported at least in *θaiwar- > /tʰiː/ ‘brother-in-law’. This interpretation is not airtight off the cuff by any means: both Armenian and Semitic influence could have encouraged secondary introduction of aspirated stops. But, interestingly, on a brief look-around I do not find cases where Persian /x/ ~ Kurdish /kʰ/ would derive from a secondary *x that continues *h₂, only cases with PII *kʰ. From the former, the result seems to be /h/, as above in e.g. ‘egg’. So did Kurdish regularly shift *x > /h/, while never shifting *kʰ? Again, detailed dialect evidence could perhaps swing this either way. One of these decades we will hopefully know better.
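The check behind that question can be sketched in a few lines of Python. The three entries are just the examples already cited in this post (‘mule’, ‘egg’, ‘bear’), and the origin labels and function names are my own shorthand for illustration; a real test would need a dialect-level lexicon, not three words.

```python
from collections import Counter, defaultdict

# Toy data: only the examples mentioned in the post.
# (gloss, origin of the Proto-Iranian fricative, Kurmanji initial consonant)
ETYMA = [
    ("mule", "PII *kʰ", "kʰ"),  # *xara- > /kʰer/
    ("egg", "PIE *h₂", "h"),    # hêk ~ Middle Persian xāyag
    ("bear", "PIE *h₂", "h"),   # hirç ~ Middle Persian xirs
]


def tally_reflexes(etyma):
    """Count Kurdish reflexes separately for each etymological origin."""
    table = defaultdict(Counter)
    for _gloss, origin, reflex in etyma:
        table[origin][reflex] += 1
    return table


def is_complementary(table):
    """True if no reflex turns up under more than one origin."""
    first_origin = {}
    for origin, counts in table.items():
        for reflex in counts:
            if first_origin.setdefault(reflex, origin) != origin:
                return False
    return True


table = tally_reflexes(ETYMA)
print(dict(table), is_complementary(table))
```

A single solid case of /kʰ/ from *h₂-derived *x would flip `is_complementary` to false, which is exactly the kind of counterexample that dialect data could turn up.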

[1] Yes, written Sami already existed 200 years ago, indeed since the mid-17th century. The first variety to have been standardized to some practical extent was so-called “Old Swedish Sami”, a clergy-designed form from the mid-18th, based most closely on Ume Sami though aimed as a general western interdialectal standard. Standard Northern Sami took its first steps around the same time as well.
[2] Omniretained laryngeals are furthermore trouble also for e.g. RUKI. If we have *s > *š in e.g. *buHs- > *buHš ‘to endeavor’, as if triggered by a preceding *H and not *ū, why not also in e.g. *yaHs- > *yaHh- ‘to girdle’? Without the assumption of universal laryngeal preservation, though, this could be easily resolved by assuming *eH >> *ā as an independent vocalization from *iH/*uH > *ī/*ū. Note also a further but welcome corollary: if we do go with thinking that RUKI in *buHš- has been triggered by a long-retained *H, then also *Hhauš- will have to be simplified to just *hauš-, indeed already to a pre-RUKI late PIE *sews- < early PIE *h₂sews-.

16 comments on "Will Someone Please Reconstruct Proto-Kurdish Already"
  1. David Marjanović says:

    *standing ovation*

    IEists have long been peculiarly uninterested in extant diversity. In the late 19th century IE was reconstructed from the ancient languages under the assumption that those either were the branch protolanguages (especially Vedic = Proto-Indic) or close enough (Old Church Slavonic = Proto-Slavic modulo *TVRT). That makes all later varieties irrelevant for that purpose, except if a morpheme happens to be unattested in the ancient texts and can’t just be swept under the carpet of blissful ignorance.

    I hope I’m wrong, but I think the first work up to Uralicist standards of comprehensiveness is Kroonen’s etymological dictionary of Proto-Germanic, and that’s from 2013.

    Honorable mention for Kurt Goblirsch, who read five hundred descriptions of about as many German dialects for his study of the High German Consonant Shift ( ~ 2005).

    However also cases like hesp ‘horse’, where some kind of “aspiration throwback” could be considered (*asp- > *esʰp > hesp).

    Given that the [s] is still there, that strikes me as unlikely. I wonder if we’re looking at a retained *h₁, because:
    – Kümmel has collected evidence that the PII aspirated plosives continue not only *h₂, but also *h₁, meaning *h₁ was not lost as a consonant in all positions immediately after the PIE state;
    – I’m not simply assuming *h₁ in the “horse” root because it would otherwise begin with a vowel. *h₁ is required by Bozzone’s law to explain the h of hippos.

    In existing overviews, Kurdish is reported to “sometimes” show PIr. *x > kʰ, e.g. *xara- > /kʰer/ ‘mule’.

    That’s interesting, because *kʰara- “donkey” is a well-known loanword into PII, with Indic reflexes with .

    *θaiwar- > /tʰiː/ ‘brother-in-law’.

    That, on the other hand, is from PIE *dh₂-, with a devoicing Kümmel documented (*dh₂- > *th₂- > *tʰ- > *θ-).

    both Armenian and Semitic influence could have encouraged secondary introduction of aspirated stops.

    Sure, but hardly from voiceless fricatives, which they both possess in great abundance (including [θ] in Aramaic).

    Note also a further but welcome corollary: if we do go with thinking that RUKI in *buHš- has been triggered by a long-retained *H, then also *Hhauš- will have to be simplified to just *hauš-, indeed already to a pre-RUKI late PIE *sews- < early PIE *h₂sews-.

    Why “have to”? – But given that *HC- is reflected as different from *C- only in Greek and Anatolian (and Armenian and Phrygian, I think?), I see no problem with positing *HC- > *C- for some early branch like the last common ancestor of Indo-Iranian and Balto-Slavic (the branches that have the full RUKI… except apparently for Nuristani, which just has RIK as far as I can tell…).

    Yes, written Sami already existed 200 years ago, indeed since the mid-17th century.

    One of many things I’ve learned from this post.

  2. sansdomino says:

    I think the first work up to Uralicist standards of comprehensiveness is Kroonen’s etymological dictionary of Proto-Germanic

    The first Uralicist work anything like that standard though might be itself also as recent as Lehtiranta’s Common Sami Vocabulary from 1989. As published lexica go, I’m thinking more of large dialect dictionaries, but with no reconstruction, filtering of loanwords, or anything else just yet. The first candidate for that might be Munkácsi’s Udmurt dictionary from 1896 (covering 4 main dialects); or more arguably Castrén’s pan-Samoyedic lexicon from 1855 (~15 varieties in rather variable detail); but starting to come out more regularly only from the mid-20th century, a first “batch” being Lagercrantz 1939 on non-Russian Sami, Uotila/Wichmann 1942 on Komi, and Toivonen/Karjalainen 1948 on Khanty. Though, some of their collectors have put their base data to use much earlier, e.g. Karjalainen’s initial reconstruction of Proto-Khanty is from 1905.

    Sure, but hardly from voiceless fricatives

    By now I’d think more of (1) aspirates are introduced as loanword phonemes or just known by bilingualism, (2) this helps “activate” the option of mergers *x, *θ > kʰ, tʰ (cf. my previous post and discussion on it).

    Why “have to”?

    Since if *Hs > *Hš, surely this would work regardless of position (*ks- > *kš- does hold). Differences between laryngeals would be a theoretical option, but I’d expect *h₂ to most likely pattern with the velars even then. Though, hmm, I guess a third option would be vocalization to a schwa *ə₂ before loss (as the Greek / Armenian reflexes of *HC- suggest).

    Can’t find it right now, but IIRC I’ve seen an argument that lack-of-full-RUKI in Nuristani should be considered just reversion to *s, similar to Middle Indic.

    • David Marjanović says:

      if we do go with thinking that RUKI in *buHš- has been triggered by a long-retained *H, then also *Hhauš- will have to be simplified to just *hauš-

      Ich stehe auf der Leitung (I’m being slow on the uptake). There must be something obvious I’m not getting. If we assume *Hs > *Hš (and I agree that’s phonetically reasonable for *h₂ and likely *h₃), how does it follow that “*Hhauš- will have to be simplified to just *hauš-”, where the *u and not the *H is the trigger?

      Though, hmm, I guess a third option would be vocalization to a schwa *ə₂ before loss

      I prefer to follow Byrd and call it epenthesis of *ə, which in Indo-Iranian is pushed toward [ɨ] by the *e *o *a > [ɐ]-or-thereabouts merger and then merges into *i. But, sure, I’d expect [ɨ] to have a retracting effect just as well.

  3. The first Uralicist work anything like that standard though might be itself also as recent as Lehtiranta’s Common Sami Vocabulary from 1989.

    If anything can be a standard of comprehensiveness, it is Steinitz’s DEWOS, rather than YSS or Kroonen’s Germanic dictionary. It is not only earlier, but superior to Lehtiranta’s YSS in all respects save that it lacks reconstructed forms. It lists practically all reflexes of a given root, including derivatives, in all daughter varieties. All related words are always listed under the same entry. Borrowings from neighboring languages are correctly identified, I think, in 98% of cases. All meanings are accurately reproduced as they were given in primary sources. All words are accurately phonologized. Given that Khanty is a family of the same depth as Turkic or Slavic, this is a monumental achievement even without explicit reconstructions.

    • sansdomino says:

      Yeah, that’s picked specifically on covering all the bases including reconstruction, and really also the “anti-base” of being limited to “sufficiently inherited” vocabulary. By thoroughness DEWOS is clearly much ahead.

      The biggest single jump in thoroughness I think would be Lagercrantz’ Lappischer Wortschatz at 1200+ pages (even if Samic is a still more diverse language group and not comprehensively covered by this).

      Outside Uralic, Radloff’s Turkic dictionary seems like another important predecessor in this genre. Any others of the sort?

      • What distinguishes DEWOS from most similar works is not so much thoroughness in listing all attested words, but rather thoroughness of phonological and etymological analysis, much of which is not explicit (e.g., placing related words under the same entry requires a good understanding of derivation). This is not a trivial matter, as can be seen from a much more recent Historical Dictionary of Yukaghir, which, while dealing with just two languages, fails to properly cite meanings, abounds in mistranslations and misattributions (Kolyma words given as Tundra), etc.
        One more example of a very good work is Vera Cincius’ Comparative Tungusic Dictionary (Сравнительный словарь тунгусо-маньчжурских языков, usually abbreviated as ТМС, 1975-1977) in two volumes comprising circa 1300 pages. Like DEWOS, it lists all related words under the same entry, but does not provide reconstructions. As in DEWOS, here a good knowledge of sound correspondences is implemented in the dictionary, which is expected, since Cincius herself is the author of the first and most thorough phonological reconstruction of Proto-Tungusic (1949). Another point of similarity to DEWOS is that ТМС also lists isolated roots represented in just one language, thus allowing an easy search for substrate words. I would rank it second after DEWOS due to slightly less reliable etymological decisions and phonologization.

  4. Anthony Jakob says:

    Regarding RUKI, given that it applies after *r < *l in Indo-Iranian (Skt. karṣ-, Av. karš- < *kʷels-), it must have been active post-PIE, and if I’m reliably informed, still seems to be productive in early Vedic. Therefore, it’s not a problem to assume RUKI applied after secondary *ū, *ī (< *uH, iH). Interestingly, it has been proposed that RUKI was indeed blocked by a laryngeal in Nuristani (Hegedűs 2012, The RUKI-rule in Nuristani), although here the author seems content to assume a lot of 'sporadic' developments and the data does not look clear-cut at all.

    • sansdomino says:

      Patched! As usual in blog comment sections, < > not escaped to &lt; &gt; will risk getting parts of your message lost.

      I could see secondary RUKI in PII times, we know this is the case for *ps in Iranian anyway, late enough to include secondary *s from *ć. But, since *u(H)ć, *i(H)ć are not affected, late RUKI for *ūs *īs seems to create a major conflict in chronology, if we follow also the belief that *uH *iH > *ū *ī was late enough to be only partial in Eastern Iranian.

      I’ve also never understood why should *ls >> *rš require a particularly late date for RUKI. For one, this might as well be thru *lš, not just *rs (cf. also Fortunatov’s Law where *l specifically seems to trigger retroflexion). For two, there might not have been immediate *l > *r in all environments, but rather first e.g. in the coda, or in various consonant clusters. Identified retentions seem to be predominantly in a few specific positions like absolute word-initial, never general. For three, is there even any counterevidence against *ls > *lš already “originally”, so at minimum, also in Balto-Slavic? The only example of the change I’ve ever seen is this *kʷels-, and it is not continued there, or in Albanian etc.

      • sansdomino says:

        Further on point two re *ls: cf. also that we have II loanwords in Uralic evidencing both retention of *e but already *r > *l, and inversely, retention of *l but already *e > *a. Instead of assuming two dialects with different PII chronology, this too could be maybe evidence that *l > *r happened at different times in different positions.

        Another option yet might be that a first change was something like *l > *ɾ, i.e. already into a rhotic, which could be substituted by *r in Uralic; but a distinct one from old *r, so that the former could also secondarily develop back to /l/ in some varieties, but would be expected to mostly merge into /r/.

      • David Marjanović says:

        From a few minutes of googling, it seems that Fortunatov’s law is “controversial” in the sense that two famous figures of the 1890s said it’s wrong (i.e. most of Fortunatov’s examples are wrong, and the few valid ones are Prakritisms in Classical Sanskrit), so it was never mentioned again till the 1970s, and only once per decade in rather obscure papers since then.

        It’s in good company; Kluge’s law had a similar history until its spectacular revival 10 years ago.

        • sansdomino says:

          Also mainstream enough to have gotten an entire main chapter in Collinge 1985 (The Laws of Indo-European) with quite a bit of discussion; including apparently several counterarguments and some arguments in favor based on an assumption that the immediately feeding environment should have been *l+T. Collinge responds by pointing out a few sound laws usually only given in a telescoped form (Greek *kʷ > /t/, Indic *asD > /ēd/, Armenian *w > /g/) and that incredulity “seems rather naive”. The etymologies themselves I’m in no position to really defend or oppose though.

          • David Marjanović says:

            The Laws of Indo-European is definitely something I should read if I get a chance. It doesn’t seem to be lying around on teh intarwebz.

  5. Andreas Johansson says:

    If consonantal laryngeals have been hiding in plain sight in a language as well-known as New Persian, one wonders what else may be out there. Dragons?

    (I’m aware there may also be a consonantal laryngeal in English “quick”, but if so it’s hiding rather better than these would.)

    • sansdomino says:

      A major obstacle for this may have been the assumption that New Persian derives via Middle Persian from Old Persian, where these initial laryngeals are all lost — by current knowledge, they all instead go back to slightly different Proto-Persid dialects (e.g. in a paper just from last year, Agnes Korn shows that OP *Cy *Cw > Ciy Cuw did not happen in pre-MP).

