Nonregularity in North Caucasian

Due to a recent ZBB discussion I ended up re-reading the Preface to Sergei Starostin’s A North Caucasian Etymological Dictionary. This is one of the more worrisome cases of “Moscow School” phonological tarpits: there is no doubt about Northeast Caucasian being a valid family, and I also think the relationship with Northwest Caucasian is sufficiently established… but the reconstruction the late Starostin advances for the family sure looks like it has too many bells and whistles, with features like six laryngeals that end up almost randomly reshuffled in the descendants, a plain/geminate distinction orthogonal to phonation in nearly all obstruents, or abundant *Cw clusters at all places of articulation other than labial. I count 132 basic sound correspondences plus some fifty-odd cluster correspondences. Even spread across two root consonant positions in 2300+ reconstructions, a scheme of this size is bound to include reflexes that are not actually well established.

Probably most fixes to this reconstruction would also have to be etymological. Likely there are correspondences representing areal loanwords rather than original inheritance, or correspondences used to stitch together unrelated vocabulary. Just checking for not-really-regular correspondences would be a good start though.

For a quick case study I’ve picked the *pC clusters. These appear word-initially, supposedly evolving from certain *Cw clusters, at the two far ends of the family: Nakh and Khinalug. The asserted sources are as follows:

  • *ff > N *pχ, Kh. /px/
  • *ćw > N *ps (Kh. /cʼ/)
  • *św > N *ps (Kh. /s(w)/)
  • *śśw > N *ps, Kh. /pš/
  • *cw > Kh. /ps/ (N *c)
  • *čw > Kh. /pš/ (N *č)
  • *xxw > N *pχ
  • *qw > N *pħ (Kh. /q/)
  • *qqʼw, *ɢɢw > N *pħ (Kh. /qʼ/)
  • *χχw > N *pħ, Kh. /pχ/

A Nakh cluster *bʡ is also asserted to have three origins: *qʼw, *ʡw and *hw.

How many of these developments are actually regular once we look into it? Put in your bets now…

(1) Nakh *ps is found in five examples. Every single one of them has a different reconstruction! That is, none of them can be considered regular. Besides the three expected cases of *ćw, *św, *śśw, there’s one of *ccʼw (alleged regular Nakh reflex *t-) and one of *ćʼ with no labialization at all (alleged regular Nakh reflex *cʼ). Tsk tsk tsk. For that matter, two cases have NWC cognates with a presyllable *pə-, supposedly a prefix. My bet would be that this prefix is what really underlies the Nakh examples too.
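The regularity check applied here can be sketched as a small tally. This is a minimal illustration, not anything from the NCED: the five source labels are the ones listed in (1), and the two-example threshold (`MIN_EXAMPLES`) is an assumed, deliberately lenient criterion for calling a correspondence recurrent rather than a one-off.

```python
from collections import Counter

# PNC sources cited for the five Nakh *ps examples, one entry per example
# (as listed in point (1) above).
NAKH_PS_SOURCES = ["*ćw", "*św", "*śśw", "*ccʼw", "*ćʼ"]

# Sources that the Preface predicts should yield Nakh *ps.
PREDICTED_PS = {"*ćw", "*św", "*śśw"}

# Assumed threshold: a source must account for at least this many examples
# before its correspondence counts as recurrent.
MIN_EXAMPLES = 2


def recurrent_sources(sources, min_examples=MIN_EXAMPLES):
    """Return the set of sources attested at least `min_examples` times."""
    counts = Counter(sources)
    return {src for src, n in counts.items() if n >= min_examples}


recurrent = recurrent_sources(NAKH_PS_SOURCES)
unpredicted = set(NAKH_PS_SOURCES) - PREDICTED_PS

print(f"{len(NAKH_PS_SOURCES)} examples, {len(set(NAKH_PS_SOURCES))} distinct sources")
print("recurrent sources:", sorted(recurrent))
print("sources not predicted to give *ps:", sorted(unpredicted))
```

Even with a threshold as low as two, no source survives: each of the three predicted sources rests on a single example, and two of the five examples come from sources not predicted to give *ps at all.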

(2) Nakh *pš turns out to exist in one example, from *čʼw, whose regular Nakh reflex is allegedly plain *š-. (Maybe another likely prefix case?)

(3) Nakh *pχ is found in four examples, just one of them from *ff, so that development is irregular in any case. There are no more than two initial and four medial instances of *ff reconstructed altogether. The other case of initial *ff- actually has a Nakh reflex too, but showing *ħ-! The three cases of *xxw do not look that much better. NWC has *xw in two cases (and also in the *ff case) and secondary *x́w in one, so this at least seems to work. Lak has one case of /xx/, one case of /xxw/ and one case of /šš/, the last supposedly by late palatalization from *xx … but, unfortunately, the one example of /xx/ occurs before /i/, where palatalization would be expected if anywhere. Andic has one case of *xw and one of *ɬw.

(4) Nakh *pħ rakes together a seemingly respectable 13 examples. But they are spread across nine reconstructions, most of which occur just once: *q *qw *qq *qqw *qʼw *χχw *pʼɦ. The last is a cluster type (obstruent + laryngeal) that seems to be relatively common in the proto-lexicon but is strangely not commented on at all in the Preface. Of these, only the *qw and *χχw cases are even expected. For the others, the allegedly regular Nakh reflexes are *q > *q, *qq > *q/*ʁ, *qqw > *q/*ʁ, *qʼw > *bʡ. (One etymology appeals to labiality metathesis, *qarćʼwV > *qwarćʼV, before *qw > *pħ; but this is itself clearly ad hoc rather than regular.)

Our last hope for Nakh *pC, then, lies with the clusters *ɢɢw and *qqʼw: the first represented by four examples (one of them also with a laryngeal: *ɢɢHw), the second by two examples (one of them with a laryngeal). Starting with *qqʼw, and skipping over subfamilies reflecting only one instance, in Tsezic we have one case of *qʼw and one of *qʼ; in Lezgic, one case of *qʼˤw and one of *qʼw (respectively). Inconsistent secondary articulations are perhaps not the biggest problem, but the latter etymology additionally requires metathesis from *tʼHalqqʼwV to *qqʼHwaltʼV in Nakh. Moving on to *ɢɢw (when’s the last time you heard of a language with geminate voiced uvular stops, incidentally?): Tsezic has one *q and one *qw; Dargwa has one *ʁˤw, one *ʁˤ and one *qqw; Lezgic has one *qqˤ, one *qqʼˤ and one *qqʼˤw. One case has a presyllable *mu-, and one could speculate that this is actually the real source of the Nakh cluster.
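The same kind of tally can be run for Nakh *pħ, using the counts given in (4) and in the paragraph above: four examples from *ɢɢw, two from *qqʼw, and one each from the seven other sources. Again this is only a sketch; grouping the *ɢɢHw and *qqʼHw examples under *ɢɢw and *qqʼw follows the text, and the two-example threshold is an assumption.

```python
from collections import Counter

# Number of Nakh *pħ examples per reconstructed PNC source, as counted above.
# The *ɢɢHw and *qqʼHw examples are grouped under *ɢɢw and *qqʼw.
PH_COUNTS = Counter({
    "*ɢɢw": 4, "*qqʼw": 2,
    "*q": 1, "*qw": 1, "*qq": 1, "*qqw": 1, "*qʼw": 1, "*χχw": 1, "*pʼɦ": 1,
})

# Sources that the Preface lists as regularly giving Nakh *pħ.
PREDICTED_PH = {"*qw", "*qqʼw", "*ɢɢw", "*χχw"}

MIN_EXAMPLES = 2  # assumed threshold for a recurrent correspondence

total = sum(PH_COUNTS.values())
singletons = sorted(src for src, n in PH_COUNTS.items() if n == 1)
# Only sources that are both predicted and recurrent remain candidates
# for a regular reflex.
candidates = sorted(
    src for src, n in PH_COUNTS.items()
    if n >= MIN_EXAMPLES and src in PREDICTED_PH
)

print(f"{total} examples across {len(PH_COUNTS)} sources")
print(f"{len(singletons)} sources attested once")
print("recurrent predicted sources:", candidates)
```

The two survivors are exactly the clusters discussed as the “last hope”. The tally by itself says nothing about whether their reflexes in the other branches agree; that is what has to be checked by hand, as above.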

(5) Nakh *bʡ is found in a similarly respectable eleven examples (plus one word-initial one). Three of them are from *ʡw, which ends up reflected reasonably regularly: the reflexes include two cases of Andic *ħ and one of *ħw, two cases of Tsezic *ħ, three cases of Lak zero, two cases of Dargwa *ħ, and two cases of Lezgic *ʔw. A small ray of hope, maybe…

Four cases from *qʼw (three of them also with a laryngeal: *qʼHw) look promising too. But the distribution of these etyma is terrible: only Lak and NWC reflect more than one of them. The former has one case of /w/ and one of /qʼ/; the latter has *qʼ in both cases, though the second with a presyllable *p-, again casting doubt on analyzing Nakh *b as continuing *w.

In the waste pile of protoforms attested only once, we have *ʔw, *hw, *ɦw, *bɦ (with the *hw case showing a presyllable *ba- in NWC).

(6) Nakh also has a cluster *bʕ: one example supposedly from PNC *wH, another two from PNC *bʕ (one of them “with some metatheses and aberrations”). The latter two do have *pp in Lezgian.

(7) Khinalug /ps/ is found in two examples, one indeed from *cw and the other from *čw. For *cwaʡmV ‘bear’, NWC adds a presyllable *mə- (supposedly by metathesis); maybe this is once again what’s really going on.

(8) Khinalug /pš/ is found in four examples, going back to *śśw twice, *čw once and also *chw once (I think that’s an alveolar affricate + laryngeal sequence?). Lak has /š/ in both cases of *śśw; NWC has a presyllable *pə- in one of them.

(9) Khinalug /px/ is attested just once; enough said.

(10) Khinalug /pχ/ is attested once word-initially from *χχw as promised, also once word-medially from a sequence *-waχχ-.

So the final tally is: the Nakh *pC clusters regularly correspond to nothing whatsoever across Northeast Caucasian. Only three of the eight alleged regular sources are actually regular even just from PNC to Nakh (“soundlawful regularity”, one of the weakest types). For *bʕ we can find a weak two-example correspondence with Lezgian *pp, and for *bʡ one just barely more substantially regular set of correspondences. Khinalug /pš/ finds one two-example correspondence with Lak /š/.

This survey does not fill me with hope either that the current proposals are correct or that future work will find new, stronger phonological solutions. Probably this is bound to happen to some extent in comparative work between languages with highly complex phonologies. I do wonder now, though, just how much else this result applies to.

20 comments on “Nonregularity in North Caucasian”
  1. Blasius B. Blasebalg says:

    “the relationship [of Northeast Caucasian] with Northwest Caucasian is sufficiently established”

    Huh? As far as I can see, that relationship was basically Starostin’s pet phantom …
    All arguments to that end seem to stem from his extended entourage.

    What strikes me most about this alleged connection is that the two families are geographically close, with communication/migration/travelling obstacles (i.e. mountain ridges) much easier to overcome than those within either family. So if they are ultimately related, and if both families have been present there for all of their existence (as most linguists suggest), why haven’t they formed a dialect continuum? To my knowledge there is no feature that somehow mimics a transition from NE to NW or vice versa. This would be weird given the two hypotheses.

    But I don’t only disbelieve the “North Caucasian” macro-family; I am also hardly convinced that both families have stayed in place for all of their existence. To be clear, it is not impossible that either or both families still sit in their original homeland. However, it seems that some scholars prefer to fix these families in place just as a simplification, to facilitate statements about other groups.

    But given the geographic situation, isn’t it more plausible to assume for each of the three Caucasian families that it has been “pushed” against the mountains from an originally larger and more distant homeland? When other groups expanded in the surrounding areas, “Caucasians” may have retreated to a less accessible space. That doesn’t imply that the homeland must have been very far away, but the “center of gravity” may very well have been outside the reach of the mountain landscape. Taking this into consideration, the respective homelands of NE and NW Caucasian may have been at an appreciable distance.

    That consideration applies to each of the Caucasian families separately. The insight that one of them has or has not originated outside the vicinity of the Caucasus does not have any direct implications for the analogous questions for the other two.

    • David Marjanović says:

      Even Johanna Nichols has accepted Caucasian for over 10 years now. She used to not even accept East Caucasian, preferring to keep Nakh and Daghestanian separate.

      The NCED interestingly finds that Proto-WC can be almost entirely derived from its Proto-EC reconstruction, so that PC and PEC are almost identical. On top of that, it does not find a Daghestanian branch, nesting Nakh inside it instead.

      why haven’t they formed a dialect continuum?

      Leave a dialect continuum in place for four thousand years (or seven thousand), and it’ll fragment, especially in a fragmented landscape. Supposedly there are already mutually unintelligible Tyrolean dialects in neighboring Alpine valleys. Even the transition zone between Lake Constance Alemannic and Tyrolean is just the Lech valley.

      And that’s before we get to the fact that what used to be the contact zone between WC and EC has been the contact zone between WC and Ossetic for at least a thousand years now. Several Nakh languages are historically known to have vanished in Ossetia.

      • j. says:

        I do not know if there is reason to even expect to find a “Dagestanian” branch, any more than there is reason to expect to find a “European” branch of Indo-European. It seems to be defined just by the lack of characteristic Nakh features: e.g. the Dagestanian languages as a rule have a large variety of labialized consonants, but this appears to be a general North Caucasian archaism, not an innovation (once again showing that typology is not really rootable). NWC being simply one branch that has gotten typologically remodelled does not sound unthinkable either. Though that scheme would surely raise a question: what did it get remodelled after? Indo-European? (Which then gets into chicken-and-egg territory with the Caucasian substrate hypothesis for IE itself.)

        A few features are actually apparent in Starostin’s reconstruction where Nakh seems to occupy the geographically expected bridging position between the NWC and Dagestanian branches. One case is the lateral and velar fricatives, which are distinguished in NWC and Nakh but merge everywhere in Dagestanian, though whether the merged series surfaces as lateral or velar varies by sub-branch. (In this light, Archi’s lateral fricatives by now turning out to be velar /ʟ̝̊/ rather than alveolar /ɬ/ also makes excellent sense.)

    • j. says:

      For NW Caucasian, formation in place in the Caucasus seems likely, but presumably its branches have moved around a bit (the setup with crossing geographic distributions of Abkhaz-Abaza and Circassian already suggests as much), possibly with some branches or languages dying off entirely before being recorded. This may have happened repeatedly earlier too. I don’t see a lack of a dialect continuum / linkage between NWC and NEC as any more mysterious than the lack of a dialect continuum between Albanian and Slavic (in which case we know that Dacian, Thracian, Phrygian, Scythian etc. have died off from between them), between Mari and Permic, between Aramean and Akkadian, between Semitic and Cushitic, etc.

      I would agree though that it seems possible that at least some NE Caucasian subgroups are “relatively recent” pre-IE arrivals into the Caucasus (Nakh in the north and Lezgian in the south seem like the prime candidates); and that such a scenario would then suggest that the others are arrivals at some earlier time as well. As I’ve mentioned in the referred ZBB discussion, the presence of NEC languages in the Eurasian steppes before Indo-European steamrollered over would even be a plausible alternative explanation for occasional Caucasian-like words in Yeniseian and Burushaski, which, as you may know, Starostin & co. instead see as evidence for a Dene-Caucasian superfamily. (Who knows, perhaps this would also work for some of the parallels of Caucasian languages with Uralic, scouted out by Munkácsi long ago and mostly forgotten since then.)

      • David Marjanović says:

        That should definitely be explored. Recently I read a scenario that involved people with Caucasian Hunter-Gatherer ancestry expanding way north along the Caspian seashore to explain why they make up 50% of the ancestry of the people buried by the Pit Grave (Yamnaya) Culture. I’ll try to look for the reference this weekend.

        • David Marjanović says:

          There it is, by none other than David W. Anthony.

          Last paragraph:

          “Of course another, final, possibility, consistent with the archaeological and genetic evidence presented here, is that there were two phases of interference from Caucasian languages in two periods. The first, perhaps responsible for some of the basic morphological and phonological traits Bomhard detected, could have occurred in the fifth millennium BC and involved very archaic eastern Caucasian languages that had moved to the lower Volga steppes with CHG people, where they intermarried with Samara-based EHG pre-Uralic people to create early PIE and the Khvalynsk culture and a new EHG/CHG genetic admixture; and the second phase, which left a Northwest Caucasian imprint over late PIE, perhaps more superficial (lexical) than the earlier interference, could have been during the Maikop period, but without a major genetic exchange between Maikop and Yamnaya.”

  2. David Marjanović says:

    It’s a bit unfair to base conclusions on Khinalug, because here’s what the book has to say about it on p. 217:

    1.9. Khinalug.

    In spite of the fact that this language is often included in the Lezghian group (see, e.g., [Talibov 1980]), there are no serious reasons for this; the impression that Khinalug is especially close to Lezghian languages arises apparently because of a rather large number of loanwords from the neighbouring Kryz and Budukh languages (probably from Proto-Shakhdagh as well). Multiple specific phonetic and lexical features of Khinalug (on the development of Khinalug phonemes from PEC, see above) clearly distinguish it from Lezghian languages, as a separate branch of East Caucasian. In general there is less data on Khinalug than on other North Caucasian languages (in fact only comparatively small lexical lists, given in the works [Kibrik-Kodzasov-Olovyannikova 1972, Kibrik-Kodzasov 1988, 1990]. Therefore, many specific features of Khinalug reflexation are still unclear: there are many gaps in the reflexes of PEC consonants, uncertainties in establishing the behaviour of vowels, the Khinalug reflexation of the verbal root is completely unexamined, the Khinalug prosodic system has not been described. All these problems still expect their investigation.

    Similar things hold for languages with clearer phylogenetic position. A pretty short look at what’s been published on Archi since 1994 (accessible from Wikipedia) shows that the data Nikolayev & Starostin had available was woefully inadequate. And beyond that, N & S knew that many Caucasian languages have tones, but the tones were simply not widely enough described to use; given that tones are known to interact with consonants and vowels in many languages outside the Caucasus, this gap in the data must have introduced a heaping helping of mistakes. And so on and so forth.

    The whole book should be redone from scratch, and not the same scratch that was available in 1982 (or 1994).

    • j. says:

      Tones are known to be affected by consonants, but I know of zero examples where tone affects consonants back. A knowledge of tone could of course provide additional reasons to e.g. analyze various correspondences as loanwords. As for vowels, per what I know of Uralic, they must of course be way more susceptible to error anyway. Most individual languages have simple vowel systems, but the NCED ends up setting up, e.g. for Tsezic, a system of nine vowels times an undefined, possibly prosodic suprasegmental contrast, which reminds me of the known troubles with Permic.

      A re-do “from scratch” probably would still not affect many bits of the NCED such as branch reconstructions or many of the etymologies. It’s not as if the book was itself set up from scratch either. Detailed studies on specific etymological-comparative issues would seem to be more useful additions at this point than another attempt to directly cover the entire family (with or without NWC).

      • David Marjanović says:

        I know of zero examples where tone affects consonants back

        The voiced plosives of Middle Chinese became plain voiceless in Mandarin in most tones, but aspirated in another.

        (It’s also possible that phenomena like Verner’s law have something to do with phonetic pitch despite the absence of phonemic tone. Now that I think of it, the Moscow School posits Verner-like phenomena to derive consonant splits in Mongolic and Japonic from a Proto-Altaic phonemic tone.)

  3. Blasius B. Blasebalg says:

    Points taken on my dialect continuum argument.

    If NE and NW Caucasian are in a situation like Cushitic and Semitic, where we know that the existing contact zones are rather young, just without any analogue of the early attestation of Semitic languages, that is something that doesn’t seem completely out of the question.

    Then again, I stumbled across the statement that the Proto-languages are almost the same(!). This is something that seems typical of long-range comparisons (which in this case is not in terms of kilometers, but beyond established families). Is it realistic that hardly any sound changes occurred between two groups that have been recognized as families long ago, but whose connection is a rather recent hypothesis? If the phonetic situation is so close, why has nobody noticed earlier?

    Or is it more likely that long-range comparisons tend to look for look-alikes, and this process then translates into no or only trivial sound changes?

    • j. says:

      It’s Starostin’s PNEC and PNC that appear to be almost the same, but NWC and the NEC components are all fairly well removed from this reconstruction. E.g. the (alleged) cognates of Nakh *pχi and Khinalug /pxu/ ‘five’ are Avar /ššúgo/, Andic *ʔinšštu-, Tsezic *ɬɬinɔ, Lak /χχul-/, Dargwa *xu-, Lezgic *ɬɬwe-, NWC *sxʷə — some basic family resemblance (an initial tense fricative in the lateral/velar region plus a labial component), but almost no trivial identities. You probably would not guess offhand what Starostin reconstructs for this.

      There are two main explanations for this situation that I can imagine (not mutually exclusive):
      (1) North Caucasian is Northeast Caucasian, and NWC is just a particularly divergent daughter;
      (2) the PNEC reconstruction is overengineered, and this allows NWC to be derived from it in a way that does not capture the real historical phonological development involved.

      For example, under (2), instead of treating NWC *s- in ‘five’ as a prefix of unclear origin, we could give it more evidential weight and opt to assume that a PNC *sxʷ cluster assimilates to PNEC or maybe just East Daghestanian *xxʷ.

      • David Marjanović says:

        I should have mentioned that the changes from PC to PWC are mostly mergers as far as I remember (apart from the two consonant splits caused by the vowel mergers), so we have here a loss of information that might be giving us long-branch attraction.

        Similarly, Proto-Eskimo can be almost entirely derived from Moscow School Proto-Altaic (there’s a paper on this somewhere out there, I don’t have time to look for it right now), and again most of the changes are mergers.

  4. David Marjanović says:

    I overlooked this:

    (8) Khinalug /pš/ is found in four examples, going back to *śśw twice, *čw once and also *chw once (I think that’s an alveolar affricate + laryngeal sequence?).

    Worse – it’s a sequence of an aspirated alveolar affricate + [h]. I suspect some kind of typo in the original.

    • j. says:

      NCED does not treat aspiration as phonemic.

      Not that this is a trivial decision. For several languages it might make more sense to treat aspiration as phonemic and “tenseness” as a surface feature of plain voiceless consonants. Tense > voiced is a very common shift in the descendants, something we would definitely not expect of geminated consonants.

      • David Marjanović says:

        does not treat aspiration as phonemic

        Of course, but distinguishing /t͡s/ from /t͡sh/ becomes a lot harder if /t͡s/ is phonetically aspirated, whether by default or as a phonemic feature.

        (It’s not impossible, contrary to Canepari, but it’s hard to maintain.)

        • j. says:

          Really most of the PNC *CH- clusters only seem to be reconstructed to explain pharyngealized vowels in descendants, leaving no segmental reflexes: e.g. PNC *ssHwintʼV > Lezgian *sswiˤntʼ ‘snot’, PNC *kHwanššV > Dargwa *kʷaˤš ‘foot/leg’, PNC *ćʼćʼHwildV > Lak /čuˤj/ ‘tower’, Dargwa *ceˤtta ‘tombstone’ (but Lezgian nonpharyngealized *cʼwirtt ‘tower, stone pillar’). This would allow some degree of shuffling around where the laryngeal feature really was realized.

          In this example though, Khinalug /pšlä/ ‘fox’ further corresponds with Chechen /cħōgal/ ‘jackal’ (reconstructed as Proto-Nakh *cɦōkʼal on account of plain /c-/ in Ingush and Bats) — and they also suggest possibly original trisyllabic stem structure as *cVhwōle, with Semitic *čuʕāl ‘fox’ given as a possible external areal parallel (this *č would be standard *θ I think).

          • Crom Daba says:

            C’mon! This is obviously simply Turkish çakal.

            • j. says:

              Probably related as well, but I’d imagine connected at most in Proto-Indo-Iranian times (? *ćr̥ga- > Sanskrit /śṛgāla/ → Persian /šagāl/ → Turkic etc.). It also kind of looks like there’s a correspondence here of IE *r with the “intrusive” *ɦ in Nakh. I wonder if there are more examples of that.

              The NEC etymon is in fact nicely divisible into ‘jackal’ in Nakh (also in NWC, but attestation there is not very substantive) versus ‘fox’ in the Dagestanian branches. The comment that *cɦōkʼal would be from a metathesized diminutive *chwole-kʼV is not very compelling. But then we don’t actually have a PIE etymon to retreat to either… so regardless of whether the general NEC comparison works, this could still be a pre-Nakh loanword in II.

  5. Crom Daba says:

    I don’t see why we should assume that labialization is really that archaic in Caucasian languages. The system of secondary articulation in NW Caucasian is so symmetric and complex that it’s hard for me to imagine that it could be that old. Feature spread from vowels (as Starostin previously suggested), perhaps even in parallel in different branches, seems very probable.

    The NE Caucasian correspondence of Nakh pC : Others Cw looks like an obvious case of the ancestor language having various onset clusters (or these arising secondarily).

    • j. says:

      All NEC branches other than Nakh have pretty sizable inventories of labialized consonants, more than seems likely to have happened by accidental parallel development. At least the labiovelar and labiouvular correspondences look mostly robust. The labiolaterals and labiopostalveolars are also supposed to have direct evidence from multiple branches; but it would be worth checking how well their correspondences really bear out. From there on it gets more spotty though.

      Altogether I’d expect the real picture to be similar to something like the spread of palatalization across Uralic: sometimes it’s maintained pretty consistently (Permic), sometimes reshuffled a little bit (e.g. Hungarian, Mordvinic), sometimes dropped out categorically (Finnish), sometimes expanded massively (Nenets), sometimes dropped out then reintroduced (Veps), sometimes expanded then dropped (Enets)… I’d think there probably has been both some original labialization and some parallel labialization changes that would deserve closer investigation. Even just an analysis of each larger subfamily (Andic, Tsezic, Lezgic) could probably suggest more.

