Due to a recent ZBB discussion I ended up re-reading the Preface to Sergei Starostin’s A North Caucasian Etymological Dictionary. This is one of the more worrisome cases of “Moscow School” phonological tarpits: there is no doubt about Northeast Caucasian being a valid family, and I would also think the relationship with Northwest Caucasian is sufficiently established… but the reconstruction the late Starostin advances for the family sure looks like it has too many bells and whistles, with features like six laryngeals that end up almost randomly reshuffled in the descendants, nearly all obstruents having a plain/geminate distinction orthogonal to phonation, or abundant *Cw clusters at all POAs other than labial. I count 132 basic sound correspondences plus some fifty-odd cluster correspondences. Even spread across two root consonant positions in 2300+ reconstructions, a reconstruction scheme of this kind is bound to include reflexes that aren’t actually well enough established.
Probably most fixes to this reconstruction would also have to be etymological. Likely there are correspondences representing areal loanwords rather than original inheritance, or correspondences used to stitch together unrelated vocabulary. Just checking for not-really-regular correspondences would be a good start though.
For a quick case study I’ve picked *pC clusters. These appear word-initially, supposedly evolving from certain *Cw clusters, at two far ends of the family: Nakh and Khinalug. The asserted sources are as follows:
- *ff > N *pχ, Kh. /px/
- *ćw > N *ps (Kh. /cʼ/)
- *św > N *ps (Kh. /s(w)/)
- *śśw > N *ps, Kh. /pš/
- *cw > Kh. /ps/ (N *c)
- *čw > Kh. /pš/ (N *č)
- *xxw > N *pχ
- *qw > N *pħ (Kh. /q/)
- *qqʼw, *ɢɢw > N *pħ (Kh. /qʼ/)
- *χχw > N *pħ, Kh. /pχ/
Nakh also has a cluster *bʡ, with three asserted origins: *qʼw, *ʡw and *hw.
How many of these developments are actually regular once we look into it? Put in your bets now…
(1) Nakh *ps is found in five examples. Every single one of them has a different reconstruction, i.e. none of them can be considered regular! Besides the three expected cases of *ćw, *św, *śśw, there’s one of *ccʼw (alleged regular Nakh reflex *t-) and one of *ćʼ with no labialization at all (alleged regular Nakh reflex *cʼ). Tsk tsk tsk. For that matter, two cases have NWC cognates with a presyllable *pə-, supposedly a prefix. My bet would be that this is what really occurs in the Nakh examples too.
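The kind of check applied here can be sketched in a few lines of Python. This is only a rough illustration, not anything from the dictionary itself: the function name `unsupported` and the threshold `min_support=2` (a source counts as supported only if it recurs at least twice) are my own assumptions, and the list simply restates the five proto-sources named above, one per etymology.

```python
from collections import Counter

# Proto-sources reconstructed for the five Nakh *ps examples, as listed above.
# One entry per etymology; the etyma themselves are not spelled out here.
NAKH_PS_SOURCES = ["*ćw", "*św", "*śśw", "*ccʼw", "*ćʼ"]


def unsupported(sources, min_support=2):
    """Return the sources attested in fewer than `min_support` examples.

    `min_support=2` is an assumed threshold: a correspondence seen only
    once cannot be distinguished from a chance resemblance.
    """
    counts = Counter(sources)
    return sorted(src for src, n in counts.items() if n < min_support)


if __name__ == "__main__":
    flagged = unsupported(NAKH_PS_SOURCES)
    print(f"{len(flagged)} of {len(set(NAKH_PS_SOURCES))} sources lack support")
```

With the data above, every one of the five sources gets flagged, which is the point of the paragraph.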
(2) A Nakh *pš turns out to exist in one example with *čʼw, whose regular Nakh reflex is allegedly plain *š-. (Maybe another likely prefix case?)
(3) Nakh *pχ is found in four examples; just one of *ff, so irregular in any case. There are no more than two initial and four medial instances of *ff reconstructed altogether. The other case of initial *ff- actually has a Nakh reflex too, but showing *ħ-! — The three cases of *xxw do not look that much better. NWC has *xw in two cases (and also for the *ff case), secondary *x́w in one, so this at least seems to work. Lak has one case of /xx/, one case of /xxw/ and one case of /šš/; the last supposedly by late palatalization from *xx … but, unfortunately, the one example of /xx/ occurs before /i/? Andic has one case of *xw, one case of *ɬw.
(4) Nakh *pħ rakes together a seemingly respectable 13 examples. But they diverge into nine reconstructions, of which most occur just once: *q, *qw, *qq, *qqw, *qʼw, *χχw, *pʼɦ. The last is a cluster type (obstruent + laryngeal) that seems to be relatively common in the proto-lexicon but is strangely not commented on at all in the Preface. As for the others, only the *qw and *χχw cases seem even expected. For the rest, the allegedly regular Nakh reflexes are *q > *q, *qq > *q/*ʁ, *qqw > *q/*ʁ, *qʼw > *bʡ. (There is one appeal to labiality metathesis: *qarćʼwV > *qwarćʼV before *qw > *pħ? But this is itself clearly ad hoc rather than regular.)
Our last hope for Nakh *pC is thus the clusters *ɢɢw, *qqʼw; the first represented by four examples (one of them also with a laryngeal: *ɢɢHw), the second by two examples (one of them with a laryngeal). Starting with *qqʼw, and skipping over subfamilies reflecting only one instance, in Tsezic we have one case of *qʼw and one of *qʼ; in Lezgic, one case of *qʼˤw and one of *qʼw (respectively). Inconsistent secondary articulations are maybe not the worst problem, but then the latter etymology additionally requires metathesis from *tʼHalqqʼwV to *qqʼHwaltʼV in Nakh. Moving to *ɢɢw (when’s the last time you heard of a language that has geminate voiced uvular stops, incidentally?): Tsezic has one *q, one *qw; Dargwa has one *ʁˤw, one *ʁˤ and one *qqw; Lezgic has one *qqˤ, one *qqʼˤ, one *qqʼˤw. One case has a presyllable *mu-, and it would be possible to speculate that actually this is the real source of the Nakh cluster.
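To make the *ɢɢw situation concrete, here is a small Python sketch that asks, branch by branch, whether any reflex occurs more than once. The data is just the reflex list from the paragraph above; the function name `recurring_reflexes` and the two-example threshold are assumptions of mine, not part of Starostin's method.

```python
from collections import Counter

# Reflexes of *ɢɢw per branch, as cited above (one entry per etymology).
GG_W_REFLEXES = {
    "Tsezic": ["*q", "*qw"],
    "Dargwa": ["*ʁˤw", "*ʁˤ", "*qqw"],
    "Lezgic": ["*qqˤ", "*qqʼˤ", "*qqʼˤw"],
}


def recurring_reflexes(by_branch, min_support=2):
    """Map each branch to the reflexes attested at least `min_support` times."""
    return {
        branch: sorted(r for r, n in Counter(reflexes).items() if n >= min_support)
        for branch, reflexes in by_branch.items()
    }


if __name__ == "__main__":
    print(recurring_reflexes(GG_W_REFLEXES))
    # {'Tsezic': [], 'Dargwa': [], 'Lezgic': []}
```

Every branch comes back empty: no single reflex of *ɢɢw repeats anywhere outside Nakh, at least if secondary articulations are taken at face value.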
(5) Nakh *bʡ is found in an also respectable eleven examples (plus one word-initial one). Three of them are from *ʡw, which ends up reflected reasonably regularly: the reflexes also include two cases of Andic *ħ and one of *ħw, two cases of Tsezic *ħ, three cases of Lak zero, two cases of Dargwa *ħ, two cases of Lezgic *ʔw. A small ray of hope, maybe…
Four cases from *qʼw (three of them also with a laryngeal: *qʼHw) look promising too. But the distribution of these etyma is terrible: only Lak and NWC also reflect more than one of them. The former has one case of *w, one case of *qʼ; the latter has *qʼ in both cases, though the second one with a presyllable *p-, again casting doubt on analyzing Nakh *b as continuing *w.
In the waste pile of protoforms attested only once, we have *ʔw, *hw, *ɦw, *bɦ (with the *hw case showing a presyllable *ba- in NWC).
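For contrast, the one set in (5) that does look regular, *ʡw, can be summarized by its most common reflex per branch. The sketch below is again only illustrative: `modal_reflex` is a hypothetical helper name, Lak zero is written as "∅", and the lists restate the counts given above.

```python
from collections import Counter

# Reflexes of *ʡw per branch, as cited in (5); "∅" stands for Lak zero.
EPIGLOTTAL_W_REFLEXES = {
    "Andic": ["*ħ", "*ħ", "*ħw"],
    "Tsezic": ["*ħ", "*ħ"],
    "Lak": ["∅", "∅", "∅"],
    "Dargwa": ["*ħ", "*ħ"],
    "Lezgic": ["*ʔw", "*ʔw"],
}


def modal_reflex(reflexes):
    """Return (most common reflex, its count, total number of examples)."""
    reflex, n = Counter(reflexes).most_common(1)[0]
    return reflex, n, len(reflexes)


if __name__ == "__main__":
    for branch, reflexes in EPIGLOTTAL_W_REFLEXES.items():
        reflex, n, total = modal_reflex(reflexes)
        print(f"{branch}: {reflex} in {n}/{total} examples")
```

Here each branch has a reflex that recurs at least twice, with only Andic showing any variation (*ħ twice against *ħw once). That is roughly what a passable correspondence set looks like, and it is the exception in this survey.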
(6) A Nakh *bʕ appears too. One supposedly from PNC *wH, another two from PNC *bʕ (of which one case “with some metatheses and aberrations”). The latter two do have *pp in Lezgian.
(7) Khinalug /ps/ is found in two examples, one of them indeed *cw and the other *čw. For *cwaʡmV ‘bear’, NWC adds (is supposed to metathesize) a presyllable *mə-; maybe this is once again what’s really going on.
(8) Khinalug /pš/ is found in four examples, going back to *śśw twice, *čw once and also *chw once (I think that’s an alveolar affricate + laryngeal sequence?). Lak has /š/ in both cases of *śśw; NWC has a presyllable *pə- in one of them.
(9) Khinalug /px/ is attested just once; enough said.
(10) Khinalug /pχ/ is attested once word-initially from *χχw as promised, also once word-medially from a sequence *-waχχ-.
So the basic toll is: the Nakh *pC clusters regularly correspond to nothing whatsoever across Northeast Caucasian. Only three of the eight alleged regular sources are actually regular even from PNC to Nakh (“soundlawful regularity”, one of the weakest types). For *bʕ we can find a weak two-example correspondence with Lezgian *pp, for *bʡ one just barely more substantially regular set of correspondences. Khinalug /pš/ finds one two-example correspondence with Lak /š/.
This survey does not fill me with hope, either for the current proposals being correct or for the ability to find new, stronger phonological solutions with future work. Probably this is bound to happen to some extent in comparative work between languages with highly complex phonologies. I do wonder now, however, just how much else this result applies to.