Sometimes I feel I’d like to see an anti-etymological dictionary.

Given two or more different etymological dictionaries, especially for an entire group of languages, typically one of them (usually from the older end) is going to end up being less critical, while another one (usually from the newer end) is going to end up being more critical. If we want to know what is known so far about a word’s etymology (cognates, reconstruction, etc.), we’d look in the more modern dictionary, of course. But if we want to know what is not known about a word’s etymology (i.e. what research questions are still open), neither of these sources is really going to work. What’s needed for this is, at a pinch, the difference between them.

Sometimes older separate etymological groups get combined into a single one, and sometimes older single etymological groups will turn out to comprise unrelated words and will be disassembled into various different ones (maybe under different native roots, but maybe also as loans or derivatives). This is all no major problem so far, especially if newer research will bother to mention that earlier, zoop was considered cognate with foop, but per current understanding it is actually cognate with doop.

But etymologies can also simply vanish from the literature record without comment, or with minimal comments along the lines of “strike this” (this latter type I’ve seen in errata or in “update notes” to new editions). This I find unsatisfying. Even when an explicit reason has been given (“the correspondence z ~ f is irregular”), if this merely renders the compared words without etymology, then we are again back to square one on what the words’ origin actually is. Or, for that matter, on why the earlier observed similarity exists at all.
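The bookkeeping behind such a "difference between dictionaries" can be sketched in a few lines of Python. This is only a toy illustration: the words zoop, foop and doop are the placeholder words used above, while moop and noop, the function names and the data layout are all hypothetical and not taken from any real dictionary.

```python
# Toy sketch: each comparison is an unordered set of compared words.
def comparison(*words):
    return frozenset(words)

# Hypothetical contents of an older and a newer etymological dictionary.
older = {comparison("zoop", "foop"), comparison("moop", "noop")}
newer = {comparison("zoop", "doop")}

# Hypothetical explicit comments in the newer source on dropped comparisons.
commented = {
    comparison("zoop", "foop"): "z ~ f is irregular; zoop now compared with doop",
}

def discard_pile(older, newer, commented):
    """Split the comparisons dropped by the newer source into those that
    were dropped with a stated reason and those that vanished silently."""
    dropped = older - newer
    explained = {c: commented[c] for c in dropped if c in commented}
    silent = {c for c in dropped if c not in commented}
    return explained, silent

explained, silent = discard_pile(older, newer, commented)
for c in silent:
    print("vanished without comment:", " ~ ".join(sorted(c)))
```

The `silent` set is the part that would most need to be collected into a discard pile, since nothing in the newer source records that these comparisons ever existed.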

It is possible for similarity to exist for reasons other than by proper common inheritance or pure random chance: loans between related languages, loans in parallel from a third source, common inherited morphology applied to different roots, contamination between semantically nearby words, universal onomatopoetic patterns… Traditional etymological dictionaries I’ve only seen commonly apply the last of these with any consistency. The first is usually invoked only in cases of obvious, long since established layers of loanwords (in Uralic context e.g. Finnic → Samic, Komi → Ob-Ugric). The second thru the fourth are rarely explored at all.

So I would hope for truly thorough etymological dictionaries to also include a discard pile of words and comparisons from earlier literature that remain without an adequate explanation, something which would definitely make future etymologists’ work slightly easier.

I am currently doing some “antietymological” groundwork myself: charting how much content there is in Collinder’s Fenno-Ugric Vocabulary that is not reproduced also by later sources (mainly the UEW on one hand, Janhunen’s Samojedischer Wortschatz on the other). It is not a lot, and most of the omissions are clearly dregs, but some small part of the material remains interesting. It is even possible to find examples that have later reappeared: one is the comparison of Mari *lüðä- ‘to fear’ with Samoyedic *lër(ə)- ‘to be afraid’, rediscovered by Ante Aikio in his paper on new Mari etymologies from a few years back.

A much bigger amount of work, however, would entail somehow bridging the still largely aligned FUV and UEW etymological corpora with the more heavily pruned ones in Janhunen 1981 and Sammallahti 1988. For most of the comparisons rejected by the latter two authors as insufficiently regular, this has been done quietly, without any arguments given at all. This may very well have allowed for increases in rigor in historical phonology, but at the cost of what seems like a hefty step back in how much we can claim to know about Uralic etymology.

Even further observations could perhaps be made by taking a look at even earlier etymological compendia: Budenz’ Magyar–ugor összehasonlító szótár (1873–1881), Donner’s Vergleichendes Wörterbuch der finnisch-ugrischen Sprachen (1874–1888), as well as the extensive material quoted in the major historical phonology overviews that followed in their wake, such as Paasonen’s “Beiträge zur finnischugrisch-samojedischen Lautgeschichte” (1913). I again know of some recently rediscovered etymologies that were first suggested already around this time or even earlier. The first two especially include etymological comparisons still more boldly than FUV and UEW, though (which were at least constrained by mainly compiling etymologies from already published literature), so the ratio of real forgotten goodies to junk would surely be lower still.

There’s also another sense in which “anti-etymologies” could be compiled from this period, however. This far back it is not difficult at all to find comparisons that have been rendered firmly obsolete by now, not just left in a limbo of “irregularity”. These might be illustrative in showing how etymological progress has been achieved over the last 100+ years. Have they been superseded by new native comparisons enabled by new data? by loanword etymologies? by new morphological analyses? something else? … and the results of such a survey could perhaps be then used as a roadmap for future research as well, to work out what’s likely and what’s not likely to continue to provide new results.

Phonology squib: ‘Clay’ in Proto-Uralic

I have a principle that applies quite often when working with quantity-over-quality mass comparative dictionaries (papers, databases, etc.): what is asserted without evidence can be dismissed without evidence.

The UEW is, unfortunately, a repeat offender on assertions without evidence. This comes up maybe the most with its own reconstructions, which do not seem to follow any definite scheme: there definitely isn’t one expounded on anywhere in the book, and to my knowledge none of the editors have published detailed papers on the topic, either. [1] This results in many junk reconstructions that seem to have only been hastily eyeballed together, sometimes with crass errors.

To avoid excess alarmism though: by “its own reconstructions”, I mean only a subset of the Proto-Uralic (Proto-Finno-Ugric, -Permic, etc.) reconstructions presented, those that seem to have been put together for the first time by the UEW team. Many of the reconstructions are however not all-new, and have been inherited from earlier research. Maybe the most direct source is Collinder’s Comparative Grammar [2], but various bits also trace back to earlier studies on historical phonology, such as Itkonen’s comparative vocalism surveys, or Paasonen and Setälä’s early 1900s Neogrammarian works that mainly involved consonantism, or even the 1800s comparative dictionaries of Budenz and Donner. Alas, none of this is explicitly referenced, and so the reader is left in the dark. Determining what, if anything at all, some particular reconstruction is based on would take a wild goose chase through the un-annotated list of literature found at the end of each entry.

(For non-specialists in Uralic reconstruction, as a quick rule of thumb I would say: any reconstruction with cognates in Finnish + at least two other Uralic subgroups can be treated as relatively safe; so can all remaining reconstructions that are continued in 6+ subgroups, which are usually given in bold; anything continued more narrowly is in principle suspect; anything prefixed with a question mark should be treated as unreliable entirely.)

Even if many of the UEW’s reconstructions are junk, this does not however imply that the etymological comparisons they are attached to would also be. Sometimes it will be fairly easy to work out a better reconstruction. Today I have taken a look at a word for ‘clay’ that the UEW reconstructs as *śojwa, and noticed that this seems to not match any of the descendants given…

Not absolutely everything is wrong, of course. The consonant skeleton *ś-jw- works well enough: we have entirely regularly Samic /č-/ ~ Permic /ś-/ ~ Samoyedic /s-/, and S /-jv-/ ~ P /-j-/ ~ Smy ∅ is reasonable. But the vowel reconstruction *o-a seems to be not really defensible.

  • In Samic, we have reflexes only in Kola Sami: Kildin /čuwwj/ (though apparently чуййв in the written language), Ter /čujjvɛ/. These nominally suggest Proto-Samic *čujvē — but, from earlier *śojwa, we would instead expect to see PS *čoajwē > Kola **čuəjjve. Compare PU *ojwa ‘head’ > PS *oajvē > Kildin /vuəjjv/ вуэййв, Ter /vɨəjjvɛ/.
  • In Permic, we have *o > Komi /o/ ~ Udmurt /u/. This is not a regular reflex of *o: it instead usually continues PU *a or *e. There are various other claimed cases of *o > *o (at least *kojə-ma > *kom ‘male’ — the source of the ethnonym Komi — seems unassailable, even if still possibly irregular), but normally we would expect *o-a to give *u.
  • The Samoyedic examples are a bit hard to assess offhand: we have reflexes only from Selkup and Kamassian, and so Janhunen’s Samojedischer Wortschatz leaves this word unconsidered. /üü/ in the former can go back to various pseudo-diphthongs, including *åj (*såjtə- > /süütɨ-/ ‘to sew’), *oj (*tojmå > /tüüm(ɨ)/ ‘larch’), *uj (*jujtə- > /küütäptɨ-/ ‘to dream’), *əj (*pəj > /püü/ ‘stone’), even *äj (*päjwä > /püü/ ‘warm(th)’). Kamassian /e/ does not seem to match any of these on a quick checkup, but there are probably various conditional developments involved that blur the picture. PU *o-a regularly gives PSmy *å-(å), so maybe the first is what we should bank on… However, in an *A-stem, *jw would be expected to remain in PSmy, and then result in *ľć in Selkup. [3]

The Kola Sami ~ Permic vowel correspondence can be however quite well derived from *a-a; developing to *ō-ē in Proto-Samic. This normally later gives /uu/ in Kildin, /ɨɨ-ɛ/ in Ter, but presumably (see below) earlier *uu was shortened here to /u/ before it could unround in the latter. *a-a also gives Samoyedic *å(-V), i.e. works at least as well as reconstructing *o-a.

I would also favor reconstructing medially *-wj- instead of *-jw-. UEW, I imagine, bases the latter on Ter Sami; however this is actually non-diagnostic, since in the language, there is regular metathesis of PS *-vj- to *-jv-. The Kildin form should be therefore instead taken as evidence for *-wj-. (In literary Kildin Sami, it seems that Ter-esque -ййв- is preferred in place of *-vj-, e.g. тоаййв ‘often’, while T. I. Itkonen’s Koltan- ja kuolanlapin sanakirja gives /tɑwwj/. Does this maybe stand for dialect variation within the language?) This in mind, the ad hoc-sounding shortening (*a > *ō >) *uu > *u also makes decent phonetic sense: we’d be dealing with [uːw] > [uw], a contrast that seems difficult if not impossible to maintain.

I believe no exact precedents are known for the development of *-wj- in Permic, but in general *-w- is lost always, while *-j- remains at least in various clusters; so *-wj- > *-j- seems about as good as could be expected. As for Samoyedic, *-w- is lost syllable-finally: this means we’d expect *śawja > *såj(V), which is at least a decent contender for the Selkup-Kamassian preform. (Preferably not *såjå; contrast *kåjå > Kamassian /kuja/ ‘sun’. *-a > *ə is however quite common in Samoyedic, maybe in particular after (original?) consonant clusters.)

Altogether, I end up with the conclusion that all words given by UEW under *śojwa are better considered to continue Proto-Uralic *śawja.
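The vowel part of this argument can be laid out as a small lookup table. The sketch below is a simplification under stated assumptions: it only encodes the first-syllable vowel outcomes mentioned above for Kildin Sami and Proto-Permic, it leaves out Samoyedic (where both candidates predict *å and the attested forms are hard to assess), and the outcome "u" for *a-a relies on the proposed shortening *uu > *u before *w. All names in the code are made up for illustration.

```python
# Expected first-syllable vowel outcomes per candidate PU vowel pattern,
# as stated in the discussion above (not an independent source).
EXPECTED = {
    "*o-a": {"Kildin Sami": {"uə"}, "Proto-Permic": {"u"}},
    # "uu" is the normal outcome of PS *ō; "u" assumes the proposed
    # shortening *uu > *u before *w.
    "*a-a": {"Kildin Sami": {"uu", "u"}, "Proto-Permic": {"o"}},
}

# What the attested 'clay' words point to.
OBSERVED = {"Kildin Sami": "u", "Proto-Permic": "o"}

def mismatches(candidate):
    """Return the branches whose observed vowel the candidate does not predict."""
    return [
        branch
        for branch, vowel in OBSERVED.items()
        if vowel not in EXPECTED[candidate][branch]
    ]

for candidate in EXPECTED:
    print(candidate, "fails in:", mismatches(candidate) or "nothing")
```

Under these assumptions *o-a fails in both branches while *a-a fails in neither, which is the gist of the preference for *śawja; the table says nothing about the consonant cluster.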

These adjustments also open some new vistas. They allow the possibility to consider that my new and updated reconstruction might be a part of the same original root as its established synonym: *śawə (UEW: *śawe). This is continued directly only in Finnic (*savi > Fi. savi etc.), but also in various derivatives: *śawə-nV in Mordvinic (*śovəń > Erzya & Moksha сёвонь) [4], Mari (шун) and Komi (сюн); *śaw(ə)-d₂V in Mansi (*suwľ(V) > Northern сӯли) and Khanty (*sawəj > *sawïï) [5]. It seems therefore likely that also the *śawja group is similarly originally a derivative *śaw(ə)-ja. The exact morphology going on remains however mysterious. *-nV is only known as a vague diminutive suffix; *-ja usually forms action nouns; *-d₂V is, to my knowledge, not reconstructible for Proto-Uralic at all (there may be one other parallel within Ob-Ugric though: *ńooɣəď ‘meat’, maybe *ńaKV-d₂a).

It would be also possible to shuffle the *-ja and *-d₂V groups around a bit: *-j in Khanty and Samoyedic can continue either just as well. At least the Mansi form with *ľ and the Samic & Permic forms with *j however must be distinct from each other.

[1] Editor-in-chief Rédei has arguably taken some steps towards this in his 1968 article “A permi nyelvek első szótagi magánhangzóinak a történetéhez” (NyK 70: 35–45). His “pre-Permic” vowel system does end up being identical to the Proto-Uralic vowel system that is currently accepted the most widely, but this may be just a happy accident: he makes no effort at all on the issues of if and how the other Uralic languages could be derived from the same system; and his treatment of which particular original vowel should be assumed in which particular words is very patchy as well, covering only some incidental examples.
[2] His Fenno-Ugric Vocabulary gave only comparative data; their associated reconstructions were only given in an appendix to CompGramm., wherein he had presented his thinking on Uralic comparative phonology and morphology as well.
[3] This oddball soundlaw probably proceeds something like *jw > *jj > *jɟ > *ʎɟ > *ʎtɕ = *ľć.
[4] *o is, I believe, due to the following development: first *a-ə regularly > *å-ə > *o-a, followed by a conditional split: *o > *u before a velar sonorant (regularly established in the case of *-oŋ- and IMO also occurring in the case of *-olk-); lastly *u > *o.
[5] With Kazym /sŏwĭ/, Krasnoyarsk (Southern) /săwə/ regularly retaining PU *-w-.

Bonus Material 2017

A little recap of history: Freelance Reconstruction, the blog you’re currently reading, [1] was originally started as a Tumblr microblog. It turned out though that my blogging style needs a sturdier framework, and for several years now, I’ve been happy to be based on WordPress instead.

This much some old readers may recall. However I never have gotten much into doing quick-paced community engagement blogging on here, in part indeed due to the heavier-duty software. And since I still hang out on Tumblr for unrelated reasons, I’ve also found it useful to have an outlet to comment on things related to linguistics that come up in there.

Thus, enter a new, more casual linguistics sideblog: possessivesuffix.tumblr.com. This has been running for a bit over a year by now, but I don’t think I’ve mentioned anything about it earlier on here. Perhaps I should also request that anon asks be redirected there instead of the old defunct version of this blog?

Here is also a list of some posts on there that might be of interest to the readers on here as well.

1. Original blog posts and commentary on topics:

— on the structure and history of Finnish:

— on Uralic linguistics in general:

— on phonological fun facts and typology:

— other stuff:

2. Links to other blogs, articles etc. without much additional insights of my own:

[1] I’ve seen this blog occasionally linked under the name “Protouralic”, but to be exact, that is only my blog’s URL, not the title. The discrepancy is mainly because I can foresee maintaining this blog long enough that I will no longer be doing freelance reconstruction… It remains to be seen what the blog will be renamed at that point, though.

An old etymology: aisti ~ ész

I find it interesting how modern advances in Uralic historical phonology can occasionally turn out to vindicate old sketchy etymological proposals, dating from the earliest phases of scientific comparison of the word stocks of the Uralic languages.

One of these cases appears to be a connection between Finnish aisti ‘sense(s)’ and Hungarian ész ‘reason’. This is a comparison that appears in 1800s work by the likes of Budenz. Already come the 1900s it had mostly been dropped, however (with decent reasons, as we shall see). But regardless, nowadays it seems that the two can be regularly connected after all.

Let’s start from the Finnish side. Any possibility of a comparison with Hungarian can only involve the first syllable, ais-. Once we factor in the other cognates within Finnic though, already internal reconstruction turns out to point towards the rest of the word being suffixal material.

“What other cognates”, you ask? Yes, there are no known cognates in the other Finnic languages, which usually doesn’t bode well for native origin. The Finnish dialects, however, make up a good reserve of lexical diversity (recall that “Finnish” in its widest sense is only a geographical collection of Finnic varieties, not an actual subgroup thereof). We can find in these some interesting parallel formations that allow some deeper exploration of this connection. Standard Fi. only provides aistin ‘sense organ’ and aistia ‘to sense’, both of which could be accounted as derived from aisti itself. More interesting are, however, aisto ‘intent’ (Southwestern dialects) and astaita ‘to observe’ (Tavastian and Far Northern dialects).

The aisti ~ aisto doublet points relatively clearly to an earlier lost verb stem *aistaa. For an exact parallel, cf. paistaa ‘to bake’ → paisti ‘roast’, paisto ‘baking’. Also aistia can be then better analyzed as an iterative aist-i- derived from this stem, not as a zero-derivation of aisti.

The heavy stem structure, in turn, suggests a segmentation of this verb as *ais-ta-. Almost all Finnish verbs of the shape CVXCtA- (where X is any segment: vowel length, semivowel, or consonant) are derivatives, most often of this kind. [1] Likely semantics at this point will be something like *ais(i) ‘senses, observation’ (in any case a nominal) → *ais-ta- ‘to sense, observe’.

So far this reconstructed *aisi does not need to be any older than medieval Finnish. e-stem inflection **aisi : **aise- would usually be a good sign for dating a word back to at least Proto-Finnic, but we only have evidence for √ais- as purely a root element, not as an independent stem. It is true that consonant stems such as √ais- usually take e as their stem vowel, when required; but this kind of derivation can at least occasionally be based on other stem types as well. E.g. the i-stems kaali(-) ‘cabbage’, viini(-) ‘wine’ have regardless the analogical consonant-stem partitive singulars kaalta, viintä in colloquial Finnish; and verbs in -tA- derived from trisyllabic A-stem adjectives are quite regularly based on a consonant stem, e.g. kavala ‘treacherous’, kumara ‘slouched, bent’, matala ‘low’, viherä ‘green’ [2] → kavaltaa ‘to embezzle’, kumartaa ‘to bow’, madaltaa ‘to make shallower’, vihertää ‘to be verdant’.

The key evidence for projecting the root fairly far back comes instead from astaita. Why do we have as- and not ais- in here? I believe the answer is that this is a very old parallel derivative, already from a Proto-Finnic *aisi. The overheavy stem structure *CVXCtA- is innovative in Finnish. In certain very old cases, we instead see consonant cluster simplification to a regular heavy stem CVCtA-. At least the following three cases are still apparent:

  • *kanci > kansi ‘lid’, stem *kant(ə)- > kant(e)-
    → *kant-ta- > *katta- > kattaa ‘to cover’;
  • *nowsə- > nouse- (infinitive nousta) ‘to rise’
    → *nows-ta- > *nos-ta- > nosta- (inf. nostaa) ‘to lift, raise’;
  • *vejcci > veitsi ‘knife’, stem *vejcc(ə)- > veitse-, veis- (partitive veistä)
    → *vejc-tä- > *vec-tä- > dialectal vestä- (inf. vestää) ‘to whittle’.

*ntt > *tt, seen in the first example, has been known for long, and has further support from inflectional morphology (e.g. in the ordinals: *kolmanci : *kolmant-ta > *kolmac : *kolmatta > kolmas : kolmatta ‘third’). Yet another instance of this same sound change, easiest formulated simply as *C₁C₂C₃ > *C₂C₃, is probably the loss of a stop before /st/. This is not evidenced in derivational morphology, but is quite regular in consonant-stem partitives or infinitives (the types lapsi : *laps-ta > lasta ‘child’; juokse- : *jooks-tak > juosta ‘to run’). The cases with loss of a semivowel do not build up a consistent picture at all, but I think the cases with similar loss of *n, *p, *k etc. allow putting them on firmer ground. Standard Finnish has analogical veistää for ‘to whittle’, but most other Finnic languages still retain the soundlawful ⁽*⁾vestä-.

Therefore, I would reconstruct here PF *ajs-ta- → *as-ta-, an earlier doublet of the later *ais-ta-. Further derivational extension towards astaita can be well later, however.
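The cluster simplification *C₁C₂C₃ > *C₂C₃ is mechanical enough to sketch in code. The following is a minimal sketch, assuming a plain ASCII-like transcription in which semivowels *j, *w count as consonants and morpheme boundaries are written with hyphens; the consonant inventory and function name are illustrative choices, not a full model of Proto-Finnic phonology.

```python
import re

# Illustrative consonant inventory (semivowels j, w included).
CONSONANTS = "cjklmnpstvw"

# A consonant that is directly followed by two more consonants.
_FIRST_OF_THREE = re.compile(f"[{CONSONANTS}](?=[{CONSONANTS}]{{2}})")

def simplify_cluster(form):
    """Apply *C1C2C3 > *C2C3: drop the first member of a three-consonant
    cluster that arises across a morpheme boundary."""
    segments = form.replace("-", "")  # clusters form once morphemes are joined
    return _FIRST_OF_THREE.sub("", segments)

for pre_form in ["kant-ta", "nows-ta", "vejc-tä", "laps-ta", "kolmant-ta", "ajs-ta"]:
    print(f"*{pre_form} > *{simplify_cluster(pre_form)}")
```

With the pre-forms cited above this gives *katta, *nosta, *vectä, *lasta, *kolmatta and *asta, so the proposed *ajs-ta- > *as-ta- falls out of the same rule as the established cases.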

If an earlier Finnic stem *aisë- < *ajsə- can be therefore assumed, it turns out that this will be an exact equivalent to Hungarian ész. The PU form can be reconstructed as *äśä: in Finnic we have first ä-backing to yield *aśə, followed by palatal breaking to yield *ajśə, and finally depalatalization to *ajsə. The modern Hungarian nominative could just as well continue earlier **eśV, but the short-vocalic (and vowel-stem) plural/accusative/possessive stem esze- clearly requires Proto-Hungarian *ä < PU *ä, just as in e.g. tél : tele- ‘winter’ < PU *tälwä > Fi. talvi.
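The Finnic chain just described can be written out as an ordered list of rules. This is a rough sketch under simplifying assumptions: ä-backing is modelled only for disyllabic *ä-ä stems (giving *a-ə), palatal breaking only before *ś, and the rule functions and their names are invented for illustration.

```python
import re

def a_backing(form):
    # *ä-ä > *a-ə in a disyllabic stem (simplified formulation)
    return re.sub(r"^([^ä]*)ä([^ä]*)ä$", r"\1a\2ə", form)

def palatal_breaking(form):
    # *a > *aj before the palatalized sibilant *ś
    return re.sub(r"a(?=ś)", "aj", form)

def depalatalization(form):
    # *ś > *s
    return form.replace("ś", "s")

RULES = [a_backing, palatal_breaking, depalatalization]

def derive(form):
    """Return every stage of the derivation, starting from the PU form."""
    stages = [form]
    for rule in RULES:
        stages.append(rule(stages[-1]))
    return stages

print(" > ".join("*" + s for s in derive("äśä")))
print(" > ".join("*" + s for s in derive("tälwä")))
```

The first line prints the chain *äśä > *aśə > *ajśə > *ajsə. The second shows that *tälwä is only affected by the first rule, ending at *talwə, which is in line with Fi. talvi.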

(My proposal that *aĆV > *ajCV in Finnic still remains without a published defense. Anyone who is skeptical of this is welcome to reconstruct instead *äjśä in the meantime and assume cluster simplification in Hungarian.)

While this seems to work in principle, a look into any relatively modern etymological dictionary of Hungarian will present a different, simpler etymology of ész: borrowing from Proto-Turkic *äs ‘memory, mind’. Does this show that the Finnish-Hungarian parallel is only an elaborate coincidence?

I could argue that the loan etymology being “more simple” is mostly cosmetic. An etymology is in fact not more unlikely just because it involves a larger number of sound laws, as long as those sound laws are established well enough in the first place. The entire point of reconstructing sound laws is to group the phonetic development of multiple words under a single assumed event. New examples of a known soundlaw do not constitute new assumptions by themselves. As for morphological complications, my internal reconstruction of pre-Finnic *ajsə < ? *äśä is based solely on the Finnish data, therefore has no bearing on how we analyze the Hungarian.

However, there is also a better option! The connection of Hungarian with Finnish does not mean we have to discard the Turkic comparison entirely: we can simply invert the direction of loaning, and analyze this as a Hungarian loanword in Turkic.

Many of the numerous word comparisons between Hungarian and Turkic originate quite clearly from the Turkic side. Identifying features are common enough, even among the oldest layer of loanwords:

  • unetymological sound structure in Hungarian, e.g. bölcső ‘cradle’ < *belćöw < *belćəɣ ← Turkic *belčik; initial /b-/ is clearly non-native, and there are also no clear precedents for clusters of liquid + affricate in Proto-Uralic.
  • with a loan origin further away than in Turkic, e.g. gyöngy ‘pearl’ < *ďinďü ← Turkic *jinjü ← Chinese (Mandarin zhēnzhū)
  • replacing an established Proto-Uralic term, e.g. hattyú ‘swan’ << *qottVŋ ← Turkic *qotaŋ; contrast PU ? *jëxćə.

But in the absence of any evidence of this sort, it does not seem clear that we would have to continue to simply assume the direction Turkic → Hungarian. In the current case we indeed have equivalent evidence in favor of the other direction (an unproblematic cognate in Finnic, which moreover requires PU *ś > Ugric *s, which change would then be reflected also in Turkic). There is reason to expect more symmetry going on anyway: some of these loanwords go quite far back, to the early 1st millennium CE, when “Turkic” would have still been barely more than a single language (if likely with incipient dialect divisions), while “Hungarian” (maybe less anachronistically: “Magyaric”) would already have been an established branch of Uralic. The fact that Turkic today is a major language family stretching from Anatolia to the Lena, while Hungarian is a single language isolated within its family, is a much later development, from around the 2nd millennium CE.

I expect that a closer look at Hungarian-Turkic lexical parallels will reveal also other cases that can be analyzed as Hungarian loans in Turkic at least equally well as in the opposite direction.

A layer of early Hungarian loans in Turkic could moreover account also for a number of the known “Ural-Altaic” lexical parallels. I’ve posted before about *qujaš ‘sun’. Two quick further examples:

  • Turkic *al- ‘lower, below’: often compared with PU *ëla ‘under, below’. This seems to show the common-in-Uralic sound change *ë > *a, as well as apocope; both of these can be seen also in Hu. al-.
  • Turkic *tāla- ‘to rob, plunder’: well comparable with PU *sala- ‘to steal, hide’. The phonetic development lines up best with Mansi or Samoyedic, where *s > *t. However, this could be perhaps derived also from a stage of Hungarian where *s > *ɬ had taken place, but further development > *h > ∅ had not. This would be then comparable to how /ɬ/ in Khanty tends to be borrowed as /t/ into Russian or other nearby languages that lack the sound. EDAL compares the Turkic also with Korean and Japanese verbs for ‘to lure’, but this is a worse semantic match than comparison with Uralic (or, for that matter, with PIE *tsel- ‘to sneak’, whence Germanic *stela- ‘to steal’).

Elsewhere in Uralic, there are no clear inherited cognates that I would know of for my assumed *äśä. There is a Samic reflex though: PS *āj(c)cë- > NS áicat ‘to observe’, but the vowel correspondence *ā-ë ~ *a-e, and the unpalatalized sibilant, clearly point to a loanword from Finnic. (This also seems to have good chances of being one of the pseudo-PS reconstructions that never occurred in Proto-Samic proper.)

For a small tangent — the affricate -c- (= IPA [ts]) is interesting here. It would be possible to explore the possibility that this is somehow metathetical, and based on the suffixed verb *aista- (then further somehow contaminated with the *ë-stem noun to yield an *ë-stem verb). I suspect a different explanation, however.

Namely, the Samic languages are known to fortite inherited prevocalic *ś- to *č-. This unusual sound change could probably be reversed: taken to indicate that *ś- was originally an affricate *ć-, retained in Samic all along vs. normally deaffricated in all other Uralic languages. [3] The same is also suggested by the long-known Indo-Iranian loanwords like *śëta (*ćëta) ‘100’, *śarwə (*ćarwə) ‘horn’ (> PS *čuotē, *čoarvē > NS čuohti, čoarvi). Per the current understanding, these still had an affricate *ć in Proto-Indo-Iranian, retained as c in Nuristani. The fricatives ś in Indic and s in attested Iranian are therefore parallel innovations. Even Proto-Iranian may still require an affricate *c, to account for the development to /θ/ in Old Persian (though perhaps laminal [s̻] would work just as well). There also does not seem to be any reason to assume that any of the old II loans in Uralic would have come from the Indic branch specifically.

In Finnic, there is however no need to assume especially early deaffrication in all positions. We know by now that PF had an affricate *c, partly preserved in South Estonian, but later mostly deaffricated elsewhere. It would seem to be possible to assume that at least word-internal *-ś- in fact yields Proto-Finnic *-c-, not *-s-, and that this is only deaffricated later on, together with *c from other sources (such as the type *wetə > *veti > *veci > vesi ‘water’). Since SE consistently only indicates s- for PU *ś- (sada ‘100’, sarv ‘horn’, silm ‘eye’, sälg ‘back’, süä ‘heart’, etc. [4]), I would however still assume early word-initial deaffrication: PU *ć- > *ś- > PF *s-. This would run in parallel to the often assumed development of PU *č- > *š- > PF *h-.

Therefore, to immediately correct what I write above, it may be preferable to assume an original preserved affricate: PU *äćä > *aćə > *ajćə > PF *aici, borrowed at this point into Proto-Samic.

[0] This post is an extended version of an etymology I have presented before at one of the University of Helsinki etymology workshops, in case anyone feels like the basic gist is sounding overly familiar.
[1] E.g. kieltää ‘to deny’ ← kieli : kiel(e)- ‘tongue’; saartaa ‘to surround’ ← saari : saar(e)- ‘island’; köyttää ‘to tie’ ← köysi : köyt(e)- ‘rope’; varttaa ‘to graft’ ← varsi : vart(e)- ‘stem’; haistaa ‘to smell (tr.)’ ← haista : hais(e)- ‘to smell (intr.)’. Perhaps also paistaa ← *pais(e)- ‘to be baked’, given paisua ‘to swell, expand’, perhaps originally used of dough. (This shorter stem in turn has been explained as deriving from PIE √*bʰeh₁- ‘to heat’.) A few “overheavy” verb stems have instead been formed by a suffix -stA- plus a contracted first syllable, though, such as maustaa ‘to season’ < *maɣusta- ← maku ‘taste’.
[2] Mostly replaced in standard Finnish by the metathetic reshaping vihreä.
[3] Without going into too much detail, this is a proposal that has already been made by various people, such as Abondolo, Janhunen and Katz. It does have the implication that something needs to be done with traditional PU *ć, though. The only reliable instances seem to be the clusters *-ńć- and *-ćć-. All other words are perhaps better considered later loanwords, diffused between the Uralic varieties. This is also suggested by how the candidates are disproportionally represented in Permic and Ugric anyway, and how they also often show vacillation between traditional *ć and *ś (say, ‘to break’: Permic *ćeg- < *ć-, but Ugric *säŋk- < *ś-).
[4] Võro-eesti synaraamat does have a nursery term tsimmä ‘eye’, but the affricate in here is probably better considered affective variation than an ancient retention. Secondary *ci- < *ti- can however remain; the usual example is tsiga ‘pig’ (~ Fi. sika).

*ü > *i, *ü in Samoyedic

I have noted before that Proto-Uralic *ü, whose reconstruction has at times been opposed by various scholars, has never received a truly detailed defense.

Arguments contra have never been very detailed either — but one recurring claim has been that the contrast *i | *ü might not be reflected in Samoyedic. People who subscribe to a primary division between Finno-Ugric and Samoyedic (I do not, but I recognize that this is not universally held) will be therefore able to propose that the contrast did not yet exist in Proto-Uralic proper, and only emerged later on in the Finno-Ugric group, perhaps in different ways in different descendants. [1]

The most common opinion around is that the principal reflex of *ü in Samoyedic is indeed *i. This is claimed e.g. by Steinitz (1944), Sammallahti (1979) and Janhunen (1981). [2] The known examples all behave well enough:

  • *d₂ümä > *jimä ‘glue’ (~ Finnish tymä)
  • *nüd₁ə > *nir ‘handle’ (~ Fi. nysi, Eastern Khanty /nöl/, etc.) [3]
  • *śüdₓə > *sijə ‘coal’ (~ Fi. sysi, EKh /söj/, etc.)
  • *sülə > *tij ‘fathom’ (~ Fi. syli, Hungarian öl, EKh /löl/, etc.)
  • *türə > *tir ‘full’ (~ Fi. tyrtyä ‘to be satiated’, etc.)
  • *tütkə- > *titə- ‘to open’ (~ EKh /töwətteeɣə-/ ‘to open wide’)

However, Proto-Samoyedic is reconstructed with also an *ü (retained roughly as is in the southern languages: Selkup, Kamassian & Mator). And at least since Sammallahti, also cases with apparent retention from Proto-Uralic have been recognized. The data is rather sparse, but the following look reliable to me:

  • *küntə > *küntə ‘smoke’ (~ Hu. köd ‘fog’)
  • *kütkə- > *küt- ‘to tie’ (~ Hu. köt-, Fi. kytke-)
  • *ńüktä- > *ńüt- ‘to pull’ (~ Fi. nyhtä- ‘to rip off’; Hu. nyű- ‘to rip’) [4]

This kind of two-fold reflection would not be unique in Uralic. In fact, it resembles quite a bit the situation found in at least three other nearby Uralic dialect groups: Southern Khanty, Northern Khanty and Southern Mansi. In all three of these, we find “conditional retention”:

  • *ü remains a labial vowel: SMs /ü/, SKh /o/, [5] NKh /u/ ~ /uu/, when adjacent to a velar consonant: *k, *ɣ, *ŋ. A following *ɣ is also colored to /w/.
  • *ü becomes an illabial vowel: SMs /ä/, SKh /e/, NKh /ä/ ~ /a/, when not adjacent to a velar.

The exact same thing seems to be what is going on in Samoyedic. None of the cases in the first group occur next to an original velar consonant; all cases in the second group do. The contrast *i | *ü would therefore appear to be preserved in Samoyedic after all, although only in this one particular environment.
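The conditioning can be checked mechanically against the two lists above. The following is only a minimal sketch: the function name `retains_rounding` and the flat string encoding of the reconstructions are my own assumptions for illustration, and the velar set is simply the *k, *ɣ, *ŋ named above.

```python
import unicodedata

# Velar consonants named in the text as the conditioning environment.
VELARS = {"k", "ɣ", "ŋ"}
# Precomposed form of the vowel, so that indexing works on single code points.
U_UMLAUT = unicodedata.normalize("NFC", "ü")


def retains_rounding(pu_form: str) -> bool:
    """True if the first *ü of a PU form stands directly next to a velar."""
    form = unicodedata.normalize("NFC", pu_form)
    pos = form.index(U_UMLAUT)  # raises ValueError if the form has no ü
    before = form[pos - 1] if pos > 0 else ""
    after = form[pos + 1] if pos + 1 < len(form) else ""
    return before in VELARS or after in VELARS


# PU forms from the two lists, with the attested outcome:
# True = Proto-Samoyedic *ü, False = Proto-Samoyedic *i.
EXAMPLES = {
    "d₂ümä": False, "nüd₁ə": False, "śüdₓə": False,
    "sülə": False, "türə": False, "tütkə": False,
    "küntə": True, "kütkə": True, "ńüktä": True,
}

for form, attested in EXAMPLES.items():
    print(form, "ü" if retains_rounding(form) else "i", attested)
```

Note that ‘to open’ comes out correctly as *i only because adjacency is taken strictly: the *k of *tütkə- is separated from the vowel by *t.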

Phonetic details

Both Mansi and Khanty also suggest that this development is probably not conditional retention, but more likely something I could call “double cheshirization”. First *ü indeed develops to *i, but in the process it colors an adjacent velar consonant, e.g. *k > *kʷ. This phase is attested in Mansi, e.g. *künčə > *kʷinčə >> Western/Eastern /kʷäš/, Northern *kʷas > †/kʷos/ > /kos/ [6] ‘nail’; in the case of medial velar consonants, most often *-ɣ-, also in Surgut Khanty, e.g. *sükśə > *süɣəs > *siɣʷəs > /sewəs/ ‘autumn’. Then, *kʷ and *ŋʷ develop back to *k and *ŋ, but in the process they color an adjacent *i (or the like) either back to /ü/ (Southern Mansi), or to a back vowel, /o ~ u ~ uu/ (Khanty). So altogether: *kü- > *kʷi- > /kü-/, while e.g. *tü- > *ti- (and not > **tʷi- > **tü-). Expected *ɣʷ however merges with *w, which then generally remains. [7]
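The two stages can be sketched as a pair of string rewrites. This is a deliberately simplified illustration, not a full model: the function names are hypothetical, only *k and *ŋ are handled (the merger of *ɣʷ with *w is left out), and the second stage restores /ü/ in the Southern Mansi manner rather than producing the Khanty back vowels.

```python
import re
import unicodedata


def nfc(s: str) -> str:
    """Normalize to precomposed characters so each vowel is one code point."""
    return unicodedata.normalize("NFC", s)


U = nfc("ü")
LAB = "\u02b7"  # modifier letter small w, marking labialization


def delabialize(form: str) -> str:
    """Stage 1: *ü > *i, passing its rounding on to an adjacent *k or *ŋ."""
    form = nfc(form)
    form = re.sub(f"([kŋ]){U}", rf"\1{LAB}i", form)  # *kü > *kʷi
    form = re.sub(f"{U}([kŋ])", rf"i\1{LAB}", form)  # *ük > *ikʷ
    return form.replace(U, "i")                      # elsewhere plain *ü > *i


def relabialize(form: str) -> str:
    """Stage 2: *kʷ, *ŋʷ lose their rounding and color an adjacent *i to ü."""
    form = nfc(form)
    form = re.sub(f"([kŋ]){LAB}i", rf"\1{U}", form)  # *kʷi > kü
    form = re.sub(f"i([kŋ]){LAB}", rf"{U}\1", form)  # *ikʷ > ük
    return form


for word in ["küntə", "ńüktä", "sülə"]:
    middle = delabialize(word)
    print(word, ">", middle, ">", relabialize(middle))
```

The point of the sketch is that the surface result looks like retention next to velars, even though every *ü passes through an unrounded stage.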

One benefit of this approach is that we can assume some amount of vowel rotation to intervene. As seen from the above examples, the de-labialized reflexes of *ü in Mansi and Khanty are not /i/, unlike in Samoyedic. In Mansi, *ü still merges with inherited *i, but they both end up further lowered to *ä (which can be then e.g. secondarily lengthened to /ää/ or backed to /a/). In Khanty, *ü de-labializes at a late stage, by which the reflexes of *ü and *i in Khanty had already drifted out of sync. The development at this time is also probably more [ʏ] > [ɪ] than [y] > [i]. The eventual outcome can be a reduced close ~ mid vowel /e/ (in EKh and SKh; phonetically approx. [ɪ]), a reduced open vowel /ä/ (in the Obdorsk dialect of NKh) or even reduced back open /a/ (in the most innovative dialects of NKh where earlier *a mostly > /o/, such as that of Kazym). By contrast /i/ gives mainly tense /ee/ (~ open reduced /ä/ in Surgut).

For illustration, here are a few of the words from my first list again, with Proto-Uralic *ü in a non-velar environment, now with their reflexes in Southern Mansi, Southern Khanty and Northern Khanty:

  • *nüd₁ə > SMs /näl/, SKh /net/, NKh Kaz. /naɬ/, Obd. /näl/ ‘handle’
  • *sülə > SMs /täl/, SKh /tet/, NKh Kaz. /ɬaɬ/, Obd. /läl/ ‘fathom’

(These two appear to be the only examples retained in both Samoyedic and in all relevant Ob-Ugric dialects.)

Additional parallels

While there is clearly a widely parallel set of innovations involved, it is however not possible to assume *(k)ü- > *(kʷ)i- as a general East Uralic innovation. After all, *ü remains as a rounded front vowel in Eastern Khanty on one hand, Hungarian on the other, regardless of the environment. [8]

But even more impressively, the similarities do not end here! Northern Khanty (but again, not Eastern Khanty) and Mansi (this time in general) also share a related sound change, by which original *wi- becomes *wü-. In Mansi this is followed by loss of *w-, and due to this the change can be identified as common Mansi rather than exclusively Southern Mansi: e.g. *wittə ‘5’ > *wütə > *ütə >> Proto-Mansi *ätə > SMs /ät/, NMs ат /at/ ‘5’. [9] Compare EKh (Vakh-Vasjugan) and NKh /weet/.

This split, as well as the Mansi-style loss of *w, occurs even in Hungarian, where the resulting secondary *ü is retained as a labial vowel, just like primary inherited *ü: öt ‘5’, öl- ‘to kill’.

Samoyedic again follows suit:

  • *wittə > *wüət ‘10’
  • *wixə- ‘to take somewhere’ > *ü- ‘to drag’

While there are only two examples of this precise development, it can be identified as a more general shift *i > *ü next to labiovelars, with a total of five examples in Samoyedic after all.

I can indeed find no clear examples of Proto-Samoyedic words beginning with *wi-. Most cases that have earlier been reconstructed as such can be now identified as rather continuing *we-; notably *wet pro **wit ‘water’, where original *e is assured both by Old Nganasan †be’, and western cognates such as Fi. vesi. Probably a similar case is PSmy *wi/eŋü ‘son-in-law’, cf. Fi. vävy with open /ä/. The reconstruction of *e is not verifiable, though: Old Nganasan †biŋi has undergone a (regular) assimilation *e-i > *i-i, as also in e.g. *kettä (? *käktä) ‘2’ > *ketä > *śetä > †śiti.

Clear examples of *ü next to a former labiovelar also include *jüjə ‘beard moss’, *kürə ‘rope’ < PU *jäwjə, *käwd₁ə; but here we seem to have a distinct coloring process, perhaps with something like *äw > *öw as its first stage.

Implications within Samoyedic?

Another interesting fact is that the Nganasan reflex of PSmy *ü is *i. It is in principle possible that this actually reflects a more archaic stage than the rest of Samoyedic: if PSmy *ü normally develops by “re-coloring” from an intermediate *i, then perhaps this last stage of the shift never happened in Nganasan.

The contrast *i | *ü is nominally reflected in Nganasan, in that the former palatalizes a preceding *k, while the latter doesn’t (e.g. *kimä > /śimi/ ‘coal’ | *küntə > /kintə/ ‘smoke’). However, this could be also analyzed as reflecting instead an intermediate contrast *ki | *kʷi: just as in e.g. western Romance, plain *k would have been palatalized, yielding /ś/, while labialized *kʷ would have resisted, and later lost its palatalization.

This regardless seems less likely to me than the usual explanation. The shift *ü > /i/ would not be isolated in Nganasan: it is instead part of an extensive vowel chainshift, whereby also *u > /ü/, *o > /u/, *å > /o/. (And though I have not seen this mentioned in sources, presumably also the raising of 2nd syllable open “non-neutral” stem vowels *ä, *å to /i/, /u/ is a part of this.)
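A chainshift of this kind only keeps its contrasts apart if the steps are applied as one simultaneous mapping, or in “drag” order starting from *ü. A small sketch of this, where the function names are my own and the possible second-syllable raising is left out:

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize to precomposed characters so each vowel is one code point."""
    return unicodedata.normalize("NFC", s)


# The Nganasan shifts listed above: *ü > i, *u > ü, *o > u, *å > o.
CHAIN = {nfc("ü"): "i", "u": nfc("ü"), "o": "u", nfc("å"): "o"}
TABLE = str.maketrans(CHAIN)


def chain_shift(form: str) -> str:
    """Apply all four shifts at once, so no vowel is shifted twice."""
    return nfc(form).translate(TABLE)


def stepwise(form: str, order: list[str]) -> str:
    """Apply the shifts one after another, in the given order."""
    form = nfc(form)
    for old in order:
        form = form.replace(nfc(old), CHAIN[nfc(old)])
    return form


print(chain_shift("küntə"))                    # ‘smoke’, only *ü > i applies
print(chain_shift("üuoå"))                     # all four vowels kept apart
print(stepwise("üuoå", ["ü", "u", "o", "å"]))  # drag order, same result
print(stepwise("üuoå", ["å", "o", "u", "ü"]))  # push order, everything merges
```

In the reverse ordering each vowel is picked up again by the next step, and all four merge into /i/, which is not what Nganasan shows.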

Another possible counterargument is that the case of *ńüktä- > *ńüt- shows that at least the re-cheshirization *ikʷ > *ük must be earlier than the loss of *k from consonant clusters. But the latter is reflected in all Samoyedic languages, and would be best dated to Proto-Samoyedic proper.

Exceptions & more

A few additional complications to the scheme above exist as well.

Firstly, there are some cases of *ü in Proto-Samoyedic that do not seem to occur next to a velar consonant.

In loanwords this is a non-issue, e.g. *jür ‘fat’ ← Turkic *jür₂. Many other cases could be also explained in a similar way as ‘to pull’, ‘rope’: from a pre-form with a velar consonant, later regularly lost. Some examples of this type would be e.g. *jü ‘knot’ < pseudo-PU ? *jüKə; *čürə ‘ski pole’ < ? *čäwrV-. There also seem to be cases of *ü that come from the fronting of former *u, such as *jürə- ‘to get lost’, most likely from PU *jurə- (> Samic *jorë-, etc.). In principle *ju- > *jü-, as seen here, could even be regular! There are no examples of PSmy *ju- in vocabulary of Uralic origin.

A second complication is one apparent counterexample. ‘Snake’ in Proto-Uralic is usually reconstructed as *küjə. Several reflexes however point to a secondary derivative along the lines of *küjə-wä. These include Erzya /kijov/, Hungarian kígyó — and in Samoyedic: *kiwä, rather than expected *kʷiwä > *küwä. However, I suspect that the reconstruction with original *ü is erroneous. It seems to be based on Finnic *küü (> Fi. kyy, etc.) in the first place, secondarily also on Udmurt /kɨj/. But if instead of assuming variation *küjə ~ *küjə-wä we reconstructed only a single variant, then perhaps the source of labialization in the Finnic form is instead the “suffix” *-wä, and we can make do with *kijəwä. The later development in Finnic would then seem to be something along the lines of > *kiiwä > *küüwä > *küü.

Maybe we could even reconstruct a simple bisyllabic form *kejwä. This has the benefit of regularly explaining Erzya /i/ (*e-ä > /i/). In Samoyedic, ‘snake’ is not attested from Nganasan, so also *kewä is an equally possible reconstruction; and in Hungarian, í /iː/ usually indicates earlier *e, not *i (e.g. *wetə > víz ‘water’; *me > mi ‘we’; *ke > ki ‘who’). Contraction *ijV > í would be another option in principle, but hardly here: *-j- seems to be instead fortited to yield medial -gy-. This is again less compatible with Udmurt, however, where from *e we would expect Proto-Permic **koj > ˣ/kuj/; something like a Finnic-style assimilation *ej > *ij or a late fronting *u > /ɨ/ (a common enough phenomenon in Udmurt) would have to be assumed.

Also challenging is Tundra Nenets /ṕud/ ‘rope’, which has known cognates in Mordvinic (/piks/), Hungarian (obsolete fiu) and Khanty (*püüɣəL). The Nenets form suggests PSmy *pütə. On first look, this fits well enough into the picture so far: *-t- comes from PU *-ks-, and hence *ü could be conditioned by the former coda velar. But the cognates do not suggest PU *ü; they look more like PU *peksä, or, in principle, *pexsV. [10] Is labialization triggered here by the initial *p-, or is this an example of something more complex along the lines of PU #pewVksV? Or maybe, as Janhunen (1981) suggests, /u/ in Nenets could also be some kind of a late secondary development? There seems to be a parallel of sorts in ‘liver’: PU *mëksa > PSmy *mïtə > TNe /mid/ (literary мыд) ~ /mud/, with /i/ > /u/ between a bilabial and *t. But this could also be purely a coincidence… *mï- > /mu-/ seems to be regular in Enets, and maybe the TNe variant with /u/ for ‘liver’ is simply borrowed from there.

[1] The most consistent defender of this approach has probably been Gyula Décsy. His proposal has been free variation *[i ~ y] in early Finno-Ugric in various labializing environments, later semi-randomly fossilized as separate phonemes. (E.g. 1969: “Die Streitfragen der finnougrischen Lautforschung”, Ural-Altaische Jahrbücher 41.) This is gratuitously vague. Any reconstructions involving free variation are probably unverifiable even in principle, though, and I’d like to see some actual precedents for the alleged mechanism of “fossilization of free variation” before I buy any non-explanation of this sort.
[2] The first two of these have been by now added to a new section on my Bibliography page.
[3] The proposed Hungarian cognate nyél is multiply irregular — unexpected palatalization, unexpected vowel height and roundedness — and might not belong here at all.
[4] A new comparison, to my knowledge. The derivative *ńüktä- has previously only been attested from western branches of Uralic: Finnic (Fi. nyhtää ‘to pull (off)’), Mordvinic and Mari. The base root *ńükə-, meanwhile, has cognates also in Ugric, e.g. Hungarian dialectal nyű. The sound correspondences are mostly unproblematic, though SW reconstructs *nüt¹- (= *nüt- or *nüč-) rather than *ńüt- for Proto-Samoyedic. The only reliable reflexes are however from Nenets and Kamassian, which do not distinguish *n- from *ń- before front vowels. Southern Selkup /nüš-/ ‘to tear in half’, with unexpected /š/ as well, is perhaps best left unrelated.
[5] Phonetically the Southern Khanty reflex is actually centralized [ɵ̆] (traditionally transcribed ȯ̆). I follow Honti in analyzing this as an allophone of /o/ next to a velar consonant. The opposite interpretation has also been proposed, though, by Edith Vértes in her editorial comments in the 2nd volume of K. F. Karjalainens Südostjakische Textsammlungen (1997; SUST 225): phonemic /ö/ which conditions a velar ([-back]) allophone [k] of /k/, distinct from phonemic /o/ which conditions the uvular ([+back]) allophone [χ]. However, it is necessary to consider [k] versus [χ] to be phonemic in some other positions (one minimal pair is /keečə/ ‘knife’ versus /xeečə/ ‘mold’; from Proto-Khanty *keečää | *kïïčəɣ), and I think this means that the same would be the preferable analysis also for the contrast of [kɵ̆] versus [χŏ]. Vértes also notes cases of [ɵ̆] that occur in other environments, though; so Honti’s analysis may have to be back-dated to proto-Southern Khanty, followed by the phonemicization of /ö/ in the separate SKh dialects due to loanwords & such.
[6] /kʷo-/ for PU *kü- > PMs *kʷä- was still recorded by Bernát Munkácsi in his field records from 1888. The field records of Artturi Kannisto from 1905, however, already have just /ko-/, as still in today’s Northern Mansi.
[7] Phonetic [ɣʷ] is attested from Tremjugan Khanty, but this can be interpreted as a post-vocalic allophone of /w/. Márta Csepregi’s chrestomathy gives [ɣʷ] also for /w/ before /uu/.
[8] I suppose a theoretical realignment would be to reconstruct some kind of a different secondary labializing factor for word roots of the ‘glue’, ‘handle’ type (e.g. **d₂imwä, **nid₁wə?), but this does not seem to offer any clear benefits: we would have to assume that this labializing factor gets lost everywhere, but also manages to cause the same kind of rounding effect even in Finnic (which definitely has never been a neighbor of Eastern Khanty specifically).
[9] A final vowel is attested in early Mansi records, though by the late 1800s lost from all varieties. I reconstruct *-ə for such cases; they can be moreover secondarily identified by vowel lengthening in open syllables in Western and Eastern Mansi, thus here e.g. /äät/ and not ˣ/ät/.
[10] Janhunen (1981) suggests *piksi (= my *piksə), but this does not actually match any of the descendants.

Linkday #5: Free Resources in Linguistics (Uralic & Otherwise)

I try to keep my sidebar at a manageable size by limiting it to blogs and resources on historical linguistics; but there are obviously many other linguistics sites out there worth checking out as well. One not strictly directly neighboring blog I’ve enjoyed for a while now has been Humans Who Read Grammars (frequency of gratuitous .gif usage aside perhaps); their most recent post is perhaps exceptionally useful as a concept, in compiling a brief “list of lists of lists” on linguistic resources. (I wonder if a more centralized location such as Glottopedia would be optimal for compiling this type of work eventually.)

Which then brings me to one section on the meta-list: lists of open access journals in linguistics, one compiled by George Walkden, another in-house by HWRG. I am happy to note that Uralic studies seem to be quite well-represented, ranging from traditional establishment journals such as Suomalais-Ugrilaisen Seuran Aikakauskirja (Journal de la Société Finno-Ougrienne) (est. 1886) to newcomers such as Finno-Ugric Languages and Linguistics (est. 2012). At least one older journal also seems to have recently joined the ranks without me having noticed this before: Études Finno-Ougriennes, France’s only regular publication in the field, now added to my links. As could be expected of a slightly older journal (est. 1964), currently only newer issues are available online. I would still hope to see them expand their coverage further back eventually, though. From what I have heard from the SUSA crew, copyright issues may be of some trouble with digitizing backcatalogue from a few decades back; but surely not undealwithable. The best example for this in Uralic studies is perhaps Hungary’s long-running Nyelvtudományi Közlemények, whose comprehensive online archive spans more than 150 years of issues, from 1862 to 2015 (though they remain excluded from being considered an open access publication due to their embargo on the most recent issues).

Maybe these developments will also help a bit in dispelling Uralic studies’ alleged status as an arcane and poorly accessible subfield of linguistics. Now, in part this dubious fame is surely a language barrier issue: having substantial amounts of literature only available in Finnish, Hungarian or Russian is a hurdle for most people of any background. And for today’s scholars used to the convenience of literature being available mostly in English, the long-and-thin research history of Uralic studies moreover also makes more substantial demands in German and partly French reading skills than many other subfields. Easy access to literature will therefore not fix everything right away… It will still be a big help for many of us following along from home; I have myself recently moved further away from downtown Helsinki and I am already missing convenient university library access 20 minutes away. Yet it’s also clear that, at least in comparative-historical Uralistics, we continue to be lacking accessible up-to-date reference materials on many of the basics of our field. (I could mention also the lack of reference grammars or dictionaries that would be up to modern standards for many languages — though we’re probably still well ahead of the global curve on this.) That target is regardless drawing slowly closer.

Etymology squib: *paliti

Ranko Matasović, in a recent paper “Substratum words in Balto-Slavic”:

Balto-Slavic also has a number of verbal roots which do not appear to have any cognates elsewhere. (…)
• BSl. *pel-/ *pāl- ‘burn’ > PSl. *paliti ‘burn’

I will take his word on the nonexistence of clear Indo-European cognates. However, we can find a near-identical root right next door in West Uralic (= Samic-Finnic-Mordvinic): *pala- ‘to burn’. This seems like a much clearer point of comparison than Matasović’s proposal of metathesis from PIE √leh₂p- ‘to shine’.

A traditional further comparison within Uralic has been with Ugric *pad₂ɜ- ‘to freeze’. I’ve never found this compelling. The semantics display a “thermal inversion”, and phonologically this only works by recourse to the dubious PU *ľ, and even then only halfway: in Khanty we’d expect then *Ľ, not *j. I’m more inclined to accept instead the recent connection in Aikio ’16 with the long-known word-family of *pala ‘bit’, more specifically with verbal reflexes in Ugric and Samoyedic meaning ‘to devour’.

Originally I was planning on simply quoting the Uralic material and concluding with “∎”, but no, this does not yet add up to a trivial etymology. For one, even though the narrower distribution clearly suggests West Uralic → Slavic, we could still consider also the opposite direction of loaning (at the cost of abandoning the East Uralic cognates, geographically too far off to be of Slavic or even Balto-Slavic origin; but also at the benefit of dispensing with the semantic shift from ‘devour’). For two, the correspondence Uralic *a ~ Slavic *a poses some difficulty. These are identical graphemes, but before the Great Common Slavic vowel shift, the latter “*a” would have been phonologically a long vowel *ā. [1] Could *a on the Uralic side, not originally subject to a length distinction, have been substituted as long *ā > CSl. *a, instead of short *a > CSl. *o? Possible, perhaps, but no such phenomenon surfaces in any of the (rather few) known old Uralic loans into Germanic and Baltic. Alternately, could this be a loan late enough to have skipped the CSl. vowel shift altogether? Again, maybe this is possible. But we clearly end up with some uncertainties in how this supposed loan could have been routed.

For three, the WU root alternates with an “ablaut variant” *pol-(t)ta- ‘to burn (tr.)’, which has never been properly explained. Under current knowledge, we could maybe derive the Samic and Mordvinic variants (*poaltē-, *pultə-) from earlier *palə-ta- ~ *palə-tta-, though Finnic still remains problematic. [2] The existence of comparanda in Slavic opens some new options, though. Some kind of back-loaning is one possibility; for another one, since I am not 100% convinced that this is a U → Sl loan and not the opposite, maybe we could derive the Uralic variants from actual IE ablaut variants, such as an earlier full grade *pōl- versus an otherwise lost zero grade *pal- (from earlier *poh₃l- ~ *pᵊh₃l-)?

Later on in the paper, Matasović also gives a list of various voiced/voiceless doublets, mostly from Baltic. He then adds a strange comment: “In some cases, words showing this alternation may be Uralic loanwords, or they may reflect the pronunciation of originally Baltic words by speakers of Uralic, who underwent language shift.” This does not seem to be combined with any attempt to find Uralic equivalents though… and in many cases such a search would be doomed anyway. At minimum, doublets with word-initial consonant clusters (a bit over half of the cases, e.g. Latvian sniekt ~ sniegt ‘to give’; Lith. klusnus ~ glusnus ‘obedient’) would be clearly alien to Uralic phonology. The only case in his list that I could possibly see as connected to any actual Uralic words is Lithuanian viskėti ~ vizgėti ‘to swing’, which has some similarity with Finnic *viskat- ‘to throw, cast’ (but maybe not enough to actually matter).

I don’t want to harp on Matasović in particular. But this regardless strikes me as a part of a wider disconnect between IE and Uralic studies. The oversights here — the false negative of *paliti being supposedly isolated, and the (weak) false positives of words like klusnus being called possibly Uralic — both fit into a pattern where Uralic gets unwarrantedly treated as lexical terra incognita, despite extensive research to the contrary; much of it in readily accessible languages like German and English, even.

Plenty of Uralic-IE lexical comparisons have of course been compiled over the times… by Nostraticists and Indo-Uralicists. Skepticism on macro-comparison hypotheses like these should not be taken as a reason to neglect the raw data used, though: by now many cases have been shown or at least proposed to be loans from IE to U instead. I would expect close analysis to also reveal some number of cases better explainable as U to IE loans, too, if we would only pause to at least consider the possibility. [3]

[1] This type of error is committed every so often in IE/U loanword studies, where numerous traditional transcription schemes clash. Also worth mentioning is at least the similar graphemic identity but phonological non-identity of PU *a (an open back vowel, [ɑ ~ ɒ]) and PII *a (a non-close central vowel, [a ~ ə]); which explains e.g. why PII *ćata ‘100’ gets loaned as PU *śëta (with *ë, a mid non-front vowel, [ʌ ~ ɤ]) rather than *śata.
[2] I’ve sometimes wondered if the stem type shift *a-ə > *o-a could have under some conditions, such as after *p, extended to Finnic as well. However, by this approach it would be mysterious why we end up with Fi. polttaa and not a contraction verb polata : polaa- (*polat-). Estonian põlema ~ Votic põlõa ‘to burn (intr.)’ also seem to evidence instead a base root *polë- < *polə-. One other solution would be to suppose instead *palə- > *poolë- (just as in *pälä > *palə > *pooli ‘side, half’), followed by suffixation to transitive *pool-tta-; which might have been then shortened to *poltta-, due to a general ban on CVVCC syllables in Proto-Finnic times. After this Estonian and Votic might have back-derived the stem *polë- from this…
[3] For one other possible example, cf. the case of Uralic *pisə- ‘to put (in)’ ~ Balto-Slavic *p(e)is- ‘to push, to fuck’, briefly discussed in this blog’s comments earlier.


Observations on second-syllable vocalism in Khanty

This summer I’ve finished digitizing the main bulk of comparative data from László Honti’s Geschichte des obugrischen Vokalismus der ersten Silbe (1982): his 724 Proto-Ob-Ugric reconstructions and their descendants in the individual Mansi and Khanty varieties. Before making this available in any form though, I’m planning on eventually cross-checking at least a few other key sources. For one, there are Steinitz’ DEWOS, Kannisto’s recently released Vogulisches Wörterbuch, and some other materials for additional dialect coverage; for two, there are UEW and similar sources covering inherited vocabulary that has only been retained in one of Mansi and Khanty; for three, I will be also adding the data Honti includes but considers uncertain (this part already underway). [1] A potential fourth extension could be the known loanwords of Komi / Turkic / Tungusic / older Russian origin, at least whenever attested in both Mansi and Khanty: they should be able to offer substantial evidence for constraining speculation on historical phonology.

However, even at this stage, the data can be assumed to contain a substantial part of the inherited lexicon of Mansi and Khanty. So I have taken the opportunity to do some preliminary comparative analysis.

One interesting underresearched topic is second-syllable vocalism, and this extends even to the basic groundwork within Mansi or Khanty. This might have importance for Uralic comparison in general, since our current understanding of Proto-Uralic word stem types comes mostly by extrapolation from Finnic and Samic. Although the basic division into *A-stems (~ *a-stems & *ä-stems) and *ə-stems (~ *e-stems or *i-stems) finds some substantial confirmation from Mordvinic and Samoyedic, it fares substantially more poorly with Mari, and within Permic and Ugric, there is not much direct evidence of second-syllable vowel contrasts to work with in the first place. Attempting to reconstruct other second-syllable contrasts from conditional vowel developments in the first syllable is theoretically possible (I believe Zhivlov (2014) is still the most recent example of this), but this often carries a risk of circular logic, and if low on data, may also run into accidental correspondences between unrelated phenomena.

There is regardless some direct evidence of second-syllable vocalism in Ugric. Looking in the rest of this post at Khanty in particular: the Khanty evidence has been explored in the 60s in some aspects by Gerhard Ganschow and Gert Sauer, [2] but mostly the topic has gone without detailed research. Steinitz’ Geschichte des ostjakischen Vokalismus (1950) does not treat the subject and only focuses on the first-syllable system.

A few overview notes on unstressed syllables, without detailed analysis of the data, are given by Sauer in Die Nominalbildung im Ostjakischen (1967) and Honti in Chrestomathia Ostiacica (1984). These outline a division into five stem categories:

  1. Basic consonant stems (the most common).
  2. *A-stems, with an open full vowel (*ää, *aa). Decently preserved in inlaut (verb roots, CVCAC and other longer stem types), but in absolute auslaut in the nominative of noun roots, the vowel is widely reduced and possibly lost entirely.
  3. *I-stems, with a close full vowel (*ii, *ïï). Preserved somewhat more widely, again better in inlaut than in auslaut.
  4. A third vocalic stem type, yielding *I-stems in Eastern Khanty but *A-stems in Western Khanty.
  5. *əɣ-stems: these behave as ordinary consonant stems in EKh, but vocalize in WKh to merge with the *I-stems.

This certainly covers most of the bases. A close look at the comparative data, however, suggests that this picture should be probably modified and perhaps also expanded.

The *I-stems, reinterpreted

I would propose as an initial adjustment that the *I-stems are to be reinterpreted as a part of the consonant stems: as *əj-stems. This is indirectly suggested by the absence of stems of the shape *CVCəj from the Proto-Khanty lexicon, even though *CVj is well-attested (e.g. *ɬöj ‘pus’, *pooj ‘ice crust’, *saaj ‘goldeneye’) and examples of *CVjəC occur too (*kaajəm ‘ash’, *waajəɣ ‘animal’). Direct support is provided by at least *ńooɣïï ‘meat’, *sooɣïï ‘clay’, cognate to Mansi *ńaawľ, *suwľ respectively. Instead of parallel suffixation, these can be analyzed as reflecting the typical sound correspondence Mansi *ľ ~ Khanty *j (< PU *ď; the intermediates are not obvious, but that question is irrelevant for now). Dating *-əj > *-I as a proto-Khanty innovation does not seem to be possible either, since in verb stems, Southern Khanty still retains /-əj-/: e.g. ‘to break’, Far Eastern (Vakh-Vasyugan) /aarïï-/ ~ Southern /oorəj-/. (And see below for some related considerations concerning the *əɣ-stems.)

This also accounts for a minor typological paradox. Why is second-syllable *-I better retained than *-A in the Khanty dialects, even though we would expect a close vowel to be more readily subject to reduction? A promising answer would be that the vocalization *-əj > *-I, despite being reflected in all Khanty varieties, is more recent than the partial reduction of *-A in some varieties. Sound changes *ej > /iij/ ~ /ij/, *je > /jii/ ~ /ji/ are also well-known in Northern Khanty, [3] and I suspect this is additionally a part of the same wave of vowel coloring, in these varieties further generalized to the first syllable. This would date *-əj > *-I as at minimum more recent than the Southern / Northern split.

I have seen the sound change *-əj > *-I mentioned in various works already (Sauer, Honti, Helimski…), but not anyone willing to bite the bullet and note that this can be taken as the definitive original source of this stem type.

Also, one secondary sound change. It appears that in Obdorsk Khanty, word-final *-I > /-aa/ after /x/: *ńalkïï > /ńalxaa/ ‘Siberian fir’; *ńooɣïï > */ńoxaa/ ‘meat’. This appears related to loss of vowel harmony. In NKh, first-syllable *ïï > *ee before velars instead of > *ii elsewhere, and I suspect something similar is involved here. I would assume first *-kïï > *-xïï > *-xëë, then *-ëë lowers to /-aa/ instead of backness neutralization to **-ee.

This would then seem to show that yes, Western Khanty too (or at least Obdorsk Khanty) has gone through a vowel-harmonic stage with *-ii ~ *-ïï, instead of directly vocalizing *-əj to front *-ii everywhere.

The *Aj-stems

Reconstructing *əj-stems also sheds light on the fourth stem category in the outline above. I would side with Honti in reconstructing these as *Aj-stems. Southern Khanty provides clear evidence in favor: e.g. /xašŋääj/ ‘ant’ ~ Far Eastern /koočŋïï/. Sauer considers /j/ in SKh to be instead epenthetic, generalized from inflected forms where a vowel-initial suffix followed, but we can again appeal to comparative evidence from Mansi, where we find e.g. Northern /xooswoj/. I would add that with correct relative chronology, the development *-Aj > *-I in EKh drops right out of the other attested sound laws, with no need to posit any additional changes particular to this stem category: start with the reduction *A > *ə, follow up with *-əj > *-I.
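The relative chronology can be illustrated with the cover symbols alone. In this sketch the stem shapes `CVCAj` and `CVCəj` are schematic stand-ins rather than real reconstructions, and `derive` is a hypothetical helper name:

```python
import re

# The two sound laws, written over the cover symbols used above.
REDUCTION = (r"Aj$", "əj")     # *A > *ə before final *j
VOCALIZATION = (r"əj$", "I")   # *-əj > *-I


def derive(form: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) rule once, in the order given."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


# Reduction first: it feeds vocalization, and *Aj-stems fall in with *I-stems.
print(derive("CVCAj", [REDUCTION, VOCALIZATION]))   # CVCI
# Original *əj-stems reach the same outcome by vocalization alone.
print(derive("CVCəj", [REDUCTION, VOCALIZATION]))   # CVCI
# The opposite order strands the *Aj-stems at *-əj.
print(derive("CVCAj", [VOCALIZATION, REDUCTION]))   # CVCəj
```

With reduction ordered first, no sound law specific to the *Aj-stems is needed to get the Eastern Khanty result.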

The origin of the *Aj-stems also appears to be clarifiable. Words such as ‘ant’ point in the direction that they often originate in compounds. I believe that in many cases, their second member is likely the root seen in Ms *wuuj ‘animal’, though found independently in Khanty only in the suffixed form *waajəɣ. Some other *AAj-stems in Khanty that seem to have this origin include: *jeetərɣääj ‘black grouse’; *kaaməɭkaaj ‘water beetle’ (maybe with a first component akin to *koomɭəŋ ‘bubble’); #karŋaaj ‘woodpecker’ (and thus, contra Honti, not segment-for-segment identifiable with Hungarian harkály); #wuurŋaaj ‘crow’.

Many of these words also show irregular vacillation between medial *-ŋ- and *-ɣ-. My hypothesis is that this might be a trace of the PU genitive suffix *-n, and e.g. what I write as approximate #karŋaaj (Obdorsk metathesized /xaŋraa/; Konda spirantized /xaxrääj/; Surgut /kajaaïï/ and Far Eastern /kajərkïï/, maybe by metathesis and dissimilation: < *kaɣərKəj < *karɣəKaaj?) should be thus reconstructed as something like #karkə-n_waaj > #karɣəŋɣaaj, reflecting an original genitive attribute construction: ‘animal of the beak’, or something to that effect.

Compound origin would additionally explain also the complete absence of *Aj-stems among verbs.

It’s also possible I am late to the scene here. I’ve seen references to a 2003 paper by Anna Widmer “Zur Geschichte des obugrischen Tiersuffixes”, [4] and it sounds like this covers this same topic, but I do not (currently) have access to it.

The *əɣ-stems

Among the *əɣ-stems, an interesting complementary distribution appears that I have not seen remarked on before. Many sources note that the reflexation in Northern Khanty in nouns is somewhat inconsistent: in some cases we find Kazym /-i/, Obdorsk /-ii/, the same as in *I-stems; but, in others, we find Kazym and Obdorsk zero. (Southern Khanty and the “transitional” Nizyam dialect have consistently /-ə/ in both cases.) Verbs also only show the development to *-I-.

This split distribution seems to be conditioned by the preceding consonant: *-əɣ > *-I appears after obstruents, *-əɣ > ∅ after sonorants. Some examples of the former:

  • ‘owl’: Vakh /jewəɣ/ ~ Kazym /jipi/
  • ‘Khanty’: Vakh /kantəɣ/ ~ Kazym /xanti/
  • ‘birch bark’: Vakh /tontəɣ/ ~ Kazym /tonti/
  • ‘barbel’: Vakh /mööɣtəɣ/ ~ Kazym /meewti/
  • ‘duck’: Vakh /wääsəɣ/ ~ Kazym /waasi/
  • ‘knife’: Vakh /kööčəɣ/ ~ Kazym /keeši/
  • ‘pine’: Vakh /ɔɔɳčəɣ/ ~ Kazym /wooɳši/

And some examples of the latter:

  • ‘song’: Vakh /äärəɣ/ ~ Kazym /aar/
  • ‘roach’: Vakh /läärəɣ/ ~ Kazym /ɬaar/
  • ‘crane’: Vakh /taarəɣ/ ~ Kazym /tɔɔr/
  • ‘bowl’: Vakh /ääɳəɣ/ ~ Kazym /aaɳ/
  • ‘lightweight’: Jugan /köńəɣ/ ~ Kazym /keeɳ/
  • ‘bog’: Vakh /kɔ̈ɔ̈ɭəɣ/ ~ Kazym /kaaɭ/
  • ‘animal’: Vakh /waajəɣ/ ~ Kazym /wɔɔj/
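The conditioning proposed above is simple enough to check mechanically. Here is a minimal sketch in Python, assuming that the consonant written before /-əɣ/ in the Vakh form can stand in for the Proto-Khanty one. The two consonant classes only cover segments that occur in the examples above, and the function and variable names are made up for illustration. ‘Owl’ is left out, since there Vakh /w/ is matched by Kazym /p/, and ‘lightweight’ because it is cited from Jugan rather than Vakh.

```python
import unicodedata

# Assumed natural classes, limited to consonants found in the examples above.
OBSTRUENTS = {unicodedata.normalize("NFC", c) for c in ["t", "s", "č"]}
SONORANTS = {unicodedata.normalize("NFC", c) for c in ["r", "ɳ", "ɭ", "j"]}

def predicted_kazym_ending(vakh_form: str) -> str:
    """Return the Kazym ending predicted for a Vakh noun in /-əɣ/."""
    form = unicodedata.normalize("NFC", vakh_form)
    if not form.endswith("əɣ"):
        raise ValueError(f"not an əɣ-stem: {vakh_form}")
    consonant = form[-3]  # the segment right before /-əɣ/
    if consonant in OBSTRUENTS:
        return "i"   # *-əɣ > *-I after obstruents
    if consonant in SONORANTS:
        return ""    # *-əɣ > zero after sonorants
    raise ValueError(f"unclassified consonant: {consonant}")

# (gloss, Vakh, Kazym), copied from the two lists above.
EXAMPLES = [
    ("Khanty", "kantəɣ", "xanti"),
    ("birch bark", "tontəɣ", "tonti"),
    ("duck", "wääsəɣ", "waasi"),
    ("knife", "kööčəɣ", "keeši"),
    ("song", "äärəɣ", "aar"),
    ("bowl", "ääɳəɣ", "aaɳ"),
    ("bog", "kɔ̈ɔ̈ɭəɣ", "kaaɭ"),
    ("animal", "waajəɣ", "wɔɔj"),
]

def check(examples):
    """Compare the predicted ending with whether the Kazym form ends in /i/."""
    rows = []
    for gloss, vakh, kazym in examples:
        predicted = predicted_kazym_ending(vakh)
        attested = "i" if kazym.endswith("i") else ""
        rows.append((gloss, predicted == attested))
    return rows

for gloss, ok in check(EXAMPLES):
    print(gloss, "ok" if ok else "MISMATCH")
```

This only tests the ending, not the vowel changes in the first syllable, and a serious version would need to classify the reconstructed consonants rather than the Vakh ones.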

There is only one example involving Proto-Khanty *L (a cover symbol representing both *ɬ and *l, which are medially neutralized everywhere). [5] It appears to align with the sonorants:

  • ‘rope’: Nizyam /keetə/ ~ Kazym /keeɬ/

Inconveniently, here *-L- continues PU *-d-. It is therefore not possible to clearly tell if we are dealing with Proto-Khanty *-l- or *-ɬ-, since both paths of development have been suggested. In principle, though, this example would support a claim that the development was in fact first to *-l- (a sonorant), as also in Permic / Mansi / Hungarian.

I am not sure how the split development here should be interpreted phonetically, either. The core motivation seems to be a general cross-linguistic one at least: sonorant codas are more licensable than obstruent codas. But at least secondary loss of /-i/ after sonorants is ruled out, since in genuine Proto-Khanty *I-stems (*əj-stems) this remains. Examples are not numerous (by far most occur following /r/), but they exist:

  • ‘riverbed’: Vakh /uurïï/ ~ Kazym /woori/, Nizyam /uurə/
  • ‘sturgeon’: Vakh /köörii/ ~ Kazym /kari/, Nizyam /karə/
  • ‘scab’: Vakh /kaľïï/ ~ Kazym /xaɬ´i/, Nizyam /xaťə/

This thus ends up further supporting my above-suggested chronology, where *-əj > *-ij > /-i/ took place only after the separation of Northern Khanty: the *-əɣ > ∅ group likely never went through an *-əj-stage. In other words, whatever the exact split development here was, it would have predated the common (but not Proto-!) Western Khanty shift *-əɣ > *-əj.

Maybe this could even be equated with the development of post-tonic (“non-stem”) *ɣ to /j/ in Obdorsk Khanty under certain conditions (e.g. ‘father’: EKh /jeɣ/, Nizyam /jiɣ/, Kazym /jiw/ ~ Obdorsk /jiij/; ‘power’: Vakh /wööɣ/, Nizyam & Kazym /weew/ ~ Obdorsk /weej/). This would then require rather early separation between Obdorsk and the other NKh dialects though, perhaps early enough to invalidate the concept of “Northern Khanty” as a genetic group altogether, and turning it into merely an areal subset of Western Khanty varieties.

I would not take this last corollary as a huge problem though, since I actually suspect the same already on other grounds as well… For just two examples:

  • The word for ‘grass’. Far Eastern and Obdorsk have /paam/, while the other dialects have reflexes pointing to *pɔɔm. This surely involves an irregular (“non-provably regular”?) labialization between two bilabial consonants; [6] and yet this labialization cuts across the conventionally accepted grouping of the Khanty dialects.
  • The treatment of supposed Proto-Khanty *ɔ̈ɔ̈ and *öö. These yield in some contexts /oo/ in Obdorsk, but *ää and *ee respectively in the rest of Western Khanty. Yet, the elimination of front rounded vowels is pan-WKh, and e.g. Honti and Steinitz claim it as indeed proto-WKh. [7] But if so, we have to route Obdorsk /oo/ differently. I wonder if another early shunt will work: if, following Helimski etc. we reconstruct lax open *ä, *a instead of *ee, *öö, *oo, then it will be possible to re-route “*öö > /oo/” as *ä > *a > /oo/, involving a pre-Obdorsk conditional retraction of *ä to *a in some environments.

— For some reason, nearly all words of the *-əɣ > ∅ group also involve Proto-Khanty low *aa, *ää, or mid *ee, *öö, *oo (= *ä, *a?). Perhaps there is also something more going on in here. This is also suggested by one example with a close vowel, where in Northern Khanty we find metathesis instead, viz. ‘eight’: Vakh /ńïïləɣ/ etc. ~ Nizyam /ńiwtə/, Kazym /ńiwəɬ/, Obdorsk /ńiijəl/ (< virtual PNKh *ńiiɣəɬ).

I also wonder how the changes *-əɣ > *-əj > *-I would interact with another innovation common to all of Khanty: the cluster contraction *-jt- > *-ć- (often involving the PU verbalizing suffix *-ta-, e.g. in *uj-ta- > PKh *ɔɔć- ‘to swim’). The more economical approach (that *-jt- > *-ć- was Proto-Khanty while *-əj > *-I was post-PKh) would predict that we should find cases where an *I-stem noun or intransitive verb has a corresponding intransitive or transitive verb (respectively) ending in *-əć-. Offhand, however, I cannot locate any such cases. But maybe this type of derivation was morphotactically impossible in the pre-PKh period? For comparison, in Finnic *-i < pre-PF *-j is a common suffix of diminutive nouns, and *-i- < *-j- is a common suffix for iterative verbs, but these generally do not form further verbal derivatives: any corresponding verbs are instead formed from the underived root.

At least one word also suggests the possibility of *əj > *-I being earlier than the contraction to *-ć-: ‘to split’, Vasyugan /ɭaaŋkïït-/ ~ SKh /laaŋxət/ ~ Kazym /ɭooŋkit-/, where we would seem to have PKh *ɭaaŋkəjt-. However, this could also be a later derivative, formed after *-jt- > *-ć- had ceased to operate.

There also seems to be a lack of PKh words ending in coronal + *-I, that is, earlier *-təj, *-səj, *-nəj, *-Ləj. (There are a few examples with a /Ct/ consonant cluster though, e.g. *aŋtïï < *aŋtəj ‘horn’; *maartïï < *maartəj ‘mythical land of birds’.) Maybe this indicates a parallel palatalization, and pre-Khanty *-Cəj or *-CjV resulted in a stem-final palatal instead of an *I-stem. Stems of the shape CVĆ are not very common in the current dataset either, though. But maybe any examples of this simply have not been connected with their equivalents in Mansi or elsewhere in Uralic yet?

Retaking inventory

Since it turns out that close second-syllable vowels in Khanty are secondary, from the Proto-Khanty perspective I should be probably talking about vocalizable stems, not “vowel stems”. This then suggests that a sixth category should be also distinguished: PKh *Aɣ-stems. These would then fill up a neat 2×3 system:

  • vowel stems: *-A(C), *-Aj, *-Aɣ
  • consonant stems: *-∅/-əC, *-əj, *-əɣ

A few words ending in *-Aɣ are indeed reconstructed by Honti, and they indeed also show distinctive development of their own. A representative example would be the adverb *koɳčaaɣ ‘on back’: Far Eastern /koɳčaaɣ/, Surgut /koɳɣïï/, Southern /xončää/, Nizyam & Kazym /xonšaa/, Obdorsk /xonsaa/. So we have here:

  • loss/vocalization of *-ɣ in WKh, versus its retention in EKh (same as in *əɣ-stems);
  • retention of *-A in not just EKh but also WKh, presumably protected by the earlier word-final consonant (partially same as in *Aj-stems);
  • a strange development to /-ɣïï/ in Surgut, perhaps through metathesis (*-aaɣ > *-ïïɣ > *-ɣïï)?

Kind of paralleling *Aj-stems being mainly animal names, all of Honti’s examples seem to be adverbs. The other two are *koomtaaɣ ‘overhead’, *pertääɣ ‘back’. I would add to this group also *maakaaɣ ‘previous’, which he reconstructs as *maakaaj, despite SKh /maxaa/ and not ˣ/maxääj/.

The *A-stems

Moving on to the main bulk of *A-stems, these may also need to be analyzed as partially secondary. This, however, requires taking a few steps back to look at the wider context.

While the modern Khanty varieties and also most reconstructions of Proto-Khanty abound in consonant stems of the shape CVC, CVCC or CVCəC, it is clear that this is an innovation, and that in Proto-Uralic the dominant root structure was bisyllabic *CV(C)CV. It is also clear that the transition towards consonant stems across a wide central area among the Uralic languages has taken place mostly as areal drift, not as a diagnostic subgroup innovation. Marginal languages of this type, such as Estonian, Nenets and Skolt Sami, still remain at a “thematic inflection” stage, showing consonantal nominative singular forms but vocalic inflectional stems. A good example would be Estonian nom.sg. silm : gen.sg. silm-a ‘eye’, where the latter form is at least from a historical point of view better viewed as silma-∅ (and thus structurally identical to Finnish silmä-n). Verbal roots, which generally cannot stand alone, also generally retain original second-syllable vocalism. And due to the lucky fact that the largest clear subgroups of Uralic all occur near the edges (Finnic, Samic, Samoyedic), in all of these cases we will be able to compare these languages with close relatives that remain at a firmly vowel-stem-centric inflection type (e.g. Votic, Inari Sami, Nganasan, respectively).

A transitional stage, one of several possible, is represented by Hungarian, where nouns retain a trace of thematic inflection (nom.sg. hal : plural hal-a-k ‘fish’; but nom.sg. dal : pl. dal-o-k ‘song’). However, in adjectives and verbs, presumable earlier lexically determined stem vocalism has been levelled entirely, and in most word forms second-syllable vocalism is now better analyzed as morphologically determined. Constantly vocalic stems have also been reintroduced among nouns, primarily in loanwords (e.g. balta : baltá-k ‘axe’, from Turkic), but also in derivatives (e.g. apa ‘father’, where -a has been interpreted as a fossilized possessive suffix).

Sauer’s old work proposes that *A-stems would be a retention from Proto-Uralic in one environment specifically: stem-finally in nominals, as suggested by a few equations like PU *neljä > PKh *ńeLää ‘4’. This would imply that elsewhere they aren’t retentions. The PKh situation as currently reconstructed therefore seems to derive from something close to the Hungarian situation, where original stem vowels have first been almost always phonetically reduced or analogically reshuffled away; then new ones are introduced.

Loanwords can of course fill in new second-syllable vowels, e.g. EKh /aarkaan/ ‘thick rope’, from Turkic; *ajaa > EKh /ajaa/ ~ /ajə/, WKh /aj/ ~ /oj/ ‘luck’, from Tungusic. In native vocabulary though, the most natural source for new second-syllable vowels are original third-syllable vowels. Given the original trochaic stress pattern of Proto-Uralic (as still continued in Samic, Finnic, partly Hungarian and Samoyedic), foot-final vowels would be expected to be the first ones to fall. After this, earlier 3rd-syllable vowels will move one syllable forward, becoming new unreduced 2nd-syllable vowels.

In at least some of the examples I’ve discussed above, 2nd syllable *-A clearly derives from an original 3rd syllable. *koomtaaɣ ‘overhead’, for example, is probably a derivative of PU *kuma- ‘overturned’, i.e. descends from pseudo-PU *kuma-takV. The entire animal name group also falls under this.

Now, the crucial question is — at what point in the history of Khanty was the distinction between “primary” 2nd syllable vowels, retained since PU, and “secondary” 2nd < 3rd syllable vowels lost for good? I think there’s reason to think that this, too, was post-Proto-Khanty.

Relatively poor retention of absolute final *-A is maybe best attributed to specifically word-final reduction/loss. The numeral ‘4’ for example, does not surface with a final full vowel anywhere: the reflexes are Far Eastern /ńelə/, Surgut /ńeɬə/, SKh /ńetə/, Nizyam /ńitə/, Kazym /ńaɬ/, Obdorsk /ńiil/. In many other cases, only the Vasyugan dialect delivers: e.g. *paraa ‘raft’ > Vy. /paraa/, Vakh, Surgut & Demyanka (SKh) /parə/, Obdorsk & most SKh /par/, Nizyam & Kazym /por/.

(It’s unclear at least to me what’s up with the loss of *-A in SKh and Nizyam in ‘raft’, versus its retention as /ə/ in ‘4’. Both patterns have further examples; retention is more common. I’m not sure if I would want to utilize a “primary/secondary” distinction just for these.)

A bigger problem though is that “primary” *-A is mostly lost also in verbs, even though in these the vowel would have been always protected by an inflectional ending. For example *kalaa- ‘to die’ yields Far Eastern /kalaa-/, Surgut /kaɬ-/, SKh & Nizyam /xat-/, Kazym /xaɬ-/, Obdorsk /xal-/. This is in clear contrast to “secondary” *-A in words such as ‘height’: VVy /peläät/, Tremjugan (Surgut) /peɬiit/ (?), Nizyam /pataat/, Kazym /paɬaat/, Obdorsk /päläät/ — which, again, clearly comes from a longer proto-form, being a derivative from PU *pidə > PKh *peL ‘tall’ (and probably further cognate to also e.g. Fi. pituus : pituude- ‘length’, allowing a PU reconstruction #pidə-(w)Otə).

There seems to be some evidence for a “primary/secondary” distinction to be found in *-AC nominals, too. A good example might be *raɣaam ‘relative’ > Vakh /raɣaam/, but Tremjugan /raɣəm/, WKh /raxəm/; derived from a base verb ‘to approach, be near’ — only attested in WKh, and it could be from PKh *raɣaa- rather than simply *raɣ-.

Even if Proto-Khanty had a contrast between two types of *A-stems, trying to reconstruct this in the original 2nd syllable / 3rd syllable fashion seems like the wrong approach, though. In cases like ‘height’, this would lead to awkward vowel-cluster reconstructions such as **peLəäät. In cases like ‘overhead’, nothing would immediately stand out typologically in reconstructing **koomətaaɣ, but this still has at least one undesirable consequence: we can no longer treat *ə as a purely epenthetic vowel in PKh, inserted to resolve consonant clusters (reconstructions like *waajəɣ ‘animal’ are in fact better taken as phonologically */waajɣ/), and at least some cases would have to be assumed underlying.

I have another hypothesis in mind: the distinction may have been prosodic. 3rd syllable vowels in PU would have originally borne secondary stress, and this might have been retained in some form even after the loss of a preceding 2nd syllable. It’s not clear if an outright iambic stress pattern should be assumed though (*peˈLäät), or if something like a monosyllabic initial stress group followed by secondary stress will suffice (*ˈpeL|ˌäät). In principle it would be also possible to leverage the tenseness distinction, well-attested in initial syllables: *peLäät with tense *-ää, versus *ńeLä with lax *ä? For now, I will notate this distinction as *-À (“primary”, “unstressed”; individually *-a, *-ä) versus *-Á (“secondary”, “stressed”; individually *-aa, *-ää). Regardless of the phonetic specifics, later on *-À would have been generally reduced (*raɣam > /raɣəm/), while *-Á would have remained (*peLäät > /peLäät/).

The stress hypothesis finds some amount of direct confirmation as well: cases of fully iambic second-syllable stress have been reported at least from Eastern Khanty (Far Eastern /peˈläät/, Surgut /peˈɬäät/).

Stress in EKh does not appear to be a direct archaism, however. Per all descriptions I have seen, the attested distribution is purely phonological: stress is primarily initial, except when the 1st syllable contains a lax vowel and the 2nd syllable a tense one. This also rakes in cases of “unstressed” *-À; e.g. Far Eastern /kaˈlaa-/ ‘to die’. This seems like another point in favor of some kind of a more subtle distinction in PKh. I would suppose that in varieties of EKh, *-À was early on partly tensed to merge with *-Á, and could have actually acquired stress only later. Wherever this change failed to take place (including in all varieties of WKh), *-À was then reduced/lost.
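The purely phonological stress rule described above can be written out as a small sketch. It is in Python and rests on one assumption about the transcription used in this post: a tense (full) vowel is written as a doubled letter, and anything else, including /ə/, counts as lax. The function names are hypothetical, and the input is just the list of vowel nuclei of a word.

```python
import unicodedata

def is_tense(nucleus: str) -> bool:
    """Assumed convention: tense vowels are written doubled (aa, ää, ïï)."""
    n = unicodedata.normalize("NFC", nucleus)
    half, remainder = divmod(len(n), 2)
    return remainder == 0 and half > 0 and n[:half] == n[half:]

def stressed_syllable(nuclei: list[str]) -> int:
    """Return the 0-based index of the stressed syllable.

    Stress is initial, unless the 1st syllable has a lax vowel
    and the 2nd syllable a tense one.
    """
    if len(nuclei) >= 2 and not is_tense(nuclei[0]) and is_tense(nuclei[1]):
        return 1
    return 0

# Far Eastern /peˈläät/ 'height' and /kaˈlaa-/ 'to die', as cited above.
print(stressed_syllable(["e", "ää"]))  # second syllable
print(stressed_syllable(["a", "aa"]))  # second syllable
# What the rule predicts for a tense + reduced shape such as /waajəɣ/.
print(stressed_syllable(["aa", "ə"]))  # initial syllable
```

Since the rule only looks at surface tenseness, it cannot tell “primary” *-À from “secondary” *-Á once the two have merged, which is the point made above about stress not being a direct archaism.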

In summary

Altogether, I propose the following general chronology for the development of second-syllable vocalism in the Khanty varieties:

  1. The partial merger of *-À and *-Á in Eastern Khanty (with variable conditions); including *-Áj > *-Àj.
  2. The reduction of remaining *-À across all of Khanty; loss of *-əɣ in Kazym and Obdorsk after sonorants.
  3. *-əɣ > *-əj across all of Western Khanty.
  4. *-əj > /-I/ across all of Khanty (with variable conditions); in parallel, *-Aj > /-A/ in Northern Khanty.
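To show why the relative order of steps 2 to 4 matters, here is a rough sketch in Python that applies them as ordered rewrite rules to word-final sequences, for the Kazym-type outcome only. The sonorant class, the rule labels and the function name are my own shorthand for illustration. Only the endings are modelled, so first-syllable vowel changes are ignored and the outputs are not full Kazym forms.

```python
import re

# Assumed sonorant class; only stem-final consonants matter here.
SONORANTS = "rɳɭjlmnŋ"

STEP_2 = ("loss of *-əɣ after sonorants", re.compile(rf"(?<=[{SONORANTS}])əɣ$"), "")
STEP_3 = ("*-əɣ > *-əj", re.compile(r"əɣ$"), "əj")
STEP_4 = ("*-əj > -i", re.compile(r"əj$"), "i")

def derive(form: str, steps) -> str:
    """Apply each sound change once, in the given order."""
    for _label, pattern, replacement in steps:
        form = pattern.sub(replacement, form)
    return form

PROPOSED_ORDER = [STEP_2, STEP_3, STEP_4]
# Counterfactual order, with the sonorant-conditioned loss applying last.
ALTERNATIVE_ORDER = [STEP_3, STEP_4, STEP_2]

for stem in ["äärəɣ", "kantəɣ", "waajəɣ"]:
    print(stem, derive(stem, PROPOSED_ORDER), derive(stem, ALTERNATIVE_ORDER))
```

Under the proposed order, the sonorant-final stems lose their ending before *-əɣ > *-əj can apply, while under the alternative order everything ends up with /-i/, which is not what Kazym shows for ‘song’ or ‘animal’.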

All of these changes are very heavily areal, and do not seem to define any substantial genetic subgroups. The main divisions of Eastern Khanty, the Far Eastern and Surgut groups, would have to be assumed to have split already before step 1 (*kala- > *kalaa- vs. *kal-); the Nizyam / Kazym / Obdorsk dialects of Northern Khanty, already before step 2 (*äärəɣ > *äärəɣ vs. *äär). The split of Nizyam and Southern Khanty could be in principle delayed until step 4 (making Nizyam a “Northernized Southern” rather than a “Southernized Northern” dialect after all), but this seems like a poor idea, even if for now I cannot refute it explicitly.

Areality seems to be further proven by how most parts of this scheme have parallels also in Mansi (e.g. *-əɣ > Northern and Pelymka (Western) Mansi /-iɣ/, Eastern and rest of Western Mansi /-i/; *-A > EMs, WMs -∅). But a detailed look into this will be a task for later.

Further implications

So what can we do with this?

The above analysis leads to at least one more general interesting corollary for Khanty historical phonology. If PKh *À-stems were in the early common Khanty period reduced en masse — then this opens the possibility that several cases could have been lost entirely from the data. Already Sauer notes that all inherited word-final cases of PKh *A-stems seem to occur either following the PKh lax vowels (*e *ö *o *a), or the traditionally reconstructed tense mid ones (*ee *öö *oo). Other cases could have existed as well … we may just be currently unable to directly distinguish them from consonant stems.

There may be, however, indirect evidence to draw such distinctions. The notorious Khanty “ablaut” system (which I am afraid I cannot explain in detail in this post) has for a while now been explained as being instead a partly morphologized system of former umlaut. [8] Per this hypothesis, alternations like EKh (*)ɬɔɔj ‘finger’ ~ (*)ɬuuj ‘thimble’ would continue something like earlier *ɬɔɔj(A) ~ *ɬuuj-(i), either with i-umlaut of *ɔɔ to *uu in the derivative ‘thimble’; or a-umlaut of *uu to *ɔɔ in the base root ‘finger’. I am more inclined to side with the latter (Honti’s view) than with the former (Helimski’s). If close/open ablaut in Khanty is fundamentally based on a-umlaut, the assumed umlaut trigger could be then identified as *-À, and we could then amend ‘finger’ to PKh *ɬɔɔja instead. This in turn also accords fairly well with the PU reconstruction: *suwd₂a (with Samic *čuvðē, Samoyedic *təjå clearly indicating an original *A-stem). By contrast, Helimski’s assumed *I-stems seem to be nowhere supported by actual data: they are simply circularly inserted into proto-forms where a close-grade vowel eventually surfaces.

Perhaps even un-umlauted *ɬuuja is a possibility for PKh. Vowel alternation in many cases occurs only in EKh, not WKh, and I would not dismiss offhand the possibility that this reflects unstressed vowel isoglosses in early common Khanty. In this case we indeed find WKh *ɬuuj (SKh /tüüj/, Kazym /ɬuj/, etc.) and not *ɬɔɔj > **ɬooj. Instead of assuming levelling from ‘thimble’, or from possessed forms (Vakh /luujəm/ ‘my finger’), maybe no umlaut took place here to begin with, and the discrepancy between EKh *ɬɔɔj ~ WKh *ɬuuj goes back to already earlier *ɬuuja ~ *ɬuuj(ə), with some kind of an early conditional loss of *-À in WKh.

Some other cases of “umlaut” might turn out to be illusory entirely. I am on board with the “Helimski school” reanalysis of “Steinitz school” PKh *ee, *öö, *oo as lax open vowels, and PKh *e, *ö, *o as lax close vowels (though I would be content to keep on using the symbols *e, *ö, *o for the latter). However, the associated reanalysis of Steinitz’ lax open *a as close *ï seems unsatisfactory. In most cases, this continues PU open *a; it is also continued as lax open /a/ in most Khanty varieties. Moreover, we can identify numerous instances where this occurs in an *À-stem instead. The clearest evidence are “thematic verbs” such as ‘to die’, where at least in Eastern Khanty the surface alternation is between /oo/ (/kool-/) and /a-aa/ (/kalaa-/). Since Helimski considers *ï to be the i-umlaut counterpart of *a, he ends up proposing the phonetically nonsensical solution that *A-stems would have triggered i-umlaut!

Instead of a back-and-forth development *a > *ï > /a/, purely for the sake of making way for *a > /oo/, I would propose that the rewriting of *ee, *öö, *oo as *ä, *a does not reflect mechanical identity. Rather, the alternation of the sort /oo/ ~ /a-aa/ is again perhaps post-Proto-Khanty entirely. PKh lax *a and *ä were only tensed and raised to /oo/, /ee/ ~ /öö/ when stressed; when unstressed, they were left as is (and not umlauted to anything at all). The first-syllable alternation /oo/ ~ /a-aa/ should be taken back to an earlier stress alternation /á(-ə)/ ~ /a-á/, in turn going back to earlier *á-ə ~ *á-a, through the Eastern Khanty stress retraction shift *-À > *-Á.

Filling up the details on this hypothesis (and possible similar approaches to other ablaut patterns) will need a much closer analysis, though. But ultimately, it may be able to reduce the somewhat sprawling Proto-Khanty vowel system into a more manageable shape.

[1] Infuriatingly, he does not provide any comments on what has motivated the division of the data. There are hints, of course. Much of the “second-tier” data seems to have relatively limited dialect distribution on one or both sides, e.g. only in Northern Mansi, or only in Southern Khanty; or relatively irregular sound correspondences. I get the impression that he considers it likely that some of this data is either unrelated; are parallel loans from some third source; or consists of loans from Khanty to Mansi (or perhaps vice versa). On the other hand, I think even the main part of the data likely contains a number of cases of this kind. Are these oversights, or does he have any actual reasons in mind to consider some initially spotty-looking cases stronger than others?
[2] In their respective C2IFU contributions “Zur Geschichte der Nominalstämme in den ugrischen Sprachen”; “Nominalstämme auf *-a/*-ä im Ostjakischen”.
[3] Bear in mind that Proto-Khanty had a contrast between full and reduced vowels, not in vowel length, and e.g. “long” *aa *uu should be read simply as [ɑ] [u]. “Short” *e is then a reduced vowel, [ə] or [ɪ], and is traditionally indeed transcribed ə in close transcription by fieldworkers on Khanty. Thus, *ej > /iij/ does not involve seemingly unmotivated lengthening, but rather tensing: [jɪ] > [ji].
[4] Published in László Honti’s Festschrift (Ünnepi kötet Honti László tiszteletére). The University of Helsinki library does have a copy, but it’s on loan currently. If by any chance the culprit happens to be reading this, please feel welcome to get in touch with me…
[5] The overall rarity of roots ending in *-Ləɣ in Khanty is not a mystery: it is due to the common (Proto-?) Ob-Ugric metathesis of PU *-lk-, *-sk- > East Uralic *-lɣ-, *-ɬɣ- > OUg *-ɣl-, *-ɣɬ-.
[6] At least two other examples exist of *aa > *ɔɔ before bilabials. 1) ‘Bird cherry’: *jɔɔm in place of expected *jaam, from PU *ďëmə. 2) ‘Hair’: Far Eastern *aawət < *aapət regularly continues PU *ëptə, but other dialects, including Obdorsk, indicate *ɔɔpət. On the other hand, there are counterexamples against assuming a regular change, e.g. *kaam ‘coffin’ (~ Mansi *kaməl), *kaap ‘boat’ (~ Mansi *këëpə), *saam ‘scales’ (~ Mansi *sëëmə, < PU *sëmə).
[7] To be exact, Steinitz and Honti only claim this about tense *üü, *öö, *ɔ̈ɔ̈. PKh reduced *ö has labial reflexes more widely in WKh, including fronted [ɵ] in SKh. However, this is only the case adjacent to velars; elsewhere we see the expected delabialization to *e. I would propose that this development involves “double cheshirization” (and is areally connected to the same in Southern Mansi): *kö > *kʷe, then re-coloring: *kʷe > South [kɵ] (= phonemically /ko/), North /kuu/.
[8] For a starting point, see e.g. E. Helimski (1999): “Umlaut in Diachronie – Ablaut in Synchronie: Urostjakischer Umlaut und ostjakischer Ablaut.” — Diachronie in der synchronen Sprachbeschreibung. Mitteilungen der Societas Uralo-Altaica 21: pp. 39–44.

Workflows in historical linguistics

A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months if not years languishing in my drafts. Let’s see if I can keep this one brief.

An adage sometimes seen in historical linguistics is “classification before reconstruction”. On one level, I agree. But, on a few others, this seems to be often abused as an excuse to skimp on proper rigor.

What this means, in my opinion:

  • It’s not possible to do comprehensive comparative reconstruction work with data from unrelated languages. Reconstruction can only be attempted once we have a reasonable amount of certainty that some particular language family exists at all.

What this does not mean:

  • Classification having to precede work in historical phonology entirely. Reliable classification cannot be done by vague casual eyeballing of data. “A reasonable amount of certainty” for the relatedness of some particular languages requires being able to locate regular sound correspondences within their shared vocabulary (preferably non-trivial ones, but any regularity is a start). [1] In the absence of regular sound correspondences, all vocabulary comparisons can potentially be suspected to be either coincidental, or loanwords rather than strict cognates.
    In other words: sound correspondences are not reconstructions, in themselves. In the case of binary comparison, this distinction may end up blurred, since it’s possible to kind of put together an initial “trivial reconstruction” by just listing all your correspondences, and giving each of them some kind of a vague phonetic label. [2] If the family has more members, though, the bare sound correspondences typically end up looking more like networks — since sound correspondences are not transitive. If /tʃ/ in language 1 can correspond to /s/ in language 2, and /s/ in language 2 can correspond to /h/ in language 3, this does not automatically guarantee that a correspondence /tʃ/ ~ /h/ between 1 and 3 would be demonstrable, or even expected at all. Perhaps /s/ in language 2 is a merger of two separate proto-phonemes; perhaps these correspondences do continue the same proto-phoneme, but under mutually exclusive conditions; perhaps one of these correspondences indicates loanwords after all and not native vocabulary.
  • Subclassification having to precede reconstruction. On the contrary, it is reconstruction that often allows us to put together arguments in favor of subgroups, by providing a root for our sound correspondences. If we have a correspondence such as t ~ t ~ s ~ s, it’s likely that either the t-group or the s-group has innovated, and constitutes a subgroup. But it is also very possible that the other group has not, and is paraphyletic. Without reconstruction work, this is not resolvable.
  • Reconstruction being unable to inform classification. A reconstruction of the parent of a set of languages might end up coming out closer to some other language, that we may have suspected (but haven’t dared to declare) to be also related. It could even turn out that this language newly under comparison is not only related, but it is indeed a direct descendant of this same proto-language; just a very divergent one! — Or maybe the proto-language turns out to be substantially less similar to the other language being compared, and the earlier suspicion of a relationship evaporates entirely, or has to be reanalyzed as a late loanword layer.
  • Language isolates’ history being unreconstructible. Internal reconstruction combined with loanword evidence can allow identifying probable sound changes and lexical intrusions just fine… though I suppose it will be unlikely to get especially far with this technique.
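The point in the list above about sound correspondences not being transitive can be made concrete with a toy example. The sketch below is in Python and uses entirely invented data: three hypothetical languages L1, L2, L3 and four made-up cognate sets, set up so that /s/ in L2 continues two different proto-phonemes. The function names are likewise just for illustration.

```python
def correspondences(cognate_sets, lang_a, lang_b):
    """Collect the sound pairs attested between two languages."""
    return {
        (s[lang_a], s[lang_b])
        for s in cognate_sets
        if lang_a in s and lang_b in s
    }

def chained(cognate_sets, lang_a, via, lang_b):
    """Pairs that naive transitivity would predict by going through `via`."""
    first = correspondences(cognate_sets, lang_a, via)
    second = correspondences(cognate_sets, via, lang_b)
    return {(a, b) for (a, m1) in first for (m2, b) in second if m1 == m2}

# Invented data: each dict is one cognate set, giving the reflex of the
# same segment in each language that has the word.
COGNATE_SETS = [
    {"L1": "tʃ", "L2": "s", "L3": "k"},
    {"L1": "tʃ", "L2": "s"},
    {"L1": "s", "L2": "s", "L3": "h"},
    {"L2": "s", "L3": "h"},
]

attested = correspondences(COGNATE_SETS, "L1", "L3")
predicted = chained(COGNATE_SETS, "L1", "L2", "L3")
unsupported = predicted - attested

print(sorted(attested))
print(sorted(unsupported))
```

Here /tʃ/ ~ /s/ and /s/ ~ /h/ are both real correspondences in the toy data, but chaining them predicts a /tʃ/ ~ /h/ correspondence that no cognate set supports. Sorting this out is the job of reconstruction, not of the correspondence lists themselves.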

A more detailed workflow for historical linguistics, if starting from zero, would therefore look something like the following:

  1. Acquire data; sort out some initial vocabulary comparisons that look promising.
  2. Analyze sound correspondences; use these to look for more comparisons.
  3. Look at the big picture to see if some particular subset of languages should be indeed considered related.
  4. Attempt reconstructing the proto-language.
  5. Use the proto-language POV to clarify the status of issues like problematic etymologies, possible external relatives, or possible subgroups.
  6. Use modified analyses of data to improve the proto-language reconstruction.
  7. Iterate 5 and 6 until you’ve run out of insights to gain from the data.

This could also work as a kind of a typology of how far along research on a particular language family is. To date, I don’t think any language family has yet exhausted stage 7. Most are stuck in limbo somewhere around stage 3; only a few have reached stage 5, and Indo-European might be the only one to have indisputably gone through one cycle of stage 7. Big disputed hypotheses grouping well-accepted families together can probably be divided according to if they’re closer to stage 1 (e.g. Amerind, Nilo-Saharan) or stage 2 (e.g. variations of Nostratic). Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)

Of course there are many major facets of historical linguistics still missing here. We also want to account for typology at some points, morphology too at others, semantics three, periodically research loanwords and then weed them out of the proto-language, maybe entertain some substrate hypotheses.

[1] Some people will claim that vocabulary is strictly optional and you can show relatedness solely on the basis of grammar. I am skeptical; but if this were to be the case — then the implication is that we will not be doing any lexical reconstruction work at any point at all.
[2] Maybe with subscripts to disambiguate overlapping sets if you’d prefer, but anything goes in principle. If your heart desires to see more wingdings in linguistics papers, there is nothing formally wrong in re-labeling a t ~ tʰ correspondence as *☕.

Consonant clusters growing, wilting and syllabic

From a Uralicist perspective, one thing that I find goes underappreciated in Indo-European studies is the extensive phonotactic complexity of most IE languages. Certain types of studies on PIE consonant clusters can be found these days in abundance, yes… but these mostly focus on the resolution of the most extreme things that the morphology of PIE, with its abundant zero-grade morphemes, can come up with: monstrosities like *HHR-, *CRH-, *RHC-, *-CHCR-. The fate of the more common, though still remarkable on a worldwide scale, consonant clusters like *bʰl-, *sp-, *tw-, *-zd-, *-ktj- appears to be considered basically trivial. (I am open for reading suggestions, though: IE studies is a big field and I expect I am still missing out on many specifics.)

Within Europe, at least the fate of simple two-consonant initial clusters really is at least mostly trivial, though. The Germanic and Balto-Slavic languages retain most PIE initial clusters fairly well, incidental changes in the individual consonants aside (as in *tw- > English thw-, Lithuanian tv-). Latin and Greek are not far behind, though they mostly get rid of *sR clusters (as in e.g. slime ~ līmus; snow ~ nix). We would have to look at Albanian and the more eastern languages (Armenian, modern Indo-Iranian) before seeing major cluster simplification or transformation trends. As for Celtic, Tocharian and Anatolian, I can’t say I have much of a handle on the big picture at all… which is one reason why a detailed overview of phonotactics issues in the IE languages, either from the perspective of particular classes of clusters or particular languages’ overall histories, would sound appealing to me.

To be fair, it’s not as if this kind of a thing has been done much in Uralic studies either. There have been a few phonotactic analyses of the cluster stock in various reconstructed proto-languages, though with naïvely synchronic methodology. From a more firmly diachronic angle, a few interesting topics that may require more detailed investigation could be

  • the nearly complete cluster simplification trends in Permic, Hungarian and Enets, transforming the inherited *(C)V(C)CV root structure into roughly √(C)V(C)(V). To a lesser extent similar things happen also in e.g. Mari and Proto-Samoyedic.
  • the rise of numerous complex clusters in Mordvinic, e.g. in initial position, Erzya kši ‘bread’, kšna ‘strap’, pśkiźems ‘to have diarrhea’, promo ‘gadfly’. This seems to run a bit too deep-set to be blamed just on late Russian influence: the first two are earlier Baltic or Balto-Slavic loanwords (~ Fi. kyrsä ‘loaf’, hihna ‘strap’), the last two native Uralic (~ Fi. paskoa ‘to shit’, paarma ‘gadfly’).
  • the slightly less daunting but still strong expansion of consonant cluster complexity in Finnic (as I’ve briefly covered before) and Samic, probably mainly due to Indo-European loanwords.

But back to IE, for a few scattered observations.

At least one of the initial consonant clusters reconstructed for Proto-Indo-European is an exception of sorts to any retention tendencies, even from a European perspective. This is *sr-: the cluster is alien to most European languages today, even ones that may otherwise allow sibilant+/r/: English shr-, German schr- from earlier *skr-. (The Slavic languages do have newly created examples though, generated after syncope; e.g. Polish srebro ‘silver’ < *sьrebro.) Given the wide palette of word-initial clusters of the type CR- and even sTR- tolerated in IE languages, this is a notable hole in the system.

In Greek *sr- is simplified the usual way, through *s-aspiration, yielding word-initial ῥ- /rʰ/. Elsewhere, however, special developments seem to kick in.

Germanic and Balto-Slavic share here a non-trivial isogloss: *sr (of any position) is resolved by epenthesis of *t, generating correspondences such as stream, Latvian straume, Polish strumień ~ Greek ῥεῦμα (< *srew-m-os, *srew-m-eh₂). The change has however not reached standard Lithuanian, which still has e.g. sraumuo; [1] therefore showing that this is a relatively late diffused sound change, not a data point in favor of a Germano-Balto-Slavic proto-dialect. Perhaps even one that has been innovated multiple times in parallel: homorganic stop epenthesis in clusters of continuant+glide is commonplace after all (æmyrge > *emrə > ember in English surely requires no especial connection with hominem > *homre > hombre in Spanish), and while the phonetic development is less trivial here, the prior existence of *str- has probably helped to motivate *t-epenthesis.

This sound change likely also accounts for the intrusive -t- in ‘sister’ in Germanic (sister etc.) and the relevant parts of Balto-Slavic (OCS сестра, Old Prussian swestro, but again, Lithuanian sesuo; and as I’m looking these up, I am also learning that Latvian has apparently lost this word entirely!). This was probably generalized from the genitive, *swesrés or *susrés. Some degree of analogical support from the mother, father, brother, daughter group surely has played a part as well, but I would think the fact that this only occurs in languages that also show *sr > *str as a general sound change is not a coincidence.

This development also seems to have interesting interaction with the PIE syllabic consonants. Some time ago I ran across a small article by Krzysztof Witczak (1991), “Indo-European *sr̥C in Germanic”, which proposes that this epenthesis also took place before syllabic *r̥. The evidence is scarce but looks believable. Interestingly, this then demonstrates that at some point an actual syllabic [r̩] must have indeed occurred in Germanic (contra some of my earlier suspicions that some kind of an epenthetic schwa might have been hanging around all along in here).

Also, returning to ‘sister’: while I have no ready means to see if this checks out in the other older Germanic languages, Wiktionary actually gives a PGmc genitive *swesturz > Gothic swistrs, which looks more like pre-Gmc *swesr̥s.

Even more interestingly, there seems to be some evidence for similar business also in Baltic.

The word for ‘roe deer’ in Latv. and Lith. is stirna, corresponding to Slavic *sьrna. These look like derivatives from the ‘horn’ root, *ḱer(h₂)-, or in particular the derivative *ḱr̥(h₂)nos, as reflected also in e.g. Germanic horn. Derksen’s etymological dictionary of Baltic (2015) has no comment other than that “the anlaut is problematic”… I suspect however that the Baltic words could be explained by a development *šr̥ > *str̥, taking place before the breaking *r̥ > *ir. [2] This all will also have to be later than *ḱ > *š, but this is already assured to be quite early by the evidence of loanwords in Finnic.
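The relative chronology suggested above can be checked mechanically with a toy model. What follows is only a minimal sketch: the function names are made up for illustration, the starting form is simplified to a stem plus -a, and the epenthesis rule itself is the suspected change rather than an established one. The point is just that applying the two rules in the proposed order gives stirna, while the reverse order leaves the epenthesis with nothing to apply to.

```python
# Toy model of two ordered sound changes (hypothetical helper names).
SH = "\u0161"            # š
SYLLABIC_R = "r\u0325"   # r̥ = r + combining ring below

def epenthesis(form):
    """Suspected change: word-initial *šr̥ > *str̥."""
    prefix = SH + SYLLABIC_R
    if form.startswith(prefix):
        return "st" + SYLLABIC_R + form[len(prefix):]
    return form

def breaking(form):
    """Breaking of the syllabic sonorant: *r̥ > *ir."""
    return form.replace(SYLLABIC_R, "ir")

def derive(form, rules):
    """Apply the rules one after another, in the order given."""
    for rule in rules:
        form = rule(form)
    return form

# Simplified starting form (an assumption): *šr̥n- plus the ending -a.
start = SH + SYLLABIC_R + "na"

print(derive(start, [epenthesis, breaking]))  # stirna
print(derive(start, [breaking, epenthesis]))  # širna
```

With breaking first, the input to the epenthesis rule already begins with *šir-, so the rule can no longer fire; this is why the epenthesis has to be the earlier of the two.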

On the other hand, there are more than enough other words, even derivatives from this same root, that show no such epenthesis, e.g. Old Prussian sirwis ‘roe deer’ < *šr̥wis (whence also Fi. hirvi ‘elk’); Latvian sirsenis, Lithuanian širšė ‘hornet’ < *šr̥Hšō (whence also Fi. herhiläinen). To get around this issue, we would probably need to assume either dialect mixture of some kind (as will already be required to explain why we have *t-epenthesis now showing up in Lithuanian also) or an irregular shift from *šr̥nos to *sr̥nos. (Or as long as I’m fucking around with relative chronology, even the regular shift of *š to *s in Latvian?)

This is moreover complicated by how all these words must be, to some degree, analogical anyway. The reason for this is “Weise’s Law”: [3] the neutralization of *Ḱr- and *Kʷr- as *Kr-, common to all Satem languages. We would again not expect this to distinguish between syllabic *r̥ and non-syllabic *r, and apparently the Sanskrit data indeed confirms this. Thus Balto-Slavic *šr̥nas and other such derivatives (including, from Sanskrit, śiraḥ ‘top’ < *ćr̥Has) would have to be assumed to get their palatal onset by analogy with the abundant other derivatives of *ḱer(h₂)-. So… another possibility is then that stirna is the earliest word where *ḱ > *š was restored in this way, followed by epenthesis, followed by the remaining cases of analogical *š-restoration.

Or maybe this is all barking down the wrong root entirely. Something that also looks worth further investigation is if the Baltic words for ‘roe deer’ might be actually rather cognate with German Stirn?

A different angle on getting rid of *sr- is exhibited in Italo-Celtic: > *θr- > fr-, reflected at least in Brythonic (e.g. Welsh ffrwd ‘stream’) and in Latin (the best examples seem to be word-medial and have an expected further development to -br-, e.g. crābrō < *kr̥Hsrō ‘hornet’). Irish has what looks like retained sr- (e.g. sruth ‘stream’). Schrijver proposes that this is a reversal from the *θr stage, [4] but given the situation in Baltic, I would not bet on it. Note that reversal in Lithuanian is clearly not possible, since inherited *str- remains. Again, it seems plausible that the first stages of the Goidelic/Brythonic split go back far enough that the latter could have still participated in common developments with Italic.

Irish also seems to have a general shift *st- > s- (ser ‘star’, sab ‘staff’, etc.), so actually even an earlier development of the Germanic-Balto-Slavic flavor is theoretically possible.

A quick scan-over of IE etymological sources at my disposal reveals no special developments of *sr̥- in Celtic or Latin. LIV has two Latin examples that seem to have retained s-: sariō ‘I hoe’ < *sr̥h₃yé-, sarciō ‘I mend’ < *sr̥kyé-. Witczak’s article gives Latin fariō ‘salmon trout’, compared with the Germanic sturgeon word family and derived from *sr̥Hyón-; but this also seems to come from Old Latin sariō, thus aligning with the previous group. That these all have -ar- rather than the usual -or- as the reflex of *r̥ however probably indicates a relatively early epenthesis of *ə > *a. Schrijver reconstructs a rule *CCCC- > *CaCCC- being already common Italo-Celtic (argued in full in The Reflexes of the Proto-Indo-European Laryngeals in Latin).

At any rate, the moral is that simplifications or epentheses in consonant clusters of the shape *CR might make a more general opening for investigating the history of the PIE syllabic sonorants.

I’ve another example as well, though probably less illustrative. Sticking still to the European languages, there is perhaps something to be made of PIE *Tl-. Word-initially this was a rare cluster, but one established example is *dl̥h₁gʰos ‘long’ (> e.g. Slavic *dьlgъ, Greek δολιχός, Sanskrit dīrgha-). Now, the Baltic languages are known to have word-medially eliminated *-tl-, *-dl- by dissimilation to *-kl-, *-gl-. So would we find a similar initial development here?

We do not; but we do find something unusual: wholesale loss of the initial consonant, resulting in Lith. ilgas, Latv. ilgs! Perhaps this could be again explained by assuming word-initial *Tl-, *Tl̥- > *l-, *l̥-, already before *l̥ > *il? A previously known case with non-syllabic *Tl- is Lith. lokys, Latv. lācis ‘bear’ ~ Old Prussian clokis ‘bear’ (which would then show that this simplification is Eastern Baltic specifically). Unfortunately, there are again also several counterexamples with *Tl̥- > *Til-, e.g. Lith. tiltas, Latv. tilts ‘bridge’ < *tl̥h₂tós. Go figure…
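To see how far the proposed ordering gets, here is another rough sketch. The pre-forms are heavily simplified (laryngeals, aspiration and endings flattened to something Lithuanian-looking), and both they and the function names are assumptions made purely for illustration. The rule "drop an initial *t/*d before *l or *l̥, then break *l̥ > *il" gives the right result for ‘long’, but overapplies to ‘bridge’.

```python
# Toy check of a hypothesised rule ordering against attested forms.
SYLLABIC_L = "l\u0325"   # l̥ = l + combining ring below

def drop_initial_stop(form):
    """Hypothesised change: word-initial *Tl-, *Tl̥- > *l-, *l̥- (T = t or d)."""
    if len(form) > 1 and form[0] in "td" and form[1] == "l":
        return form[1:]
    return form

def break_syllabic_l(form):
    """Breaking of the syllabic sonorant: *l̥ > *il."""
    return form.replace(SYLLABIC_L, "il")

def derive(form):
    """Initial stop loss first, breaking second."""
    return break_syllabic_l(drop_initial_stop(form))

# Simplified pre-forms (assumed) paired with the attested Lithuanian words.
cases = {
    "d" + SYLLABIC_L + "gas": "ilgas",   # 'long'
    "t" + SYLLABIC_L + "tas": "tiltas",  # 'bridge'
}

for pre_form, attested in cases.items():
    predicted = derive(pre_form)
    print(pre_form, "->", predicted, "| attested:", attested,
          "| match:", predicted == attested)
```

The first case comes out as ilgas, the second as the unattested iltas instead of tiltas, which is exactly the counterexample problem: the rule as stated is too general, whatever the right conditioning turns out to be.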

[0] This post has been prompted by me resuming work for a little while on constructing a reference table on the fate of PIE consonant clusters on Wikipedia.
[1] Jānis Endzelīns (1973), Comparative Phonology and Morphology of the Baltic Languages: 73 reports that other dialects of Lithuanian, however, do have this change, and so we can also rule out this as a datapoint in favor of a Latvian-Slavic grouping (as has sometimes been suggested). Interestingly even Old Prussian has this epenthesis, so this all could instead testify for the Latvian-Lithuanian split, maybe even some of the inter-Lithuanian dialect splits, going quite a while back. — Most evidence I’ve seen in favor of the East Baltic group in fact looks quite easy to reinterpret as more or less areal: e.g. the sound change bundle *ai > *ei > *ē > ie is basically trivial, and has parallels in most neighboring languages (the first in Slavic, Scandinavian and core Finnic; the second in Swedish and Livonian, as well as Slavic in a different form; the last in Western Slavic and in most of Finnic).
[2] I’m not going to start probing the issue, but a sound change or two along the lines of *št > *st might also help in explaining the famously inconsistent application of RUKI in Baltic; e.g. Lith. pisti (not ˣpišti) ‘fucks’ ← PIE √peis- ‘to crush, push’.
— It also just now occurs to me that western Uralic *pisə- ‘to put, stick (in)’ (Samic, Finnic, Mordvinic, Mari) is probably derived from this last-mentioned IE root. This contrasts with widespread native Uralic counterparts: #pënə- ‘to put’ (absent only from Samic and Hungarian), #texə- (maybe *tejwä-??) ‘to push’ (F, P, Hu, Ms, Kh), *puskə- ‘to poke’ (S, F, Ms, Kh), which is usually a good indication for an innovation of some sort.
[3] An old idea, but only recently named and reviewed by Kloekhorst. — I would suggest though that his group of six counterexamples involving derivatives of the type *CeḰ-ro- should not be accounted by “phonetically regular analogy”: they might rather indicate Weise’s Law applying only to syllable-initial palatovelars (*Ḱr-, *-Ḱr̥-) but not to syllable-final ones (*-Ḱ.r-). This would also cover his three counterexamples of the shape *CeḰ-ru-, in which case there is then no need to date the law as any older than common Satemic.
[4] Schrijver, Peter (2015): “Pruners and trainers of the Celtic family tree”.

