Phonology squib: ‘Clay’ in Proto-Uralic

I have a principle that applies quite often when working with quantity-over-quality mass comparative dictionaries (papers, databases, etc.): what is asserted without evidence can be dismissed without evidence.

The UEW (Uralisches etymologisches Wörterbuch) is, unfortunately, a repeat offender when it comes to assertions without evidence. This is perhaps most evident in its own reconstructions, which do not seem to follow any definite scheme: there is definitely none expounded anywhere in the book, and to my knowledge none of the editors have published detailed papers on the topic either. [1] The result is many junk reconstructions that seem to have been only hastily eyeballed together, sometimes with crass errors.

To avoid excessive alarmism, though: by “its own reconstructions”, I mean only a subset of the Proto-Uralic (Proto-Finno-Ugric, -Permic, etc.) reconstructions presented, namely those that seem to have been put together for the first time by the UEW team. Many of the reconstructions, however, are not all-new but have been inherited from earlier research. Perhaps the most direct source is Collinder’s Comparative Grammar [2], but various bits also trace back to earlier studies on historical phonology, such as Itkonen’s comparative vocalism surveys, Paasonen and Setälä’s early-1900s Neogrammarian works that mainly dealt with consonantism, or even the 1800s comparative dictionaries of Budenz and Donner. Alas, none of this is explicitly referenced, and so the reader is left in the dark. Determining what, if anything at all, a particular reconstruction is based on would require a wild goose chase through the un-annotated list of literature found at the end of each entry.

(For non-specialists in Uralic reconstruction, as a quick rule of thumb I would say: any reconstruction with cognates in Finnish + at least two other Uralic subgroups can be treated as relatively safe; so can all remaining reconstructions that are continued in 6+ subgroups, which are usually given in bold; anything continued more narrowly is in principle suspect; anything prefixed with a question mark should be treated as unreliable entirely.)
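For readers who like things spelled out, the rule of thumb above can be written down as a small decision procedure. This is only a rough sketch: the function name and the subgroup labels are my own hypothetical choices, and the UEW's bold-print convention is approximated simply by counting subgroups.

```python
# Hypothetical sketch of the rule of thumb for judging UEW reconstructions.
# Subgroup names are free-form labels chosen for illustration only.

def assess(form: str, subgroups: set[str], finnish_cognate: bool) -> str:
    """Rough reliability rating for a UEW reconstruction.

    form            -- the reconstruction as printed, e.g. "*śojwa" or "?*..."
    subgroups       -- Uralic subgroups with reflexes, e.g. {"Samic", "Permic"}
    finnish_cognate -- whether a Finnish cognate is among the reflexes
    """
    if form.startswith("?"):
        return "unreliable"          # question-marked: disregard entirely
    others = subgroups - {"Finnic"}  # subgroups other than Finnish's own
    if finnish_cognate and len(others) >= 2:
        return "relatively safe"     # Finnish + at least two other subgroups
    if len(subgroups) >= 6:
        return "relatively safe"     # broadly attested (usually bold in the UEW)
    return "suspect"                 # anything narrower is suspect in principle


print(assess("*śojwa", {"Samic", "Permic", "Samoyedic"}, finnish_cognate=False))
# suspect
```

Applied to the ‘clay’ word discussed below, which has reflexes only in Samic, Permic and Samoyedic and no Finnish cognate, the sketch returns "suspect".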

Even if many of the UEW’s reconstructions are junk, this does not imply that the etymological comparisons they are attached to are junk as well. Sometimes it is fairly easy to work out a better reconstruction. Today I have taken a look at a word for ‘clay’ that the UEW reconstructs as *śojwa, and noticed that this reconstruction does not seem to match any of the descendants given…

Not absolutely everything is wrong, of course. The consonant skeleton *ś-jw- works well enough: we have the entirely regular correspondence Samic /č-/ ~ Permic /ś-/ ~ Samoyedic /s-/, and S /-jv-/ ~ P /-j-/ ~ Smy ∅ is reasonable. But the vowel reconstruction *o-a does not really seem defensible.

  • In Samic, we have reflexes only in Kola Sami: Kildin /čuwwj/ (though apparently чуййв in the written language), Ter /čujjvɛ/. These nominally suggest Proto-Samic *čujvē — but, from earlier *śojwa, we would instead expect to see PS *čoajwē > Kola **čuəjjve. Compare PU *ojwa ‘head’ > PS *oajvē > Kildin /vuəjjv/ вуэййв, Ter /vɨəjjvɛ/.
  • In Permic, we have *o > Komi /o/ ~ Udmurt /u/. This is not a regular reflex of PU *o: it usually continues PU *a or *e instead. There are various other claimed cases of PU *o > Permic *o (at least *kojə-ma > *kom ‘male’, the source of the ethnonym Komi, seems unassailable, even if still possibly irregular), but normally we would expect *o-a to give *u.
  • The Samoyedic examples are a bit hard to assess offhand: we have reflexes only from Selkup and Kamassian, and so Janhunen’s Samojedischer Wortschatz leaves this word unconsidered. Selkup /üü/ can go back to various pseudo-diphthongs, including *åj (*såjtə- > /süütɨ-/ ‘to sew’), *oj (*tojmå > /tüüm(ɨ)/ ‘larch’), *uj (*jujtə- > /küütäptɨ-/ ‘to dream’), *əj (*pəj > /püü/ ‘stone’), even *äj (*päjwä > /püü/ ‘warm(th)’). Kamassian /e/ does not seem to match any of these on a quick check, but there are probably various conditional developments involved that blur the picture. PU *o-a regularly gives PSmy *å-(å), so perhaps *åj is the option to bank on… However, in an *A-stem, *jw would be expected to remain in PSmy and then to result in *ľć in Selkup. [3]

The Kola Sami ~ Permic vowel correspondence can, however, be derived quite well from *a-a, which develops to *ō-ē in Proto-Samic. This normally later gives /uu/ in Kildin and /ɨɨ-ɛ/ in Ter, but presumably (see below) earlier *uu was shortened here to /u/ before it could unround in the latter. *a-a also gives Samoyedic *å(-V), so it works at least as well there as reconstructing *o-a.

I would also favor reconstructing the medial cluster as *-wj- instead of *-jw-. The UEW, I imagine, bases the latter on Ter Sami; however, Ter is actually non-diagnostic here, since it has regular metathesis of PS *-vj- to *-jv-. The Kildin form should therefore be taken as evidence for *-wj- instead. (In literary Kildin Sami, Ter-esque -ййв- seems to be preferred in place of *-vj-, e.g. тоаййв ‘often’, while T. I. Itkonen’s Koltan- ja kuolanlapin sanakirja gives /tɑwwj/. Does this perhaps reflect dialect variation within the language?) With this in mind, the ad hoc-sounding shortening (*a > *ō >) *uu > *u also makes decent phonetic sense: we would be dealing with [uːw] > [uw], a contrast that seems difficult if not impossible to maintain.
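The point about Ter being non-diagnostic amounts to the observation that Ter collapses two Proto-Samic clusters into a single reflex, while Kildin keeps them apart. A minimal sketch, modelling only the medial cluster and using the Kola forms cited in this post (*oajvē ‘head’ and *čōvjē ‘clay’, the latter as implied by the *śawja reconstruction proposed here); the dictionaries and helper names are a hypothetical simplification, not a full sound-change model:

```python
# Hypothetical sketch: which Proto-Samic medial clusters could an attested
# Kola Sami reflex go back to? Only the cluster itself is modelled.

# PS cluster -> attested reflex, per language (simplified from the cited forms)
TER = {"*jv": "jjv",      # *oajvē 'head' > /vɨəjjvɛ/
       "*vj": "jjv"}      # *čōvjē 'clay' > /čujjvɛ/ (regular metathesis)
KILDIN = {"*jv": "jjv",   # *oajvē 'head' > /vuəjjv/
          "*vj": "wwj"}   # *čōvjē 'clay' > /čuwwj/


def possible_sources(reflexes: dict[str, str], attested: str) -> list[str]:
    """Return every PS cluster that the mapping sends to the attested reflex."""
    return sorted(ps for ps, out in reflexes.items() if out == attested)


def is_diagnostic(reflexes: dict[str, str]) -> bool:
    """A language is diagnostic for cluster order only if no two PS clusters
    merge, i.e. the mapping is one-to-one."""
    return len(set(reflexes.values())) == len(reflexes)


print(possible_sources(TER, "jjv"))     # ['*jv', '*vj']
print(possible_sources(KILDIN, "wwj"))  # ['*vj']
print(is_diagnostic(TER), is_diagnostic(KILDIN))  # False True
```

Given only the Ter form, both cluster orders remain open; the Kildin form singles out PS *-vj-, i.e. PU *-wj-.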

I believe no exact precedents are known for the development of *-wj- in Permic, but in general *-w- is always lost, while *-j- remains at least in various clusters; so *-wj- > *-j- seems about as good as could be expected. As for Samoyedic, *-w- is lost syllable-finally: this means we would expect *śawja > *såj(V), which is at least a decent contender for the Selkup-Kamassian preform. (Preferably not *såjå; contrast *kåjå > Kamassian /kuja/ ‘sun’. *-a > *ə is, however, quite common in Samoyedic, perhaps particularly after (original?) consonant clusters.)

Altogether, I end up with the conclusion that all words given by UEW under *śojwa are better considered to continue Proto-Uralic *śawja.

These adjustments also open some new vistas. They allow us to consider that my new and updated reconstruction might belong to the same original root as its established synonym *śawə (UEW: *śawe). This is continued directly only in Finnic (*savi > Fi. savi etc.), but also in various derivatives: *śawə-nV in Mordvinic (*śovəń > Erzya & Moksha сёвонь) [4], Mari (шун) and Komi (сюн); *śaw(ə)-d₂V in Mansi (*suwľ(V) > Northern сӯли) and Khanty (*sawəj > *sawïï) [5]. It therefore seems likely that the *śawja group, too, is originally a derivative, *śaw(ə)-ja. The exact morphology involved, however, remains mysterious: *-nV is only known as a vague diminutive suffix; *-ja usually forms action nouns; *-d₂V is, to my knowledge, not reconstructible for Proto-Uralic at all (though there may be one other parallel within Ob-Ugric: *ńooɣəď ‘meat’, maybe from *ńaKV-d₂a).

It would also be possible to shuffle the *-ja and *-d₂V groups around a bit: *-j in Khanty and Samoyedic can continue either one equally well. At least the Mansi form with *ľ and the Samic & Permic forms with *j must, however, be kept distinct from each other.

[1] Editor-in-chief Rédei has arguably taken some steps towards this in his 1968 article “A permi nyelvek első szótagi magánhangzóinak a történetéhez” (‘On the history of the first-syllable vowels of the Permic languages’; NyK 70: 35–45). His “pre-Permic” vowel system does end up being identical to the Proto-Uralic vowel system that is currently most widely accepted, but this may be just a happy accident: he makes no effort at all to address whether and how the other Uralic languages could be derived from the same system, and his treatment of which particular original vowel should be assumed in which particular words is very patchy as well, covering only some incidental examples.
[2] His Fenno-Ugric Vocabulary gave only comparative data; the associated reconstructions were given only in an appendix to CompGramm., in which he also presented his thinking on Uralic comparative phonology and morphology.
[3] This oddball soundlaw probably proceeds something like *jw > *jj > *jɟ > *ʎɟ > *ʎtɕ = *ľć.
[4] *o is, I believe, due to the following development: first *a-ə regularly > *å-ə > *o-a, followed by a conditional split: *o > *u before a velar sonorant (regularly established in the case of *-oŋ- and IMO also occurring in the case of *-olk-); lastly *u > *o.
[5] With Kazym /sŏwĭ/ and Krasnoyarsk (Southern) /săwə/ regularly retaining PU *-w-.
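The Selkup chain proposed in footnote [3] is an ordered series of rewrites in which each step creates the input for the next (a feeding order). As a sketch, treating each step as a plain substring replacement (a simplification that ignores any conditioning environments), with IPA ʎtɕ standing for *ľć as in the footnote:

```python
# Sketch of the chain from footnote [3]: *jw > *jj > *jɟ > *ʎɟ > *ʎtɕ (= *ľć).
# Steps apply in order, each feeding the next. Plain substring replacement
# is a simplifying assumption, not a claim about the real conditioning.

STEPS = [
    ("jw", "jj"),    # *w assimilates to the preceding *j
    ("jj", "jɟ"),    # the second half of the geminate glide hardens to a stop
    ("jɟ", "ʎɟ"),    # *j becomes a palatal lateral before the stop
    ("ʎɟ", "ʎtɕ"),   # the voiced palatal stop becomes the affricate tɕ
]


def derive(form: str) -> list[str]:
    """Return every intermediate stage, starting with the input form."""
    stages = [form]
    for old, new in STEPS:
        form = form.replace(old, new)
        stages.append(form)
    return stages


print(" > ".join(derive("jw")))
# jw > jj > jɟ > ʎɟ > ʎtɕ
```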

  1. Ante Aikio says:

    Thanks for an interesting etymological squib! A couple of comments came to my mind:

    As regards Finn. savi, I think we also need an explanation for why Vote savvi (GEN savvõõ) and Estonian sau (GEN saue) suggest an original geminate *v (PFi *savvi : *savve-). The same sound correspondence occurs in the words savu ‘smoke’, ovi ‘door’ and povi ‘bosom’.

    As for Mari šun, the form in the Malmyzh dialect is sun, which problematically points to PMari / PU *s. From PU *ś one would expect PMari *š which gives Malmyzh š-. I have no idea how this could be accounted for – is Mari šun ’clay’ unrelated to Mordvin śovoń ‘clay’ and Komi śun ‘blue clay’ after all? And then on the other hand, these forms could perhaps also be compared to Inari Saami čunoi ‘fine sand’, North Saami (dial.) čunu (gen čudno) ‘fine sand’ (Proto-Saami *čunōj). One could perhaps reconstruct something like *śuwinV(w), which then gave *śuwnoj = *śūnoj in Pre-Proto-Saami. Finn. savi ‘clay’ would then have to be etymologically unrelated.

    Because Komi śoj (Jazva Komi śu̇·j) and Udmurt śuj ‘clay’ could also reflect a front vocalic form (*śeji, or the like), they could perhaps be instead compared to quite different Ob-Ugric forms: cf. Khanty VVj sej, Sur sej, sĕj ‘sand on a shore’, Ni sej ‘sandy shore, sand’, Kaz sej ‘sand’, O săj-χis ‘quicksand’ (χis ‘fine sand’) (Proto-Khanty *säj), Mansi LU So sēj ‘sand, mud’, sēɣi ‘sand’ (Proto-Mansi *sīj, *sīɣī).

    The Selkup word for clay does not have (Proto-)Selkup *ǖ but the rare Proto-Selkup front diphthong *üǝ instead (see Sölkupisches Wörterbuch under *süǝ). The exact background of this diphthong is unclear to me, and there are only a couple of examples of it in inherited vocabulary.

    I’d suggest a different path of development for the “oddball soundlaw” *jw → *ľć in Selkup. Because *w developed into *k in Selkup, the first intermediate step was probably *jk. After this the velar was palatalized by the preceding *j, which itself changed to *ľ.

    • j. says:

      I think we also need an explanation for why Vote savvi (GEN savvõõ) and Estonian sau (GEN saue) suggest an original geminate *v

      My current hypothesis is that words of this type mostly reflect PF *sau, *pou etc. with vowel stems analogically rebuilt, sometimes with gemination in the process. SE Tavastian Finnish appears to retain the original PF monosyllabic nominatives: kiy, lou, ou, sau (for kivi, lovi, ovi, savi), and also käy for the 3PS past kävi, which have been thought to show “regular” but fairly out-of-place apocope of *-i after /v/.

      Reconstructing *CVU is perhaps more secure for the group savu, tävy, vävy where we have CVU reflexes quite widely across the Finnic varieties (and also *käü- > Fi. käydä but Livvi kävyy, Ludian kävydä); they seem to represent soundlawful unpacking of *-AU into *-AvU across most of Northern Finnic.

      It’s conceivable that the *śɜwɜn group was originally distinct, but in the case of *śuwVnV I would expect **śɨn in Komi. In Ob-Ugric though, Mansi /sēj/ seems likely to be a loan from Khanty. The correspondences (in traditional notation) *ii ~ *ee or *ii ~ *öö are very rare and never occur in inherited vocabulary. The only examples attested also in Western and/or Southern Mansi seem to be *miinkʷ ~ *mööŋkʷ ‘forest goblin’, perhaps likely to be a substrate loan, and *piiľ- ~ *peeL- ‘to stick’, where you suggested a while ago that the Khanty form actually continues PU *pisə- (and therefore the Ms form has to be either a loan or unrelated). (Words with Ms *uu ~ Kh *oo also have similar issues.) If so, maybe the Khanty form is then a very old loan from Permic, with *śaj(V) (before > *śoj) being borrowed as *säj(V) > sej?

      The Selkup word for clay does not have (Proto-)Selkup *ǖ but the rare Proto-Selkup front diphthong *üǝ instead (…) The exact background of this diphthong is unclear to me

      A good survey of vowel-glide stem contraction in Selkup, or the development of the Selkup vocalism in general, would be very nice to finally have around one of these days.

