(Part ca. 3 of n in my irregularly scheduled series of Introducing Named Soundlaws in Uralic Studies. [0])
The issue, as I see it
Most of the vowel correspondences now thought to be regular between Samoyedic and the rest of Uralic are those outlined by Janhunen in 1981. The actual sound laws behind them have nonetheless often been re-tooled or re-dated since, much as many of them already had earlier precedents in some form (primarily from Lehtisalo or Steinitz). E.g. the chainshift *e > *i, *ä > *e has by now been shown by Helimski to be post-Proto-Samoyedic, given Nganasan evidence for *e > †e > i̮. As a follow-up, the reflexes of *ä > “*e” can also be relatively open in some languages: Salminen (2012) has pointed this out about modern Forest Enets (e.g. *tät³tə > tät ‘4’), and to me it seems that the conditional developments *ä-a, *ä-å > *a in pre-Selkup likewise presume an open value for *ä. Cf. *ān-uj ‘true’ < PS *änå, or *kuəsə ‘iron’ < *wåsV < *wasV < PS *wäsa.
What I call “Janhunen’s Law” is, though, not a sound change in Samoyedic, but a proposal he made in the same paper for an innovation in some uncertain number of western branches: PU *oCə > *uCə. Sammallahti (1988) indeed adopted it as an already Proto-Finno-Ugric innovation. Since then, though, there does not seem to have been much support for it, but neither has there been critique or any other analysis.
On any kind of closer look, it does seem clear this cannot be quite as simple as Janhunen suggests. First of all, also a correspondence western *o ~ PS *o exists. Janhunen identifies two examples: *koj(-wV) ~ *koəj ‘birch’, *kopa ~ *kopå ‘bark’. This number can be increased: clear examples also include *koj(ə)ra ~ *korå ‘male animal’; *kokə- ~ *ko- ‘to check, see’ (all of these with *ko-, but this looks simply accidental; *ko- > *kå- can be also attested in e.g. *kåmpå ‘wave’, *kåsə- ‘to dry’, *kåət ‘spruce’). Possibly also *ńoxə- ~ *ńo- ‘to pursue, hunt’, though Janhunen assumes that Finnic *nouta- continues earlier *ńux-ta-, thru a similar lowering as in *sou-ta ‘to row’ ~ PS *tu- < PU *suxə-, and this does not look entirely impossible.
I observed already long ago (first presented at the 2nd International Winter School of FU Studies in Szeged in 2014) that there seems to be evidence for further conditioning. First, all of Janhunen’s positive examples have front consonants as their medials: alveolars and labials. Four cases are immediately unambiguous:
- *lumə ~ *jom ‘snow’;
- *kusə- ~ *kot- ‘to cough’;
- *purə- ~ *por- ‘to bite’;
- *tulə- ~ *toj- ‘to come’.
I would add first of all two cases that should be reconstructed with *-w- and not, as proposed by Janhunen, *-x-:
- *śuwə ~ *śo(-j) ‘mouth, throat’; *-w- is clearly indicated by Southern Sami tjovve.
- *tuwə ~ *to ‘lake’; *u reflected at least in Permic *ti̮. Original *-w- seems to be indicated by Northern Khanty *tŭw, Konda tŏw, and maybe the oddly front-vocalic təw in rest of Southern Khanty. [1]
Probably even a third is *luwə ~ *lë ‘bone’. *-w- is again indicated by Western Khanty forms — mostly rhyming with ‘lake’, e.g. Konda tŏw, other Southern təw, Nizyam tŭw, Kazym ɬŭw (but in Obdorsk lăw, versus tuw ‘lake’). Samoyedic *ë could indicate a shift *ëw > *ow in other languages already before *o-ə > *u-ə (a tentative Proto-Finno-Ugric innovation — though this seems a bit too trivial and devoid of parallels to be relied on for that).
One additional example that was not known to Janhunen shows a palatalized alveolar medial: *wuďə ‘new’ ~ *oj- > North Selkup oć-əŋ ‘again’, a neglected etymology from Helimski (1976). [2] Note further that positing *o > *u here explains the rare initial combination *wu-, not reconstructed anywhere else in Uralic vocabulary and probably phonotactically impossible in Proto-Uralic proper.
Looking beyond Samoyedic, it also seems that on the evidence of other languages, we cannot really reconstruct word roots of shapes like *CoPə, *CoTə, *CoRə. The best two contenders are *monə ‘many’, *wolə- ‘to be’, but the first is readily open to doubt as a loan from Indo-European (also, Permic *-mi̮n, Mansi *-mān, Hungarian -vAn in names of decads do not particularly have to be related to ‘many’ in Finnic and Samic), and the latter looks more likely to have been *walə-. By contrast, many reconstructions of the shape *CoKə have already been presented: at least *jokə ‘river’, *rokə- ‘to hack, cut’, *soŋə- ‘to enter’, *šokə- ‘to say’, *toxə- ‘to bring’; maybe also e.g. *poŋə ‘bosom’, *oŋə ‘hole’ (if not rather *poŋŋə, *aŋə). I take this also as grounds to suppose that there has indeed been a sound change *-oCə > *-uCə, for C ≠ velar.
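For readers who like to see a conditioned rule spelled out mechanically, here is a minimal sketch of *-oCə > *-uCə with velar medials blocking. Everything in it is my own illustrative assumption rather than an established formalization: the function name `janhunen_raise`, the segment sets, the restriction to simple (C)oCə stems, and the optional `j_blocks` switch (which anticipates the suspicion that *-j- may also have blocked raising). Inputs are hypothetical pre-raising shapes written without asterisks or hyphens.

```python
import re
import unicodedata

# Assumed segment classes, for illustration only.
VOWELS = unicodedata.normalize("NFC", "aeiouəäüåë")
VELARS = {"k", "x", "ŋ"}  # medials that appear to block raising

# Matches only simple stems: optional initial consonant + o + one consonant + ə.
STEM = re.compile(rf"^([^{VOWELS}]?)o([^{VOWELS}])ə$")

def janhunen_raise(stem: str, j_blocks: bool = True) -> str:
    """Return the stem after *oCə > *uCə, unless the medial consonant blocks it."""
    form = unicodedata.normalize("NFC", stem)
    match = STEM.match(form)
    if match is None:
        return form  # out of scope, e.g. a-stems such as kopa
    onset, medial = match.groups()
    blockers = VELARS | ({"j"} if j_blocks else set())
    if medial in blockers:
        return form  # velar (and optionally j): *o is retained
    return f"{onset}u{medial}ə"

for stem in ["lomə", "porə", "tolə", "woďə", "jokə", "soŋə", "kojə", "kopa"]:
    print(stem, "->", janhunen_raise(stem))
```

The sketch only checks the shape of the stem and the class of the medial; it says nothing about dating or about which branches the change belongs to.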
I suspect also palatal *-j- might have blocked raising: cf. *kojə ‘male’ (though this is mostly continued in derivatives like *koj-ma, *koj-ra). An interesting case on this front is ‘to swim’, usually reconstructed as *ujə- per Finnic (Finnish uida, Estonian ujuma etc.), but most cognates (clearly at least Samic *vōjë-, Mordvinic *uj-, Permic *uji̮-, SKhanty üj-) better point to *ojə-. As I’ve noted by now in a talk from 2018, even within Finnic, Livonian vȯigõ (? < *oi-kV-) seems to still retain *o. The reflex in Samoyedic, on the other hand, mysteriously enough, is still indeed *u- or *uj-.
An alternative view?
The only counterproposal in any clear detail that I’ve seen comes from Jaakko Häkkinen, first in his Master’s thesis and later, much more briefly, in his 2009 paper on locating Proto-Uralic. He suggests inverting Janhunen’s Law, to apply in Samoyedic and not outside of it: *CuCə > *Co(C). I have seen / heard something similar from other colleagues in a variety of discussions, but I do not recall any defense of this being published. At most, see some discussion in this blog’s comments starting here, with Ante Aikio listing some notes about *o ~ *u variation within Samoyedic and additional irregular-looking examples of *o. Among these I would doubt at least the reconstruction PS *počå- ‘soak, ooze’, though. This probably refers to the words appearing in UEW under *poča- ‘become wet’; but Nganasan and (with irregular b-) Kamassian seem to point rather to *påTå-, with evidence for *o limited to Nenets–Enets. Or, since (old) Nganasan fo- can continue not just *på- but also *pə-, and Enets has o < *ə regularly, another option, maybe better still, would be that this was *pəčå- in PS after all, as would be expected per the Udmurt, Khanty and Mansi cognates; and that the Nenets word is a loan from Enets, while the Kamassian word doesn’t belong here at all. (Donner’s original data actually has not just a voiced b but palatalized bʲ, which is also difficult to explain.) In some other examples I don’t see any particular reason to think that they point to secondary *u > *o rather than secondary *o > *u (thus maybe in “*num” ‘heaven’) or to *o at all (thus in Nganasan tui ‘fire’ for expected ˣtüi: this looks like unclear retention of *u, which has other parallels).
Anyway, the major problem that I see in the inverted approach is explaining where Proto-Samoyedic *Cu(C) then comes from. There is solid evidence at least for a rime *-uj:
- *tuj ‘fire’ < PU *tulə (a minimal pair with *toj- ‘to come’!);
- *uj ‘pole’ < PU *ul(k)ə;
- *kuj ‘spoon’ < PU ? *kujə (cf. Finnish kuiri ~ kuiru ‘id.’; I am not committed either way on whether the proposed Komi and Ob-Ugric cognates meaning ‘trough ~ mortar’ belong);
- *puj ‘eye of a needle, etc.’ < *pujə.
The last two probably show PU *-jə > ∅ and PS *j as some derivative suffix, [3] but this alone cannot explain *u rather than *o, since also the latter readily occurs in CV stems: *ko-, *ńo-, *to, *śo-j. A few PS roots also show *u: natively at least *tu- ‘to row’ < PU *suxə; of unknown origin, *ku- ‘cord’, *ju ‘warm’ [4]. Some other CVC examples can be found too, including *pur ‘smoke’ < PU *purkə; *ut ‘road’ < PU ? *uktə. But at least these two examples we might argue to be irrelevant due to continuing PU *u in an original closed syllable, just with exceptional loss of *-ə after some probably very early cluster simplifications.
As for the lack of PS roots of shapes such as **Cup, **Cun, **Cuŋ, this could indicate that something happened to such cases, but it doesn’t follow that the result must have been *o. Other options would readily include reduction to *ə, already suggested by Janhunen in e.g. *təŋ ‘summer’ < PU *suŋə.
Future hypotheses
So far I do side with the hypothesis that Janhunen’s Law is a real phenomenon. Its exact extent and conditions seem to require review, however. I have some reasons to suspect that PU *o was in *CoCə stems retained not just in Samoyedic, but partly also elsewhere. E.g. *purə- / *porə- ‘to bite’ yields in Permic *puri̮-; *tulə- / *tolə- ‘to come’ yields in Mari *tola-; both more in line with development from *o than *u. An interesting recent discovery, premiered a few weeks ago on Twitter, has also been to note Khanty *lāńć ‘snow’ (> e.g. Surgut ɬ´åńť, Nizyam tɔńś, Obdorsk laś). UEW derives this from a distinct *ľomćɜ, listing here also some derivatives of PS *jom and probably incorrect Kola Sami reflexes meaning ‘frost’. But if we did reconstruct *lomə and not *lumə already in PU, the Khanty words, too, can be simply considered derived reflexes, at the PU level seemingly *lom-ća: *o-a > *ā is regular, and there does not seem to be counterevidence to assuming *mć > *ńć. Closer review might identify more cases like these that support the reconstruction of PU *o in the involved words.
As more of a long shot, there are also two unclear cases where evidence for *o might be found in Indo-European. For one, ‘to bite’ seems comparable with PIE *bʰe/orH-, root meaning probably ‘to strike, pierce’. The PU verb also probably meant specifically ‘bite thru’ (in contrast to *soskə- ‘to chew’), coming fairly close to ‘pierce’. Its descendants can also be used not just of biting with teeth, but also of working with tools (cf. e.g. Fi. sahanpuru ‘sawdust’, as if “saw-biting”); similar later development is attested in derivatives on the IE side too (Latin forō, Germanic *burō- ‘to bore, drill’) [5] and LIV goes as far as to give a gloss ‘mit scharfem Werkzeug bearbeiten’. Distribution all the way into Samoyedic makes it difficult to assume loaning, though, while a hypothesis about an old Indo-Uralic cognate would not, at the current state of research, rule out an original *u that was lowered to ablauting *e/o in PIE. For two, there is Finno-Mordvinic *unə ‘sleep’, which Koivulehto (1991) has already compared with Greek ὄναρ, ὄνειρο- and explained exactly thru Janhunen’s Law: early IE *oner → early Uralic *onə > *unə. Whether the Greek word goes back far enough in IE for this to be feasible looks very dubious to me though, especially when there is a much better-attested PIE word for ‘sleep’, *swépnos.
A yet further possibility I would wish to look into in more detail in the future: does the raising of *o that we seem to see really have the “same” *o as its starting point as is usually reconstructed in PU? Namely, traditional PU *o is in Samoyedic by default lowered to *å, such that its “survival” in Janhunen’s Law cases looks to be really innovative as well. As outlined in yet another presentation a few years ago, I have also developed a hypothesis that the unbalanced inventory of rounded vowels in Proto-Uralic (*ü *u *o but no **ö) probably comes by a chainshift from pre-PU *u *o *ɔ. (I have not discussed this on the blog in detail so far and, alas, cannot do so right now either.) Then, the common tendency of PU *o to be lowered to *a / *å probably indicates that this chainshift had actually not fully taken place by PU: that “*o” was really still open-mid *ɔ. Janhunen’s Law positions, however, look like they might have already had close-mid *o. This would allow us to do away with a raising that happened all across “Finno-Ugric” with seemingly no motivation, while still also not folding the vowel correspondence entirely into PU *u.
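As a small illustration of what “chainshift” means mechanically, the hypothesized pre-PU *u *o *ɔ > PU *ü *u *o can be sketched as a simultaneous mapping. The pairing of each input vowel with an output (u to ü, o to u, ɔ to o) is my reading of the order in which the two inventories are listed and should be treated as an assumption, as should the names `CHAINSHIFT` and `shift_inventory`.

```python
# Assumed pairing of pre-PU rounded vowels with their PU outcomes.
CHAINSHIFT = {"u": "ü", "o": "u", "ɔ": "o"}

def shift_inventory(vowels, shift=CHAINSHIFT):
    """Apply every step of the chainshift at once, so no step feeds another."""
    return [shift.get(v, v) for v in vowels]

pre_pu = ["u", "o", "ɔ"]
print(shift_inventory(pre_pu))
```

Applying the three steps one after another in the order ɔ > o, o > u, u > ü would instead merge everything into ü, which is why a chainshift that keeps the three vowels distinct has to be modeled as simultaneous, or as ordered from the top of the chain downward.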
There would be also another option on the relationship of this *o with my pre-PU *u *o *ɔ. Rather than early raised cases of (pre-)PU *ɔ, they might be also straggling non-raised cases of pre-PU *o… And then was this *o really just an allophone of *ɔ either? *u is a very common vowel in PU, and perhaps this is partly because even some further cases should be likewise reconstructed as *o. This might be possible if we identified other evidence for it than retention as *o in Samoyedic. For the sake of example, one case might be Mansi *u: PU *u yields in Proto-Mansi either *u, *ŏ, *ă with no very strong conditioning apparent. (Some similarly open issues remain in Khanty and Hungarian.) So just maybe … could it be that PMs *u is a sign of PU *o as distinct from both *u and *ɔ in general? such that not only will we then reconstruct PU *por- ‘to bite’ (> PMs *pur-), but also e.g. *końćə ‘urine’ (> PMs *kuńćə), with *o > *u now also in Samoyedic in this environment (> PS *kunsə)? This would even have a good parallel among the front vowels: PMs *i is generally from PU (close-)mid *e, not from close *i. — But in the interests of putting these notes finally out at least in a somewhat assembled form, I will leave this line of thought open for now.
[0] See previously at least: Lehtinen’s Law; Moosberg’s Law; and one that definitely requires a name but I’m still mulling over what to call it precisely is *Ä-backing in Finnic. Several future installments remain planned too.
[1] On the other hand, an irregular fronting already in Proto-Western Khanty would also account for most of these reflexes: *tŭɣ > *tü̆ɣ > *tĭɣʷ > *təw, preserved in SKh and giving NKh *tŭw (cf. e.g. ‘fall’: PKh *sü̆ɣəs ~ *sü̆ɣs > SKh səwəs ~ süs, NKh *sŭws or *sūs). But it seems preferable to me to restrict this irregularity to Southern Khanty and treat Konda tŏw and NKh *tŭw as regular reflexes. Maybe there is some possibility that the SKh development here and in ‘bone’ can be explained as *ŭw > *ū > *ǖ > *ü̆w > əw, leveraging the known fronting *ū > *ǖ? It doesn’t look like *ŭw and *ū actually contrast at all, so the first step here might be entirely virtual.
[2] Хелимский, Е. А.: О соответствиях уральских a- и e-основ в тазовском диалекте селькупского языка. – Советскoе финно-угроведение 12: 113–132. No cognates known elsewhere in Samoyedic, but the simplification *wo- > *o- would have to be pre-PS anyway, since by PS a new *wo- does exist and per two examples yields in Selkup *ko- as expected: *woəj > *ko ‘island, hill’; *wotå > *kotə ‘blueberry’.
[3] Though, since PS shows *r > *l / C_ in various suffixes, could it be possible that after *j, the resulting cluster further coalesced to *ľ, and then evolved into just *j as usual? In this case Fi. kuiri and PS *kuj could both go back to PU *kujrə (now with no especial reason to suspect a suffix in there).
[4] For a formal match and semantics within speculation distance, cf. PU *luwə ‘south’ ≈ ‘direction where the weather is warm’?? Seems unlikely but not impossible.
[5] And cf. further PU *pura ‘drill’, also already proposed to be an IE loan. So far it seems morphologically unclear to me how to connect this with either the PU or PIE verbs, though.
ü can break: first, Bavarian unrounded the rounded front vowels; soon thereafter, L-umlaut struck in Central Bavarian, turning e.g. *il into *ü; then, this *ü, preserved in the east, broke in western Central Bavarian, yielding [ui].
We very briefly discussed this in August 2021, starting here. :-)
The problem with Khanty *lāńć ‘snow’ is that it should be reconstructed as *Lāńć on account of Verkhne-Kalymsk jäńt́əŋ ‘snowy’ (DEWOS 871). This dialect, along with Vasjugan, preserves the difference between Proto-Khanty *l (from PU *l) and Proto-Khanty *L (from PU *s and š).
Well, that’s troubling. But maybe initial *Ľ, as we see in Surgut and Southern, can sometimes also yield Vj. VK /j/. A parallel would look to be /jäľ/ ‘war’, with reflexes elsewhere mostly suggesting *ĽǟĽ. A conditional development before another palatal, perhaps?
Incidentally, this latter word vaguely looks like PU *śoďa ‘war’, as if with a few place-of-articulation assimilations and irregular fronting. Straightforwardly we’d expect PKh **sāj, but instead seem to get ca. *sāĽ > *ĽāĽ > *ĽǟĽ.
PKh *Lāńć ‘snow’ shows exactly the same consonant correspondences as PKh *Lāńć- ‘to stand (tr.)’ (from PU *sańća-): assimilation to a following palatal consonant in Surgut and Irtysh, but not in Nizjam, Kazym and Obdorsk. So, I would expect that the identical correspondences result here from the same sound changes, which requires PKh *L- in ‘snow’.
I have suggested (in an academia.edu session on Aikio’s UED) that the Khanty word for ‘war’ is a Mansi loan. The reflex of *ćoδ’a is not attested in Mansi, but we would expect something like *sVĺ. The evidence of (Indo-)Iranian loanwords shows that the “sibilant shift” happened in Khanty later than in Mansi, so a scenario “Mansi *s from PU *ć gets borrowed as Khanty **s > *L” is possible.
PS *ju ‘warm’ reminds me of Meadow Mari ju ‘(pleasant) coolness’, Hill Mari ju ‘chill’ (< ? '*unpleasant coolness') < PMar. *jûw 'coolness'.
I wonder if something like PU *juxV 'moderately warm' could explain both the Samoyedic and Mari words.
Or the ancestral ‘*lukewarm, tepid’, to be more precise
Incidentally, Alexander, I have taken a cursory look at your reconstruction of PMari first-syllable vocalism and will consider it more deeply over the coming days, but you may be interested in some of the dialectal data I presented in my article in this year’s issue of Finnisch-Ugrische Forschungen, which will be published this month, as it covers similar ground. Where you reconstruct a tense vowel, however, my working hypothesis for some of these correspondences has been a Proto-Mari vowel sequence *-Və-. I would also reject the UEW etymology for ter ‘sled’ < *tärja, but I ultimately removed my alternative etymology (comparing it to other material in Permian) from that article, as I didn’t have time to fully work it out, but I may get around to that next year.
With regard to your new Mari etymologies, your etymology of iksa ‘bay; channel’ (also ‘deadwater’) does not consider those dialectal forms where an affricate is found (JT ikca, Eastern Mari (Veršinin) ikč́a, Birsk jikč́a, possibly also MK ikś), which IMO points to PMari *(j)iŋća instead.
And for pə̑dal- ‘defend’, Aikio has already prepared for publication an Uralic etymology from *pintili (which I believe was first presented on this very blog).
I am aware of this etymology of ‘defend’, but according to Aikio himself (“Mari etymologies”, p. 90), the compound suffix *-intili- yields PM *-edəlä-, which I almost entirely agree with (*-edəl-, since -ä- does not belong here). A frequent source of PM *e in non-first syllables are pre-PM sequences with *-j- (in this case, from *iń *pińtili- > PM *pîjdəl- > E piδəl-, W piδəl-. (Cf. cases such as *penti- > *pińdə- > PM *pîjdə- ‘to bind’.) At the same time, E pə̑δal-, W pəδäl- point to *pɪdäl-, and it is difficult not to identify *-äl- with the well-known semelfactive/perfective suffix.
As for ‘bay’, I agree that the forms with an affricate are challenging the Chuvash etymology. I haven’t registered all the Eastern Mari forms that may be diagnostic for the PM reconstruction yet, but I am going to do so before publishing it in a proper form.
A part of my post was clipped because of wrong symbols: read
“(in this case, from *iń *pińtili- > PM *pîjdəl- > E piδəl-, W piδəl-.”
as
“(in this case, *j from *in through the intermediate *iń). In the first syllable I would expect *pintili > *pińtili- > PM *pîjdəl- > E piδəl-, W piδəl-.”
For ä-backing, perhaps Kallio’s Law, as from your writeup it seems Kallio was the first to establish true regularity?