The case of Mansi *ś > *š, part 1

A long-standing mystery of Uralic historical linguistics is a split representation of Proto-Uralic *ś in Mansi. Aside from confusion between *ś and *ć (widespread across the entire Uralic family), there are also two plain sibilant reflexes: *s and *š.

What I gather is the current best explanation, as stated e.g. in the two handbooks (by Sammallahti in ’88 and Honti in ’98), is *š being the default reflex; *s would have originated by dissimilation before another palatalized consonant, and would then have spread to other positions by some unexplained mechanism. This implies the split would have been initiated during the original depalatalization of *ś, i.e. at the Ugric or possibly even the East Uralic stage.

I think this is completely on the wrong track. On an examination of the evidence, it seems to me that there is a different, fairly simple solution.

To begin, let us consider the word-initial position. Before the PU illabial non-open front vowels *i and *e, leading to Proto-Mansi *i, *ee and *ä, the main development is *ś > *š:

  • *śeðäm(ə) > *šim ‘heart’ [1]
  • *śekä > *šiɣ ‘catfish, burbot’ [2]
  • *śenä > *šeenəɣ ‘mushroom’
  • *śepä > *šip ‘neck’
  • *śilmä > *šäm ‘eye’
  • *śilä > *šilt ‘fat (of bear)’ [3]

Before the PU labial front vowel *ü > PMs *ü, and the open front vowel *ä > PMs *ää, the main development is *ś > *s:

  • *śüðʲə > *süĺ ‘coal’ [4]
  • *śülkə- > *süĺɣə- ‘to spit’
  • *śäŋ(k)ə- > *sääŋk- ‘to break’
  • *śäŋkə > *sääŋkʷ ‘hip, loin’
  • *śäxə > *säɣ ‘plait’ [5]

The development *ś > *s is also frequent before all back vowels:

  • *śarta > *surtə ‘(young) reindeer’
  • *śarma > *surəm ‘hole in ceiling’
  • *śawŋa > *suuw ‘pole’
  • *śojə- > *suj- ‘to sound’
  • *śajəmɜ > *saajəm ‘brook’
  • *śo/ëðka > *sëëĺ ‘duck’
  • *śëla- > *sëël- ‘lightning’
  • *śëlka > *sëëɣla ‘pole’
  • *śëmə > *sëëm ‘fish scales’
  • *śura > *sar ‘narrow’
  • *śuka > *sow ‘bark’
  • *śurə-ma > *sorəm ‘death’
  • *śuwə > *su-nt ‘mouth’
  • *śuwə > *suw-ĺ ‘clay’

By this point it is easy to craft a hypothesis: at some point in the development of Mansi, PU *ś, or perhaps rather the depalatalized Proto-Ugric *s, was palatalized to *š before the descendants of original *i, *e. [6] This would have been phonemicized when the lenition *č > *š occurred. I think this already stands well up to the apparently commonly accepted “dissimilation” handwave: I count 6 examples of *s before a palatal/ized consonant, versus 13 before something else, not exactly a strong track record.
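The conditioning proposed above can be written down as a small check. This is only a minimal sketch, not a claim about how the change actually operated: the function names and the data layout are my own illustrative assumptions, the rule looks only at the first vowel of the PU reconstruction, and the word list is simply the three bulleted lists above.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that e.g. 'ś' is always a single code point."""
    return unicodedata.normalize("NFC", text)


# PU first-syllable vowels used in the reconstructions above.
PU_VOWELS = set(nfc("aeiouäüë"))
# Assumption of this sketch: only illabial non-open front vowels palatalize.
PALATALIZING = set("ie")


def first_vowel(pu_form):
    """Return the first vowel of a PU reconstruction (hypothetical helper)."""
    for ch in nfc(pu_form):
        if ch in PU_VOWELS:
            return ch
    raise ValueError(f"no vowel found in {pu_form!r}")


def predicted_initial(pu_form):
    """Predict the Proto-Mansi reflex of initial PU *ś under the hypothesis."""
    form = nfc(pu_form).lstrip("*")
    if not form.startswith(nfc("ś")):
        raise ValueError(f"{pu_form!r} does not begin with *ś")
    return nfc("š") if first_vowel(form) in PALATALIZING else "s"


# (PU reconstruction, Proto-Mansi reflex), copied from the lists above.
REGULAR = [
    ("śeðäm(ə)", "šim"), ("śekä", "šiɣ"), ("śenä", "šeenəɣ"),
    ("śepä", "šip"), ("śilmä", "šäm"), ("śilä", "šilt"),
    ("śüðʲə", "süĺ"), ("śülkə-", "süĺɣə-"), ("śäŋ(k)ə-", "sääŋk-"),
    ("śäŋkə", "sääŋkʷ"), ("śäxə", "säɣ"),
    ("śarta", "surtə"), ("śarma", "surəm"), ("śawŋa", "suuw"),
    ("śojə-", "suj-"), ("śajəmɜ", "saajəm"), ("śo/ëðka", "sëëĺ"),
    ("śëla-", "sëël-"), ("śëlka", "sëëɣla"), ("śëmə", "sëëm"),
    ("śura", "sar"), ("śuka", "sow"), ("śurə-ma", "sorəm"),
    ("śuwə", "su-nt"), ("śuwə", "suw-ĺ"),
]

# Collect every word whose attested initial differs from the prediction.
mismatches = [
    (pu, pms) for pu, pms in REGULAR
    if predicted_initial(pu) != nfc(pms)[0]
]
print(mismatches)
```

Feeding the same function a form such as *śëta ‘100’ gives *s, which is why the words with *š before a back vowel need separate treatment.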

Still, some exceptions to the coherent picture above are found as well. I’ve gotten together 12 cases that have proposed cognates outside of Ob-Ugric. Many of them seem likely to not be actually inherited from Proto-Uralic though.

First, in two words it appears that PU *ä yields PMs *i, each with a different development of the initial consonant:

  • *śäkśə > *siɣəs ‘gull’
  • *śälä- > *šil-t- ‘to cut’

I’ve commented before that the former of these does not have a particularly good claim to being an inherited PU root. The latter word could be assumed to have acquired its non-open vowel sufficiently early to have participated in the same palatalization as words where *i goes back to PU *e. Though it also seems there is a possibility that /š/ in this word does not go back to Proto-Mansi: the Lower Konda dialect normally retains *š (/šiɣ/ ‘burbot’, /šip/ ‘neck’, etc.), but here it has /silt-/.

One other case of *s being found before *i in Mansi is *sir ‘order, way’. This has cognates in many Uralic languages, most of which indicate PU *i or *e (e.g. Komi /śer/, Mari *sʏr, Hungarian szër), and this appears to be a clear exception. I wonder if loaning from the Khanty cognate *siir could be the explanation however: the vowel correspondence *i ~ *ii is etymologically quite irregular, but would make phonetic sense as a substitution (the vowels in Khanty have a normal : overshort contrast). Moreover, the Mansi word is mostly not attested from the dialects in the least contact with Khanty: Southern Mansi, or any Western dialect other than that of Pelymka.

A truly puzzling word is *säjə or *sejə > *säj ‘pus’. The other Uralic languages are all in quite good agreement that the original initial was plain *s-, not palatalized *ś- (e.g. Samic *sējë, Mordvinic *sij, Khanty *ɬöj, Hungarian ev). No matter how Mansi ended up retaining *s- here, this case cannot tell us anything about the development of PU *ś. [7]

There is also a bunch of words where *ś > *š is supposed to occur before back vowels. Five of these are more or less widespread, while three others have suggested cognates only in one Uralic branch beside Mansi.

  • *śëta > *šëët ‘100’
  • *śora- > *šuurl- ‘to dry’
  • *śarwə > *šaarəp ‘horn’
  • *śosra > *šaatər ‘1000’
  • *śuðʲa > *šaĺ ‘frost’
  • Samic *čoanē ~ Ms *šun ‘sleigh’
  • Finnic *sopa ~ Ms *šup ‘shirt’
  • Finnic *sampi ~ Ms *šupəɣ ‘sturgeon’

The last three I think can be discarded offhand. Mansi *u usually derives from PU *u (*luwə > *luw ‘bone’, *purə- > *pur- ‘to bite’, etc.) and does not regularly correspond with Samic *oa or Finnic *o or *a. The last item also has a discrepancy between Finnic *mp and Mansi *p. For the second-to-last, a loan etymology from Mongolian /čuba/ ‘coat’ might be possible.

Considering the other five, a striking fact is that no less than three of these are Indo-Iranian loanwords (while only one IE loan occurred among the 26 words I suggest to have developed regularly: *śëlka ← *ǵalgo- ‘pole’):

  • PIE *ḱm̥tóm > PII *ćata ‘100’
  • PIE *ǵʰeslo- > PII *ȷ́ʰasra ‘1000’
  • PIE *ḱerw- > PII *ćr̥va ‘horn’

I propose that these words, previously thought to have been loaned into Proto-Uralic (or the illusory ‘Proto-Finno-Ugric’), were actually loaned after the separation of Ugric, sufficiently late that in some ancestral stage of Mansi, *š was available as a substitute for the initial palatalized consonant. I actually have further arguments supporting this position for each of the words, to be detailed later.

As for ‘to dry’, I suspect late loaning from Komi /šural-/ ‘to dry’, which is from an unrelated PU root *šorwa-. The -l- formant, the unexpected short vowel in Sosva Mansi /surl-/, and the distribution in only Western and Northern Mansi all seem coherent with this explanation.

I have no ideas on what to do with ‘frost’.

Also left for later, for now: investigating the reflexation of *ś in word-internal positions, and in common Ob-Ugric vocabulary absent elsewhere in Uralic.

[1] West Uralic cognates such as Finnic *südän point to *ü, but East Uralic rather supports *e: cf. e.g. Hungarian szív, Khanty *sem, Selkup *siićə.
[2] Normally reconstructed as *śäkä, but actually only Finnic *säkä, *säkiä points to this: Mansi, Khanty (*seɣ) and Mordvinic (*śijə) all agree on *śekä. Mari *šij is ambiguous between these options.
[3] In light of Nganasan /sela/ this word may however rather belong among the words where *ä > *i.
[4] Proto-Mansi *ü in this and ‘to spit’ remains something of an anomaly, considering that I’ve proposed doing away with this PMs vowel altogether. At the Proto-Uralic level *ü is well established here though, and thus there is no problem in assuming that *śü > *sü. FWIW I’ve entertained the idea that perhaps the contrast between *s, *š was actually [sʷ], [s] at an earlier stage — but this doesn’t quite work with cases of *s next to the illabial PU *ë > PMs *ëë, such as ‘lightning’.
[5] Irregularly shortened. Finnish säie ~ Khanty *sööɣ- confirms the reconstruction with original *ä.
[6] Assuming palatalization only before illabial front vowels is not anything special: this has an exact parallel in Nganasan, where Proto-Samoyedic *ki, *kü are reflected as /si/, /ki/. The Finnic assibilation of *t to *c before *i but not *ü can also be compared here.
[7] An early assimilation *s-j > *ś-j might be a tempting assumption here, but this cannot be a regular development either, per *so/ëja > *tëëjət ‘sleeve’.

11 comments on “The case of Mansi *ś > *š, part 1”
  1. Juho says:

    Additional note on Proto-Mansi *u: You might find my dismissal of three items with PMs *šu- supposedly from PU *śa- or *śo- (“sleigh”, “shirt”, “sturgeon”) unfair, when I also include three items with PMs *su- supposedly from PU *śa/śo- (“reindeer”, “hole in ceiling”, “to sound”) in the data. Actually a few of these may need reassessment as well.

    “Young reindeer”, /surti/, is only attested from Northern Mansi and the Lower Konda dialect of Eastern Mansi, and corresponds irregularly to Khanty *suurtïï. This could have been loaned from Khanty in a similar way as I suspected of “order, way”. This would also explain the LK form: normally, PMs *u yields LK /ʊ/.
    (Note to self: check if any examples of this correspondence Ms *i, *u ~ Kh *ii, *uu have a more southern/western distribution.)

    “Hole in ceiling”, *surəm, is aside from Mansi only attested in Samoyedic. Perhaps this should be reconstructed as PU *śurma → Proto-Samoyedic *sərmå rather than *śarma → *sårmå. Both would be compatible with Nenets /saːrwa/, and though Enets /samaʔa/ seems to indicate PSmy *å, there is Kamass /-zəro/ (apparently only found in compounds?) which seems to indicate PSmy *ə.

    Only in “to sound” does the *u seem genuine. Here also Khanty *soj- indicates former *u (cf. *loɣ “bone”, *por- “to bite”). Perhaps there was a PU variant *śujə-, alongside *śojə- as indicated by e.g. Finnish soida. OTOH, there are a few parallels, such as PU *kojə, *kojə-ma → PMs *kuj, *kum “man”, of the development *o → *u before *j.

  2. David Marjanović says:

    I propose that these words, previously thought to have been loaned into Proto-Uralic (or the illusory “Proto-Finno-Ugric”), were actually loaned after the separation of Ugric, sufficiently late that in some ancestral stage of Mansi, *š was available as a substitute for the initial palatalized consonant. I actually have further arguments supporting this position for each of the words, to be detailed later.

    Exciting! :-)

  3. Merol Muspi says:

    What is the etymology of the Finnish -nen/-s heteroclitics?

  4. A small note on the word meaning ‘smoke hole’: NenT сарва = /sarwa/ quite unambiguously points to PSam *sårmå. From PSam *sǝrmå we would expect NenT сăрва = /sărwa/ (or /sǝrwa/, /sørwa/ acc. other notations) – PSam first syllable *ǝ > NenT /ă/, whereas PSam *å > NenT /a/.

    Still, it seems quite likely that the correlation between Ms *surǝm and PSam *sårmå reflects some kind of borrowing rather than PU inheritance – the distribution is too narrow.

  5. Jyri says:

    Sorry to take this a bit off-topic (I think I’d be willing to follow you on the main points of PU *ś(i/e) > PMns *šV). But it just catches my eye when you talk off-hand about the ‘illusory “Proto-Finno-Ugric”’. Can you recapitulate the most important reasons for forgetting about any kind of PFU completely?

    I agree that as a reconstructed phonological system it is probably illusory, but I can’t see a reason to dismiss it as a protolanguage, because of common lexicon (I think there’s a demand for proof that all of it is retentions), derived forms (just off the top of my head, PFU *ńoma-la vs. PSd *ńoma ‘hare’, maybe even (PFi *korva ~ PPe *kor <) PFU ?*kaw-ra vs. PSd *kåw 'ear'), loans from early Indo-European subgroups, etc. So why not a discrete protolanguage stage but with isoglosses that had already formed earlier?

    • Jyri says:

      Also, to my point about early Indo-European borrowings, you nicely bring up the irregularity of a certain etymon, i.e. PFMrd *śata ?~ PMa *šüdö ?~ PPe *śo ?~ PMns *šëët ‘100’. This is great. Note that this word can’t be used as evidence for the original value of PU *ś as having been really ś. More on this later.

    • Juho says:

      Welcome to the blog, Jyri!

      If I read you right, you are in agreement that there is no evidence for a uniform PFU stage, and that there is furthermore some evidence for an East Uralic stage that comprised Ugric and Samoyedic? This I think is already sufficient reason to do away with the concept.

      What you propose instead I find rather strange, though: a protolanguage that would already have featured “isoglosses that had formed earlier”? This, in my opinion, is a conceptual misunderstanding. A “proto-language” by definition is the last single, uniform ancestor of a set of languages. It follows that no proto-language ever had dialectal variation: as soon as isoglosses that only affect a part of the speaker group have been introduced, we are dealing with a post-protolanguage dialect continuum stage.

      There are two factors that probably contribute to confusion here. First, contemporary sister-dialect varieties of any proto-language quite probably did exist. Any such ones, however, must have left no direct descendants (at least, none in the language group we’re considering!) Or in other words: a proto-language in the strict sense was only one, tho ultimately the most successful, of the dialectal variants of its original speaker community. Second, internal variation occurs in all natural language varieties… but stylistic, sociolectal or free variation must be here distinguished from geographic, between-dialect variation. It is only this last type of variation that, iterated, gives rise to language diversification. For that matter, I do not believe we even have any methodological tools to probe the question of other types of language variation in reconstructed languages of the past?

      We could still perhaps speak of a Finno-Ugric areal unit of mutually intelligible dialects that existed shortly after the breakdown of the Proto-Uralic unity. Yet this probably coexisted with a similar East Uralic or perhaps Siberian Uralic areal as well. “Is in dialect relationship with” is not a transitive relation and cannot be used to partition a dialect continuum into multiple distinct units.

      (Alternately, were we to relax the definition and to allow overlapping proto-languages, we’d end up with quite confusing terminology, e.g. describing Mansi being simultaneously descended from “the eastern dialect of Proto-Finno-Ugric” and “the western dialect of Proto-East-Uralic”; obscuring how these would in actuality refer to the one and the same post-Proto-Uralic dialect.)

      This approach would allow finer precision as well: we could consider smaller yet, but still more or less simultaneous groupings — such as the “Central Uralic” areal, comprising Permic and Ugric (previously suggested by Abondolo and Helimski). This should help with the issue of lexicon: it is entirely possible that ancient Permic-Ugric contacts have played a part in accentuating the differences between Ugric and Samoyedic. Any innovative vocabulary has a high chance of spreading across dialect boundaries and I do not believe strong evidence for genetic subgrouping can be based on it.

      Also, the number of words found in all eight Finno-Ugric branches in some form, yet not in Samoyedic, is actually close to trivial: no more than 18. Two of these (“100”, “horn”) are moreover known II loans, and as stated above, I believe they were actually acquired independently in the Ugric languages. A 3rd possible IE loan is found among these as well (Fi. keri, cf. PIE *sker-). And once we do allow loss to occur in the other languages as well, the idea that Samoyedic might have lost a noticeable part of its key Uralic vocabulary under strong foreign influence is not so unthinkable either.

      A similar story holds for the supposed derivational innovations as well. E.g. in the case of korva, the meaning “ear” seems to have been only a Finnic innovation that replaced the more widely-spread PU term for “ear”, *peljä, since the Permic and Hungarian cognates indicate an original meaning “flap, stalk”. Or, in the case of “hare”, the word is absent entirely from Ob-Ugric as well as Mari, so it seems worth contemplating if the l-element in Hungarian nyúl (which is, after all, merely a “diminutive” affix carrying no semantic information) might be a diffused innovation during the early dialectal period from some more western Uralic idiom?

      • Jyri says:

        I find no such definitions helpful that seem to be confusing. If a protolanguage is the last stage of a uniform system of forms, it’s not like a real language; it’s an idealized system of an earlier dialect. Languages have dialects, so why not say protolanguages have protodialects (and of course protolanguages can be in contact with other (proto-)languages) instead of making up terminology saying different protolanguages have been involved in areal units?

        The areal diffusion of derivational affixes is of course possible. But for some derivatives to have a Finno-Ugric spread in my opinion points to deeper borders than just variation in phonological or morphological form. What seems to me to be more like a dialectal, and later, areal unit is “Proto-Finno-Permic”. But we’ll see.

        • Juho says:

          It would be, of course, possible to introduce the concept of “proto-language dialect”. This regardless seems to be a new addition to the standard terminology of historical linguistics. Normally, when we speak of a proto-language, we speak of a single well-defined language variety, with a single well-defined phonology, lexicon, grammar, etc. After all, the purpose of the concept is to showcase what features among a set of languages are shared due to common inheritance, and not due to later convergence. This purpose will end up abandoned if we conflate the concept of a proto-language as a point of common origin, and the concept of a continuum of related dialects. Hence I do not see what benefit this approach brings (other than, perhaps, adherence to traditional Uralistic terminology).

          Moreover: where can we draw the line if not at the introduction of heterogeneity? If a pineapple is known as ananas from Turku to Käkisalmi, does this “shared innovation” indicate that we all still speak Proto-Finnish to this day; if perhaps separated into many distinct proto-dialects? I see no qualitative difference between a case like this… and one in which a word such as *śëta is recognized to have been introduced separately to (at minimum) pre-Ugric and pre-Finno-Permic, but regardless will be considered as present in “a” Proto-Finno-Ugric language.

          Ultimately, when I speak of the “illusority of PFU”, what I actually claim is that there never existed a stage during which none of the Finno-Ugric groups had in any way begun to separate, that would not also be ancestral to the Samoyedic languages. And by contrast: that for any well-founded proto-language, this kind of a past situation did exist.

          Also, epistemological side note: strictly speaking, the Finno-Ugric spread of any single innovation of course is evidence for a genetic status of the group. But the counterevidence has to be weighed as well! If we ignore the evidence for competing possibilities, or fail to seek any in the first place, we only end up confirming our initial biases.

          The question of a “Proto-Finno-Permic” is almost entirely independent from the question of a “Proto-Finno-Ugric”. I find the grouping to still have some potential, but this is not so much due to better absolute evidence as much as the absence of strong competing proposals. This in turn might be more due to a lack of investigation than due to a lack of evidence. There is much work yet to be done on the earliest development of the original Uralic dialect continuum…

  6. David Marjanović says:

    Assuming palatalization only before illabial front vowels is not anything special: this has an exact parallel in Nganasan, where Proto-Samoyedic *ki, *kü are reflected as /si/, /ki/.

    English has this, too: every /kɪ/ of native origin (kin, king…) comes from Old English cy.

    • David Marjanović says:

      …with the exception of the largely defunct diminutive suffix -kin, which has /ɛ/ throughout at least Continental West Germanic.

