Observations on second-syllable vocalism in Khanty

This summer I’ve finished digitizing the main bulk of comparative data from László Honti’s Geschichte des obugrischen Vokalismus der ersten Silbe (1982): his 724 Proto-Ob-Ugric reconstructions and their descendants in the individual Mansi and Khanty varieties. Before making this available in any form though, I’m planning on eventually cross-checking at least a few other key sources. For one, there are Steinitz’ DEWOS, Kannisto’s recently released Vogulisches Wörterbuch, and some other materials for additional dialect coverage; for two, there are UEW and similar sources covering inherited vocabulary that has only been retained in one of Mansi and Khanty; for three, I will also be adding the data Honti includes but considers uncertain (this part is already underway). [1] A potential fourth extension could be the known loanwords of Komi / Turkic / Tungusic / older Russian origin, at least whenever attested in both Mansi and Khanty: they should be able to offer substantial evidence for constraining speculation on historical phonology.

However, even at this stage, the data can be assumed to contain a substantial part of the inherited lexicon of Mansi and Khanty. So I have taken the opportunity to do some preliminary comparative analysis.

One interesting topic is second-syllable vocalism, which remains underresearched down to even the basic groundwork within Mansi or Khanty. This might have importance for Uralic comparison in general, since our current understanding of Proto-Uralic word stem types comes mostly by extrapolation from Finnic and Samic. Although the basic division into *A-stems (~ *a-stems & *ä-stems) and *ə-stems (~ *e-stems or *i-stems) finds some substantial confirmation from Mordvinic and Samoyedic, it fares substantially more poorly with Mari, and within Permic and Ugric, there is not too much direct evidence of second-syllable vowel contrasts to work with in the first place. Attempting to reconstruct other second-syllable contrasts from conditional vowel developments in the first syllable is theoretically possible (I believe Zhivlov (2014) is still the most recent example of this), but this often carries a risk of circular logic, and if low on data, may also run into accidental correspondences between unrelated phenomena.

There is regardless some direct evidence of second-syllable vocalism in Ugric. Looking in the rest of this post at Khanty in particular: the Khanty evidence has been explored in the 60s in some aspects by Gerhard Ganschow and Gert Sauer, [2] but mostly the topic has gone without detailed research. Steinitz’ Geschichte des ostjakischen Vokalismus (1950) does not treat the subject and only focuses on the first-syllable system.

A few overview notes on unstressed syllables, without detailed analysis of the data, are given by Sauer in Die Nominalbildung im Ostjakischen (1967) and Honti in Chrestomathia Ostiacica (1984). These outline a division into five stem categories:

  1. Basic consonant stems (the most common).
  2. *A-stems, with an open full vowel (*ää, *aa). Decently preserved in inlaut (verb roots, CVCAC and other longer stem types), but in absolute auslaut in the nominative of noun roots, the vowel is widely reduced and possibly lost entirely.
  3. *I-stems, with a close full vowel (*ii, *ïï). Preserved somewhat more widely, again better in inlaut than in auslaut.
  4. A third vocalic stem type, yielding *I-stems in Eastern Khanty but *A-stems in Western Khanty.
  5. *əɣ-stems: these behave as ordinary consonant stems in EKh, but vocalize in WKh to merge with the *I-stems.

This certainly covers most of the bases. A close look at the comparative data, however, suggests that this picture should probably be modified and perhaps also expanded.

The *I-stems, reinterpreted

I would propose as an initial adjustment that the *I-stems are to be reinterpreted as a part of the consonant stems: as *əj-stems. This is indirectly suggested by the absence of stems of the shape *CVCəj from the Proto-Khanty lexicon, even though *CVj is well-attested (e.g. *ɬöj ‘pus’, *pooj ‘ice crust’, *saaj ‘goldeneye’) and examples of *CVjəC occur too (*kaajəm ‘ash’, *waajəɣ ‘animal’). Direct support is provided by at least *ńooɣïï ‘meat’, *sooɣïï ‘clay’, cognate to Mansi *ńaawľ, *suwľ respectively. Instead of parallel suffixation, these can be analyzed as reflecting the typical sound correspondence Mansi *ľ ~ Khanty *j (< PU *ď; the intermediates are not obvious, but that question is irrelevant for now). Dating *-əj > *-I as a proto-Khanty innovation does not seem to be possible either, since in verb stems, Southern Khanty still retains /-əj-/: e.g. ‘to break’, Far Eastern (Vakh-Vasyugan) /aarïï-/ ~ Southern /oorəj-/. (And see below for some related considerations concerning the *əɣ-stems.)

This also accounts for a minor typological paradox. Why is second-syllable *-I better retained than *-A in the Khanty dialects, even though we would expect a close vowel to be more readily subject to reduction? A promising answer would be that the vocalization *-əj > *-I, despite being reflected in all Khanty varieties, is more recent than the partial reduction of *-A in some varieties. Sound changes *ej > /iij/ ~ /ij/, *je > /jii/ ~ /ji/ are also well-known in Northern Khanty, [3] and I suspect this is additionally a part of the same wave of vowel coloring, in these varieties further generalized to the first syllable. This would date *-əj > *-I as at minimum more recent than the Southern / Northern split.

I have seen the sound change *-əj > *-I mentioned in various works already (Sauer, Honti, Helimski…), but no one willing to bite the bullet and note that this can be taken as the definitive original source of this stem type.

Also, one secondary sound change. It appears that in Obdorsk Khanty, word-final *-I > /-aa/ after /x/: *ńalkïï > /ńalxaa/ ‘Siberian fir’; *ńooɣïï > */ńoxaa/ ‘meat’. This appears related to loss of vowel harmony. In NKh, first-syllable *ïï > *ee before velars instead of > *ii elsewhere, and I suspect something similar is involved here. I would assume first *-kïï > *-xïï > *-xëë, then *-ëë lowers to /-aa/ instead of backness neutralization to **-ee.

This would then seem to show that yes, Western Khanty too (or at least Obdorsk Khanty) has gone through a vowel-harmonic stage with *-ii ~ *-ïï, instead of directly vocalizing *-əj to front *-ii everywhere.

The *Aj-stems

Reconstructing *əj-stems also sheds light on the fourth stem category in the outline above. I would side with Honti in reconstructing these as *Aj-stems. Southern Khanty provides clear evidence in favor: e.g. /xašŋääj/ ‘ant’ ~ Far Eastern /koočŋïï/. Sauer considers /j/ in SKh to be instead epenthetic, generalized from inflected forms where a vowel-initial suffix followed, but we can again appeal to comparative evidence from Mansi, where we find e.g. Northern /xooswoj/. I would add that with correct relative chronology, the development *-Aj > *-I in EKh drops right out of the other attested sound laws, with no need to posit any additional changes particular to this stem category: start with the reduction *A > *ə, follow up with *-əj > *-I.
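
The ordering argument here can be made concrete with a minimal sketch in Python. The cover symbols A, ə, I and the schematic shape CVCAj follow the notation used above; the function names and the exact conditioning of the reduction (only before stem-final j) are simplifying assumptions of mine for illustration, not a statement of the full sound laws.

```python
import re

def reduce_A(stem: str) -> str:
    """Reduction *A > *ə, restricted here (as a simplification)
    to the position before stem-final j."""
    return re.sub(r"A(?=j$)", "ə", stem)

def vocalize_schwa_j(stem: str) -> str:
    """Vocalization *-əj > *-I in stem-final position."""
    return re.sub(r"əj$", "I", stem)

def derive(stem: str, rules) -> str:
    """Apply the sound changes one after another, in the given order."""
    for rule in rules:
        stem = rule(stem)
    return stem

# Proposed chronology: reduction first, so that it feeds the vocalization.
proposed = derive("CVCAj", [reduce_A, vocalize_schwa_j])

# Opposite chronology: the vocalization finds nothing to apply to yet.
reversed_order = derive("CVCAj", [vocalize_schwa_j, reduce_A])

print(proposed, reversed_order)
```

With the proposed order the schematic *Aj-stem ends up as an *I-stem; with the opposite order it would strand at *-əj, which is why the relative chronology carries the weight of the argument.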

The origin of the *Aj-stems also appears to be clarifiable. Words such as ‘ant’ point in the direction that they often originate in compounds. I believe that in many cases, their second member is likely the root seen in Ms *wuuj ‘animal’, though found independently in Khanty only in the suffixed form *waajəɣ. Some other *AAj-stems in Khanty that seem to have this origin include: *jeetərɣääj ‘black grouse’; *kaaməɭkaaj ‘water beetle’ (maybe with a first component akin to *koomɭəŋ ‘bubble’); #karŋaaj ‘woodpecker’ (and thus, contra Honti, not segment-for-segment identifiable with Hungarian harkály); #wuurŋaaj ‘crow’.

Many of these words also show irregular vacillation between medial *-ŋ- and *-ɣ-. My hypothesis is that this might be a trace of the PU genitive suffix *-n, and e.g. what I write as approximate #karŋaaj (Obdorsk metathesized /xaŋraa/; Konda spirantized /xaxrääj/; Surgut /kajaaïï/ and Far Eastern /kajərkïï/, maybe by metathesis and dissimilation: < *kaɣərKəj < *karɣəKaaj?) should be thus reconstructed as something like #karkə-n_waaj > #karɣəŋɣaaj, reflecting an original genitive attribute construction: ‘animal of the beak’, or something to that effect.

Compound origin would also explain the complete absence of *Aj-stems among verbs.

It’s also possible I am late to the scene here. I’ve seen references to a 2003 paper by Anna Widmer “Zur Geschichte des obugrischen Tiersuffixes”, [4] and it sounds like this covers this same topic, but I do not (currently) have access to it.

The *əɣ-stems

Among the *əɣ-stems, an interesting complementary distribution appears that I have not seen remarked on before. Many sources note that the reflexation in Northern Khanty in nouns is somewhat inconsistent: in some cases we find Kazym /-i/, Obdorsk /-ii/, the same as in *I-stems; but, in others, we find Kazym and Obdorsk zero. (Southern Khanty and the “transitional” Nizyam dialect have consistently /-ə/ in both cases.) Verbs also only show the development to *-I-.

This split distribution seems to be conditioned by the preceding consonant: *-əɣ > *-I appears after obstruents, *-əɣ > ∅ after sonorants. Some examples of the former:

  • ‘owl’: Vakh /jewəɣ/ ~ Kazym /jipi/
  • ‘Khanty’: Vakh /kantəɣ/ ~ Kazym /xanti/
  • ‘birch bark’: Vakh /tontəɣ/ ~ Kazym /tonti/
  • ‘barbel’: Vakh /mööɣtəɣ/ ~ Kazym /meewti/
  • ‘duck’: Vakh /wääsəɣ/ ~ Kazym /waasi/
  • ‘knife’: Vakh /kööčəɣ/ ~ Kazym /keeši/
  • ‘pine’: Vakh /ɔɔɳčəɣ/ ~ Kazym /wooɳši/

And some examples of the latter:

  • ‘song’: Vakh /äärəɣ/ ~ Kazym /aar/
  • ‘roach’: Vakh /läärəɣ/ ~ Kazym /ɬaar/
  • ‘crane’: Vakh /taarəɣ/ ~ Kazym /tɔɔr/
  • ‘bowl’: Vakh /ääɳəɣ/ ~ Kazym /aaɳ/
  • ‘lightweight’: Jugan /köńəɣ/ ~ Kazym /keeɳ/
  • ‘bog’: Vakh /kɔ̈ɔ̈ɭəɣ/ ~ Kazym /kaaɭ/
  • ‘animal’: Vakh /waajəɣ/ ~ Kazym /wɔɔj/
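
The complementary distribution in the two lists above can be checked mechanically. The following is a minimal sketch in Python, using only the Kazym forms quoted above. I key the check on the Kazym stem-final consonant rather than the Vakh one, since Vakh /w/ in ‘owl’ corresponds to Kazym /p/. The membership of the `SONORANTS` set is my own assumption, and ‘rope’ with its ambiguous *L is left out.

```python
# Kazym reflexes of the *əɣ-stems listed above (gloss -> form).
KAZYM = {
    "owl": "jipi", "Khanty": "xanti", "birch bark": "tonti",
    "barbel": "meewti", "duck": "waasi", "knife": "keeši",
    "pine": "wooɳši",
    "song": "aar", "roach": "ɬaar", "crane": "tɔɔr", "bowl": "aaɳ",
    "lightweight": "keeɳ", "bog": "kaaɭ", "animal": "wɔɔj",
}

# Assumed sonorant inventory; the lateral fricative ɬ is deliberately excluded.
SONORANTS = set("rlɭmnɳńŋjw")

def attested_ending(form: str) -> str:
    """Classify the attested Kazym outcome: final /-i/ or zero."""
    return "i" if form.endswith("i") else "zero"

def predicted_ending(form: str) -> str:
    """Predict the outcome from the stem-final consonant alone."""
    stem = form[:-1] if form.endswith("i") else form
    return "zero" if stem[-1] in SONORANTS else "i"

mismatches = [gloss for gloss, form in KAZYM.items()
              if attested_ending(form) != predicted_ending(form)]
print(mismatches)
```

An empty list of mismatches only shows that the quoted examples are consistent with the conditioning; it says nothing about words outside this sample.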

There is only one example involving Proto-Khanty *L (a cover symbol representing both *ɬ and *l, which are medially neutralized everywhere). [5] It appears to align with the sonorants:

  • ‘rope’: Nizyam /keetə/ ~ Kazym /keeɬ/

Inconveniently, here *-L- continues PU *-d-. It is therefore not possible to clearly tell if we are dealing with Proto-Khanty *-l- or *-ɬ-, since both paths of development have been suggested. In principle, though, this example would support a claim that the development was in fact first to *-l- (a sonorant), as also in Permic / Mansi / Hungarian.

I am not sure how the split development here should be interpreted phonetically, either. The core motivation seems to be a general cross-linguistic one at least: sonorant codas are more licensable than obstruent codas. But at least secondary loss of /-i/ after sonorants is ruled out, since in genuine Proto-Khanty *I-stems (*əj-stems) this remains. Examples are not numerous (by far most occur following /r/), but they exist:

  • ‘riverbed’: Vakh /uurïï/ ~ Kazym /woori/, Nizyam /uurə/
  • ‘sturgeon’: Vakh /köörii/ ~ Kazym /kari/, Nizyam /karə/
  • ‘scab’: Vakh /kaľïï/ ~ Kazym /xaɬ´i/, Nizyam /xaťə/

This thus ends up further supporting my above-suggested chronology, where *-əj > *-ij > /-i/ took place only after the separation of Northern Khanty: the *-əɣ > ∅ group likely never went through an *-əj-stage. In other words, whatever the exact split development here was, it would have predated the common (but not Proto-!) Western Khanty shift *-əɣ > *-əj.

Maybe this could even be equated with the development of post-tonic (“non-stem”) *ɣ to /j/ in Obdorsk Khanty under certain conditions (e.g. ‘father’: EKh /jeɣ/, Nizyam /jiɣ/, Kazym /jiw/ ~ Obdorsk /jiij/; ‘power’: Vakh /wööɣ/, Nizyam & Kazym /weew/ ~ Obdorsk /weej/). This would then require rather early separation between Obdorsk and the other NKh dialects though, perhaps early enough to invalidate the concept of “Northern Khanty” as a genetic group altogether, and turning it into merely an areal subset of Western Khanty varieties.

I would not take this last corollary as a huge problem though, since I actually suspect the same already on other grounds as well… For just two examples:

  • The word for ‘grass’. Far Eastern and Obdorsk have /paam/, while the other dialects have reflexes pointing to *pɔɔm. This surely involves an irregular (“non-provably regular”?) labialization between two bilabial consonants; [6] and yet this labialization cuts across the conventionally accepted grouping of the Khanty dialects.
  • The treatment of supposed Proto-Khanty *ɔ̈ɔ̈ and *öö. These yield in some contexts /oo/ in Obdorsk, but *ää and *ee respectively in the rest of Western Khanty. Yet, the elimination of front rounded vowels is pan-WKh, and e.g. Honti and Steinitz claim it as indeed proto-WKh. [7] But if so, we have to route Obdorsk /oo/ differently. I wonder if another early shunt will work: if, following Helimski etc. we reconstruct lax open *ä, *a instead of *ee, *öö, *oo, then it will be possible to re-route “*öö > /oo/” as *ä > *a > /oo/, involving a pre-Obdorsk conditional retraction of *ä to *a in some environments.

— For some reason, nearly all words of the *-əɣ > ∅ group also involve Proto-Khanty low *aa, *ää, or mid *ee, *öö, *oo (= *ä, *a?). Perhaps there is also something more going on in here. This is also suggested by one example with a close vowel, where in Northern Khanty we find metathesis instead, viz. ‘eight’: Vakh /ńïïləɣ/ etc. ~ Nizyam /ńiwtə/, Kazym /ńiwəɬ/, Obdorsk /ńiijəl/ (< virtual PNKh *ńiiɣəɬ).

I also wonder how the changes *-əɣ > *-əj > *-I would interact with another innovation common to all of Khanty: the cluster contraction *-jt- > *-ć- (often involving the PU verbalizing suffix *-ta-, e.g. in *uj-ta- > PKh *ɔɔć- ‘to swim’). The more economical approach, that *-jt- > *-ć- was Proto-Khanty while *-əj > *-I was post-PKh, would predict that we should find cases where an *I-stem noun or intransitive verb has a corresponding intransitive or transitive verb (respectively) ending in *-əć-. Offhand I cannot locate any such cases, however. But maybe this type of derivation was morphotactically impossible in the pre-PKh period? For comparison, in Finnic *-i < pre-PF *-j is a common suffix of diminutive nouns, and *-i- < *-j- is a common suffix for iterative verbs, but these generally do not form further verbal derivatives: any corresponding verbs are instead formed from the underived root.

At least one word also suggests the possibility of *-əj > *-I being earlier than the contraction to *-ć-: ‘to split’, Vasyugan /ɭaaŋkïït-/ ~ SKh /laaŋxət/ ~ Kazym /ɭooŋkit-/, where we would seem to have PKh *ɭaaŋkəjt-. However, this could also be a later derivative, formed after *-jt- > *-ć- had ceased to operate.

There also seems to be a lack of PKh words ending in coronal + *-I, that is, earlier *-təj, *-səj, *-nəj, *-Ləj. (There are a few examples with a /Ct/ consonant cluster though, e.g. *aŋtïï < *aŋtəj ‘horn’; *maartïï < *maartəj ‘mythical land of birds’.) Maybe this indicates a parallel palatalization, and pre-Khanty *-Cəj or *-CjV resulted in a stem-final palatal instead of an *I-stem. Stems of the shape CVĆ are not very common in the current dataset either, though. But maybe any examples of this simply have not been connected with their equivalents in Mansi or elsewhere in Uralic yet?

Retaking inventory

Since it turns out that close second-syllable vowels in Khanty are secondary, from the Proto-Khanty perspective I should probably be talking about vocalizable stems, not “vowel stems”. This then suggests that a sixth category should also be distinguished: PKh *Aɣ-stems. These would then fill up a neat 2×3 system:

  • vowel stems: *-A(C), *-Aj, *-Aɣ
  • consonant stems: *-∅/-əC, *-əj, *-əɣ

A few words ending in *-Aɣ are indeed reconstructed by Honti, and they indeed also show distinctive development of their own. A representative example would be the adverb *koɳčaaɣ ‘on back’: Far Eastern /koɳčaaɣ/, Surgut /koɳɣïï/, Southern /xončää/, Nizyam & Kazym /xonšaa/, Obdorsk /xonsaa/. So we have here:

  • loss/vocalization of *-ɣ in WKh, versus its retention in EKh (same as in *əɣ-stems);
  • retention of *-A in not just EKh but also WKh, presumably protected by the earlier word-final consonant (partially same as in *Aj-stems);
  • a strange development to /-ɣïï/ in Surgut, perhaps through metathesis (*-aaɣ > *-ïïɣ > *-ɣïï)?

Kind of paralleling *Aj-stems being mainly animal names, all of Honti’s examples seem to be adverbs. The other two are *koomtaaɣ ‘overhead’, *pertääɣ ‘back’. I would add to this group also *maakaaɣ ‘previous’, which he reconstructs as *maakaaj, despite SKh /maxaa/ and not ˣ/maxääj/.

The *A-stems

Moving onto the main bulk of *A-stems, these may also need to be analyzed as partially secondary. This, however, requires taking a few steps back to look at the wider context.

While the modern Khanty varieties and also most reconstructions of Proto-Khanty abound in consonant stems of the shape CVC, CVCC or CVCəC, it is clear that this is an innovation, and that in Proto-Uralic the dominant root structure was bisyllabic *CV(C)CV. It is also clear that the transition towards consonant stems across a wide central area among the Uralic languages has taken place mostly as areal drift, not as a diagnostic subgroup innovation. Marginal languages of this type, such as Estonian, Nenets and Skolt Sami, still remain at a “thematic inflection” stage, showing consonantal nominative singular forms but vocalic inflectional stems. A good example would be Estonian nom.sg. silm : gen.sg. silm-a ‘eye’, where the latter form is at least from a historical point of view better viewed as silma-∅ (and thus structurally identical to Finnish silmä-n). Verbal roots, which generally cannot stand alone, also generally retain original second-syllable vocalism. And due to the lucky fact that the largest clear subgroups of Uralic all occur near the edges (Finnic, Samic, Samoyedic), in all of these cases we will be able to compare these languages with close relatives that remain at a firmly vowel-stem-centric inflection type (e.g. Votic, Inari Sami, Nganasan, respectively).

A transitional stage, one of several possible, is represented by Hungarian, where nouns retain a trace of thematic inflection (nom.sg. hal : plural hal-a-k ‘fish’; but nom.sg. dal : pl. dal-o-k ‘song’). However, in adjectives and verbs, presumably earlier lexically determined stem vocalism has been levelled entirely, and in most word forms second-syllable vocalism is now better analyzed as morphologically determined. Constantly vocalic stems have also been reintroduced among nouns, primarily in loanwords (e.g. balta : baltá-k ‘axe’, from Turkic), but also in derivatives (e.g. apa ‘father’, where -a has been interpreted as a fossilized possessive suffix).

Sauer’s old work proposes that *A-stems would be a retention from Proto-Uralic in one environment specifically: stem-finally in nominals, as suggested by a few equations like PU *neljä > PKh *ńeLää ‘4’. This would imply that elsewhere they aren’t retentions. The PKh situation as currently reconstructed therefore seems to derive from something close to the Hungarian situation, where original stem vowels have first been almost always phonetically reduced or analogically reshuffled away; then new ones are introduced.

Loanwords can of course fill in new second-syllable vowels, e.g. EKh /aarkaan/ ‘thick rope’, from Turkic; *ajaa > EKh /ajaa/ ~ /ajə/, WKh /aj/ ~ /oj/ ‘luck’, from Tungusic. In native vocabulary though, the most natural source for new second-syllable vowels are original third-syllable vowels. Given the original trochaic stress pattern of Proto-Uralic (as still continued in Samic, Finnic, partly Hungarian and Samoyedic), foot-final vowels would be expected to be the first ones to fall. After this, earlier 3rd-syllable vowels will move one syllable forward, becoming new unreduced 2nd-syllable vowels.

In at least some of the examples I’ve discussed above, 2nd syllable *-A clearly derives from an original 3rd syllable. *koomtaaɣ ‘overhead’, for example, is probably a derivative of PU *kuma- ‘overturned’, i.e. descends from pseudo-PU *kuma-takV. The entire animal name group also falls under this.

Now, the crucial question is — at what point in the history of Khanty was the distinction between “primary” 2nd syllable vowels, retained since PU, and “secondary” 2nd < 3rd syllable vowels lost for good? I think there’s reason to think that this, too, was post-Proto-Khanty.

Relatively poor retention of absolute final *-A is maybe best attributed to specifically word-final reduction/loss. The numeral ‘4’ for example, does not surface with a final full vowel anywhere: the reflexes are Far Eastern /ńelə/, Surgut /ńeɬə/, SKh /ńetə/, Nizyam /ńitə/, Kazym /ńaɬ/, Obdorsk /ńiil/. In many other cases, only the Vasyugan dialect delivers: e.g. *paraa ‘raft’ > Vy. /paraa/, Vakh, Surgut & Demyanka (SKh) /parə/, Obdorsk & most SKh /par/, Nizyam & Kazym /por/.

(It’s unclear at least to me what’s up with the loss of *-A in SKh and Nizyam in ‘raft’, versus its retention as /ə/ in ‘4’. Both patterns have further examples; retention is more common. I’m not sure if I would want to utilize a “primary/secondary” distinction just for these.)

A bigger problem though is that “primary” *-A is mostly lost also in verbs, even though in these the vowel would have been always protected by an inflectional ending. For example *kalaa- ‘to die’ yields Far Eastern /kalaa-/, Surgut /kaɬ-/, SKh & Nizyam /xat-/, Kazym /xaɬ-/, Obdorsk /xal-/. This is in clear contrast to “secondary” *-A in words such as ‘height’: VVy /peläät/, Tremjugan (Surgut) /peɬiit/ (?), Nizyam /pataat/, Kazym /paɬaat/, Obdorsk /päläät/ — which, again, clearly comes from a longer proto-form, being a derivative from PU *pidə > PKh *peL ‘tall’ (and probably further cognate to also e.g. Fi. pituus : pituude- ‘length’, allowing a PU reconstruction #pidə-(w)Otə).

There seems to be some evidence for a “primary/secondary” distinction to be found in *-AC nominals, too. A good example might be *raɣaam ‘relative’ > Vakh /raɣaam/, but Tremjugan /raɣəm/, WKh /raxəm/; derived from a base verb ‘to approach, be near’ — only attested in WKh, and it could be from PKh *raɣaa- rather than simply *raɣ-.

Even if Proto-Khanty had a contrast between two types of *A-stems, trying to reconstruct this in the original 2nd syllable / 3rd syllable fashion seems like the wrong approach, though. In cases like ‘height’, this would lead to awkward vowel-cluster reconstructions such as **peLəäät. In cases like ‘overhead’, nothing would immediately stand out typologically in reconstructing **koomətaaɣ, but this still has at least one undesirable consequence: we can no longer treat *ə as a purely epenthetic vowel in PKh, inserted to resolve consonant clusters (reconstructions like *waajəɣ ‘animal’ are in fact better taken as phonologically */waajɣ/), and at least some cases would have to be assumed underlying.

I have another hypothesis in mind: the distinction may have been prosodic. 3rd syllable vowels in PU would have originally borne secondary stress, and this might have been retained in some form even after the loss of a preceding 2nd syllable. It’s not clear if an outright iambic stress pattern should be assumed though (*peˈLäät), or if something like a monosyllabic initial stress group followed by secondary stress will suffice (*ˈpeL|ˌäät). In principle it would be also possible to leverage the tenseness distinction, well-attested in initial syllables: *peLäät with tense *-ää, versus *ńeLä with lax *ä? For now, I will notate this distinction as *-À (“primary”, “unstressed”; individually *-a, *-ä) versus *-Á (“secondary”, “stressed”; individually *-aa, *-ää). Regardless of the phonetic specifics, later on *-À would have been generally reduced (*raɣam > /raɣəm/), while *-Á would have remained (*peLäät > /peLäät/).

The stress hypothesis finds some amount of direct confirmation as well: cases of fully iambic second-syllable stress have been reported at least from Eastern Khanty (Far Eastern /peˈläät/, Surgut /peˈɬäät/).

Stress in EKh does not appear to be a direct archaism, however. Per all descriptions I have seen, the attested distribution is purely phonological: stress is primarily initial, except when the 1st syllable contains a lax vowel and the 2nd syllable a tense one. This also rakes in cases of “unstressed” *-À; e.g. Far Eastern /kaˈlaa-/ ‘to die’. This seems like another point in favor of some kind of a more subtle distinction in PKh. I would suppose that in varieties of EKh, *-À was early on partly tensed to merge with *-Á, and could have actually acquired stress only later. Wherever this change failed to take place (including in all varieties of WKh), *-À was then reduced/lost.
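
The purely phonological stress rule described above is simple enough to state as a small Python sketch. It assumes that doubled vowel letters in the transcription used here mark tense (full) vowels and single letters lax (reduced) ones; the vowel inventory in `VOWEL_RUN` and the function name are my own illustrative choices, and forms with combining diacritics are not handled.

```python
import re
import unicodedata

# Runs of vowel letters; a run of length 1 is read as lax, length 2 as tense.
VOWEL_RUN = re.compile(r"[aäeöoïiuüɔə]+")

def stressed_syllable(word: str) -> int:
    """Return 1 for initial stress, 2 for second-syllable stress.

    Stress is initial unless the first syllable has a lax vowel
    and the second syllable a tense one.
    """
    nuclei = VOWEL_RUN.findall(unicodedata.normalize("NFC", word))
    if len(nuclei) >= 2 and len(nuclei[0]) == 1 and len(nuclei[1]) >= 2:
        return 2
    return 1

for form in ["peläät", "kalaa", "waajəɣ", "kantəɣ"]:
    print(form, stressed_syllable(form))
```

Note that the rule cannot tell ‘height’ from ‘to die’: both come out with second-syllable stress, which is exactly why the attested EKh stress cannot by itself continue the older *-À / *-Á distinction.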

In summary

Altogether, I propose the following general chronology for the development of second-syllable vocalism in the Khanty varieties:

  1. The partial merger of *-À and *-Á in Eastern Khanty (with variable conditions); including *-Áj > *-Àj.
  2. The reduction of remaining *-À across all of Khanty; loss of *-əɣ in Kazym and Obdorsk after sonorants.
  3. *-əɣ > *-əj across all of Western Khanty.
  4. *-əj > /-I/ across all of Khanty (with variable conditions); in parallel, *-Aj > /-A/ in Northern Khanty.
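
For the Kazym / Obdorsk development of the *əɣ-stems, steps 2 to 4 can be run as an ordered rule cascade. This is a hedged sketch in Python over schematic stems only: R and T are cover symbols I introduce here for a sonorant and an obstruent, the function names are hypothetical, and the reduction of *-À in step 2 is left out.

```python
import re

def loss_after_sonorant(stem: str) -> str:
    """Step 2 (in part): *-əɣ is lost after a sonorant (R)."""
    return re.sub(r"(?<=R)əɣ$", "", stem)

def gamma_to_j(stem: str) -> str:
    """Step 3: remaining *-əɣ > *-əj."""
    return re.sub(r"əɣ$", "əj", stem)

def vocalize(stem: str) -> str:
    """Step 4: *-əj > *-I."""
    return re.sub(r"əj$", "I", stem)

def derive(stem: str, rules) -> str:
    for rule in rules:
        stem = rule(stem)
    return stem

PROPOSED = [loss_after_sonorant, gamma_to_j, vocalize]
LOSS_LAST = [gamma_to_j, vocalize, loss_after_sonorant]

for stem in ["CVRəɣ", "CVTəɣ", "CVRəj"]:
    print(stem, derive(stem, PROPOSED), derive(stem, LOSS_LAST))
```

In the proposed order the loss after sonorants removes its inputs before the later steps can touch them, so the ‘song’ type ends in zero while a genuine *əj-stem of the ‘riverbed’ type keeps its vowel. If the loss is ordered last, the two types fall together as *-I, contrary to the Kazym data.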

All of these changes are very heavily areal, and do not seem to define any substantial genetic subgroups. The main divisions of Eastern Khanty, the Far Eastern and Surgut groups, would have to be assumed to have split already before step 1 (*kala- > *kalaa- vs. *kal-); the Nizyam / Kazym / Obdorsk dialects of Northern Khanty, already before step 2 (*äärəɣ > *äärəɣ vs. *äär). The split of Nizyam and Southern Khanty could be in principle delayed until step 4 (making Nizyam a “Northernized Southern” rather than a “Southernized Northern” dialect after all), but this seems like a poor idea, even if for now I cannot refute it explicitly.

Areality seems to be further proven by how most parts of this scheme have parallels also in Mansi (e.g. *-əɣ > Northern and Pelymka (Western) Mansi /-iɣ/, Eastern and rest of Western Mansi /-i/; *-A > EMs, WMs -∅). But a detailed look into this will be a task for later.

Further implications

So what can we do with this?

The above analysis leads to at least one more general interesting corollary for Khanty historical phonology. If PKh *À-stems were in the early common Khanty period reduced en masse — then this opens the possibility that several cases could have been lost entirely from the data. Already Sauer notes that all inherited word-final cases of PKh *A-stems seem to occur either following the PKh lax vowels (*e *ö *o *a), or the traditionally reconstructed tense mid ones (*ee *öö *oo). Other cases could have existed as well … we may just be currently unable to directly distinguish them from consonant stems.

There may be, however, indirect evidence to draw such distinctions. The notorious Khanty “ablaut” system (which I am afraid I cannot explain in detail in this post) has for a while now been explained as being instead a partly morphologized system of former umlaut. [8] Per this hypothesis, alternations like EKh (*)ɬɔɔj ‘finger’ ~ (*)ɬuuj ‘thimble’ would continue something like earlier *ɬɔɔj(A) ~ *ɬuuj-(i), either with i-umlaut of *ɔɔ to *uu in the derivative ‘thimble’; or a-umlaut of *uu to *ɔɔ in the base root ‘finger’. I am more inclined to side with the latter (Honti’s view) than with the former (Helimski’s). If close/open ablaut in Khanty is fundamentally based on a-umlaut, the assumed umlaut trigger could be then identified as *-À, and we could then amend ‘finger’ to PKh *ɬɔɔja instead. This in turn also accords fairly well with the PU reconstruction: *suwd₂a (with Samic *čuvðē, Samoyedic *təjå clearly indicating an original *A-stem). By contrast, Helimski’s assumed *I-stems seem to be nowhere supported by actual data: they are simply circularly inserted into proto-forms where a close-grade vowel eventually surfaces.

Perhaps even un-umlauted *ɬuuja is a possibility for PKh. Vowel alternation in many cases occurs only in EKh, not WKh, and I would not dismiss offhand the possibility that this reflects unstressed vowel isoglosses in early common Khanty. In this case we indeed find WKh *ɬuuj (SKh /tüüj/, Kazym /ɬuj/, etc.) and not *ɬɔɔj > **ɬooj. Instead of assuming levelling from ‘thimble’, or from possessed forms (Vakh /luujəm/ ‘my finger’), maybe no umlaut took place here to begin with, and the discrepancy between EKh *ɬɔɔj ~ WKh *ɬuuj goes back to already earlier *ɬuuja ~ *ɬuuj(ə), with some kind of an early conditional loss of *-À in WKh.

Some other cases of “umlaut” might turn out to be illusory entirely. I am on board with the “Helimski school” reanalysis of “Steinitz school” PKh *ee, *öö, *oo as lax open vowels, and PKh *e, *ö, *o as lax close vowels (though I would be content to keep on using the symbols *e, *ö, *o for the latter). However, the associated reanalysis of Steinitz’ lax open *a as close *ï seems unsatisfactory. In most cases, this continues PU open *a; it is also continued as lax open /a/ in most Khanty varieties. Moreover, we can identify numerous instances where this occurs in an *À-stem instead. The clearest evidence are “thematic verbs” such as ‘to die’, where at least in Eastern Khanty the surface alternation is between /oo/ (/kool-/) and /a-aa/ (/kalaa-/). Since Helimski considers *ï to be the i-umlaut counterpart of *a, he ends up proposing the phonetically nonsensical solution that *A-stems would have triggered i-umlaut!

Instead of a back-and-forth development *a > *ï > /a/, purely for the sake of making way for *a > /oo/, I would propose that the rewriting of *ee, *öö, *oo as *ä, *a does not reflect mechanical identity. Rather, the alternation of the sort /oo/ ~ /a-aa/ is again perhaps post-Proto-Khanty entirely. PKh lax *a and *ä were only tensed and raised to /oo/, /ee/ ~ /öö/ when stressed; when unstressed, they were left as is (and not umlauted to anything at all). The first-syllable alternation /oo/ ~ /a-aa/ should be taken back to an earlier stress alternation /á(-ə)/ ~ /a-á/, in turn going back to earlier *á-ə ~ *á-a, through the Eastern Khanty stress retraction shift *-À > *-Á.

Filling up the details on this hypothesis (and possible similar approaches to other ablaut patterns) will need a much closer analysis, though. But ultimately, it may be able to reduce the somewhat sprawling Proto-Khanty vowel system into a more manageable shape.

[1] Infuriatingly, he does not provide any comments on what has motivated the division of the data. There are hints, of course. Much of the “second-tier” data seems to have relatively limited dialect distribution on one or both sides, e.g. only in Northern Mansi, or only in Southern Khanty; or relatively irregular sound correspondences. I get the impression that he considers it likely that some of this data is either unrelated, consists of parallel loans from some third source, or consists of loans from Khanty to Mansi (or perhaps vice versa). On the other hand, I think even the main part of the data likely contains a number of cases of this kind. Are these oversights, or does he have any actual reasons in mind to consider some initially spotty-looking cases stronger than others?
[2] In their respective C2IFU contributions “Zur Geschichte der Nominalstämme in den ugrischen Sprachen”; “Nominalstämme auf *-a/*-ä im Ostjakischen”.
[3] Bear in mind that Proto-Khanty had a contrast between full and reduced vowels, not in vowel length, and e.g. “long” *aa *uu should be read simply as [ɑ] [u]. “Short” *e is then a reduced vowel, [ə] or [ɪ], and is traditionally indeed transcribed ə in close transcription by fieldworkers on Khanty. Thus, *ej > /iij/ does not involve seemingly unmotivated lengthening, but rather tensing: [jɪ] > [ji].
[4] Published in László Honti’s Festschrift (Ünnepi kötet Honti László tiszteletére). The University of Helsinki library does have a copy, but it’s on loan currently. If by any chance the culprit happens to be reading this, please feel welcome to get in touch with me…
[5] The overall rarity of roots ending in *-Ləɣ in Khanty is not a mystery: it is due to the common (Proto-?) Ob-Ugric metathesis of PU *-lk-, *-sk- > East Uralic *-lɣ-, *-ɬɣ- > OUg *-ɣl-, *-ɣɬ-.
[6] At least two other examples exist of *aa > *ɔɔ before bilabials. 1) ‘Bird cherry’: *jɔɔm in place of expected *jaam, from PU *ďëmə. 2) ‘Hair’: Far Eastern *aawət < *aapət regularly continues PU *ëptə, but other dialects, including Obdorsk, indicate *ɔɔpət. On the other hand, there are counterexamples against assuming a regular change, e.g. *kaam ‘coffin’ (~ Mansi *kaməl), *kaap ‘boat’ (~ Mansi *këëpə), *saam ‘scales’ (~ Mansi *sëëmə, < PU *sëmə).
[7] To be exact, Steinitz and Honti only claim this about tense *üü, *öö, *ɔ̈ɔ̈. PKh reduced *ö has labial reflexes more widely in WKh, including fronted [ɵ] in SKh. However, this is only the case adjacent to velars; elsewhere we see the expected delabialization to *e. I would propose that this development involves “double cheshirization” (and is areally connected to the same in Southern Mansi): *kö > *kʷe, then re-coloring: *kʷe > South [kɵ] (= phonemically /ko/), North /kuu/.
[8] For a starting point, see e.g. E. Helimski (1999): “Umlaut in Diachronie – Ablaut in Synchronie: Urostjakischer Umlaut und ostjakischer Ablaut.” — Diachronie in der synchronen Sprachbeschreibung. Mitteilungen der Societas Uralo-Altaica 21: pp. 39–44.

15 comments on “Observations on second-syllable vocalism in Khanty”
  1. Blasius B. Blasebalg says:

    Just a spontaneous note on:

    *-əɣ > *-I appears after obstruents, *-əɣ > ∅ after sonorants.
    I am not sure how the split development here should be interpreted phonetically, either.

    The split sounds immediately plausible (and I hope this is not due to my ignorance on (Proto-)Khanty phonotactics):
    The obstruents in your examples are either voiceless or w.
    Being a voiced fricative, ɣ is somewhat “sonorant-ish” – so that a syllable *Rəɣ may inspire dissimilation. Also, this does not seem so vexing with *wəɣ.
    As the final syllable is already reduced, dropping it seems the most sensible way of resolving this situation – changing the final consonant might give too much weight to the reduced syllable, and changing the sonorant to something else triggers at least a few problems with understanding – as phonetic changes always do, but the motivation seems too small to justify that. On the other hand, the split change may even enhance understanding because it is now easier to tell the consonants before the *əɣ apart.
    This means I suggest the following addition to the relative chronology of the split:
    1. *Təɣ / *Rəɣ
    2. *Təɣ / *R
    3. *Təj / *R
    4. *Ti / *R
    … in the predecessors of the likes of Kazym.

    This might provide another argument to explain the split.
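The four-stage chronology above can be checked mechanically as a set of ordered rewrite rules. What follows is only a minimal sketch: the sonorant class and the stems `katəɣ` and `karəɣ` are made-up placeholders standing in for *Təɣ and *Rəɣ, not reconstructed Proto-Khanty forms.

```python
import re

# Assumed sonorant class, for illustration only (not a full Proto-Khanty inventory).
SONORANTS = "mnŋlrjw"

# Ordered rules, one per stage transition in the proposed chronology.
RULES = [
    # stage 1 > 2: final -əɣ is dropped after a sonorant (*Rəɣ > *R)
    (re.compile(rf"([{SONORANTS}])əɣ$"), r"\1"),
    # stage 2 > 3: remaining final -əɣ becomes -əj (*Təɣ > *Təj)
    (re.compile(r"əɣ$"), "əj"),
    # stage 3 > 4: final -əj contracts to -i (*Təj > *Ti)
    (re.compile(r"əj$"), "i"),
]

def derive(form):
    """Return the form at each of the four stages, applying the rules in order."""
    stages = [form]
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
        stages.append(form)
    return stages

print(derive("katəɣ"))  # obstruent-final placeholder stem
print(derive("karəɣ"))  # sonorant-final placeholder stem
```

Rule order matters here: if the second rule applied before the first, sonorant-final stems would end up with *-i as well, and the split would disappear.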

    • Blasius B. Blasebalg says:

      Just cancel the mention of *w; I see this should be an original *p (which does not make w an obstruent).

    • j. says:

      Sure, dissimilation would be possible.

      There are many other possibilities though. For one example, *-rɣ or *-ɳɣ would still be a reasonably pronounceable cluster, while **-pɣ or **-tɣ would not. So the initial split could also have been by syncope: *-Rəɣ > *-Rɣ (while *-Təɣ ≡), then cluster reduction > *-R.
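The syncope alternative can be sketched the same way. Again this is just an illustration under stated assumptions: the sonorant class is a guess, and `karəɣ` / `katəɣ` are invented placeholder stems for *-Rəɣ and *-Təɣ.

```python
import re

# Assumed sonorant class, for illustration only.
SONORANTS = "mnŋlrjw"

def syncope_path(form):
    """Apply syncope (*-Rəɣ > *-Rɣ), then cluster reduction (*-Rɣ > *-R)."""
    after_syncope = re.sub(rf"([{SONORANTS}])əɣ$", r"\1ɣ", form)
    after_reduction = re.sub(rf"([{SONORANTS}])ɣ$", r"\1", after_syncope)
    return [form, after_syncope, after_reduction]

print(syncope_path("karəɣ"))  # sonorant-final: passes through an *-Rɣ cluster
print(syncope_path("katəɣ"))  # obstruent-final: untouched by both rules
```

The end result for sonorant-final stems is the same as under dissimilatory loss, so the two accounts differ only in the intermediate *-Rɣ stage.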

      • Blasius B. Blasebalg says:


        Besides, kudos for even asking the question.
        I’d bet many linguists would have stopped contently when reaching a rather clear-cut conditional split, not seeking an explanation for the difference in behavior.

  2. Blasius B. Blasebalg says:

    Meanwhile I read Zhivlov’s paper. One worthwhile aspect is the attempt to explain the alternation between *ə and *å in Proto-Samoyedic (even though I don’t really know how certain the assignment is in each case given evidence from daughter languages).

    However, I see one downside in his stem structure model for Proto-Uralic:
    While he argues for stem vowel *-o, he does not seem to assume a corresponding **-ö.
    If I am not mistaken he never mentions that restriction.
    Nevertheless, the *-o he reconstructs is not neutral in terms of vowel harmony, but associates with back vowels.
    (I write *-o instead of *-a_1 for simplicity while acknowledging that Zhivlov considers *-o just as one possible realisation of *a_1.)

    This would leave us with a rather complicated system of stem types:
    – auslaut in *-a/*-ä, depending on first syllable vowel;
    – auslaut in *-ə (Zhivlov: *-i), independent from first syllable vowel;
    – auslaut in *-o, only possible after back vowel syllable(s).

    While in principle this system might plausibly occur in a language, it looks quite different from the situations in current Uralic languages.

    By the way, is it known what is the source for short single vowel endings other than a ä i in Finnish (e.g. lintu, laakso)?

    • j. says:

      There’s no first-syllable **ö either, though. This could add up to a system similar to Proto-Finnic and modern Votic, with a/ä harmony but unharmonizing o. Or perhaps Proto-Uralic did not have any vowel harmony, after all?

      Zhivlov also only examines in the article the development of first-syllable back vowels, and it might be possible to consider similar adjustments also for the reconstruction of the front vowels. *ä also has several unsolved issues in its reflexes (e.g. in Khanty mostly *ee, but also often enough *ää) and there are some kinks with the development of *e too. Despite some tries, I’ve not managed to locate any convincing enough correlations in this across different branches of Uralic, though.

      By the way, is it known what is the source for short single vowel endings other than a ä i in Finnish (e.g. lintu, laakso)?

      The known sources are through diphthong contraction:
      – *aj > *oi ~ *ëi ~ *i
      – *äj > *ei ~ *i
      – *əj > *i
      – *aw > *o
      – *əw > *u ~ *ü
      (and later *oi, *ëi, *ei > o, i, i in standard Finnish)
      The fate of *äw is not clear. I assume *o as the original outcome (per action names such as *elä- ‘to live’ ~ Fi. elo ‘living’; *künčä- > *küntä- ‘to plow’ ~ Fi. kyntö ‘plowing’), but *ü has also been suggested.
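The contractions listed above amount to a small lookup table, which can be written out as follows. This is a sketch only: the dictionary and function names are invented here, the notation is the one used in the list, and *äw is left out on purpose because its outcome is not settled.

```python
# Stem-final sequence > possible contracted outcomes, in the order listed above.
CONTRACTIONS = {
    "aj": ["oi", "ëi", "i"],
    "äj": ["ei", "i"],
    "əj": ["i"],
    "aw": ["o"],
    "əw": ["u", "ü"],
    # "äw" is deliberately omitted: its outcome is disputed.
}

# Later simplification of the i-diphthongs in standard Finnish.
LATER_FINNISH = {"oi": "o", "ëi": "i", "ei": "i"}

def finnish_outcomes(sequence):
    """Return the distinct standard Finnish outcomes of a stem-final sequence."""
    later = [LATER_FINNISH.get(v, v) for v in CONTRACTIONS[sequence]]
    return list(dict.fromkeys(later))  # drop duplicates, keep first-seen order

for seq in CONTRACTIONS:
    print(seq, "->", finnish_outcomes(seq))
```

Looking up the omitted sequence raises a `KeyError`, which seems preferable to silently picking one of the competing proposals.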

      This does not exhaust the data though. Many cases like aro, lintu, suomu, koivu remain where there are no non-circular grounds for reconstructing earlier suffixed proto-forms along the lines of *ara-w, *lintə-w. Sometimes there’s clear counterevidence, even; e.g. for ‘bird’, suggested cognates such as Northern Sami loddi ‘goose’ point to earlier *lunta, which would predict Fi. ˣlinto. Or indeed, ˣlunto!

      • Blasius B. Blasebalg says:

        Ööps, ÖK, thanks for the additional explanation. It turns out that taking Finnish as the universal representative for Finnic fails already at such an early stage.

        The known sources are through diphthong contraction … This does not exhaust the data though.

        For the part where the diphthong reconstruction works, this leaves the question how that diphthong got there? Loss of an intervocalic consonant is plausible (starting with at least 3 syllables), perhaps even loss of a consecutive vowel/syllable. However, a suffix *-w looks like quite a stretch … after all, the suffixed forms are far from the suggested root structure (ending in a vowel, for starters). I might be missing something here, but reconstructing *-wə or *-we, for instance *elä-wə (as far as *ə is available on the respective node), would significantly ease my frowning.

        Or perhaps Proto-Uralic did not have any vowel harmony, after all?

        Definitely worth a thought. But that would leave us with independent developments in at least three legs – Finnic-Mordvinic, Mari, and Eastern. Turkic influence cannot account for all of that, in particular Turkic influence on Proto-Finnic and Proto-Samoyedic must have been rather small. Moreover, vowel harmony is a feature that is often inherited, extended, reduced, modified, or dropped, but rarely developed from scratch.*) Now at least three related languages should have done it, in similar ways. Quite a bit of coincidence, isn’t it?

        However, I do get the feeling that the current root model for Proto-Uralic is too simple, not comprehensive enough. Perhaps it allowed for words that on first sight might look like breaches of vowel harmony. Proto-Uralic must have words with more than two syllables (inflected forms), so why should it not have allowed, at some point, longer roots? The ancestor of Finnish ‘sydän’ is a good candidate. Perhaps not all rules make sense if we only concentrate on two-syllable words?

        *) Vowel harmony is rarely developed in the following sense: There are few cases where
        a) an attested language has vowel harmony and its attested ancestor has not;
        b) an attested language has vowel harmony and its reconstructed ancestor, by conclusive evidence, had not.
        Funnily, this is somewhat similar to gender systems: They are extended and reduced all the time, but only a few instances are known about concrete new creations.

        Now b) might be due to how linguists reconstruct: If Eastern and Southern Nilotic have vowel harmony and Western Nilotic has not, what should be assumed about Proto-Nilotic?

        Case a) applies to Korean and Old Korean, according to Vovin. According to Altaicists, by contrast, Korean has had vowel harmony all the time since Altaic days. I have no idea what to believe on this point.

        • j. says:

          I assume earlier word-final *-w would have alternated with an inflectional stem in *-wə- before endings of the shape *-C at least, much as still in cases like sydän : sydäme-n ‘heart’ or sammal : sammale-n ‘moss’. (Likewise words of the type täti < *tätei < *tätä-j ‘aunt’ could have had earlier genitive forms such as *tätäjə-n.) But going further back, I’m not too sure if these have come from earlier unalternating *-Cə. Another option might be that it’s the inflectional endings that used to be vowel-final; long suspected to have been the case for at least 1PS *-m, 2PS *-t. Formations of the type *-Cə-n might then have developed along a “Hungarian” path: by epenthesis from earlier *-C-n < *-C-nV.

          On vowel harmony, Finnic seems like the only case where secondary contact influence is not an option. Proto-Samoyedic was in at least some contact with Proto-Turkic (though I actually rather suspect the opposite direction of influence here: Turkic shifting from pharyngeal to palatal vowel harmony due to Uralic contact). Harmony in Mordvinic & Mari is probably secondary: the Moksha and Meadow Mari systems can be analyzed as purely surface-phonetic, and the phonologized Erzya and Hill Mari systems are transparently derivable from this.

          There are some alleged traces of vowel harmony left in Permic too, though. I’ve seen a claim attributed to Lytkin that in inherited vocabulary, stem vowels would still survive before certain derivational suffixes, such as /-s/ < *-ksə. More specifically *-a-ksə > /-as/, *-ä-ksə > *-es, *-ə-ksə > /-ɨs/. This would then appear to date to a period earlier than Volga Bulgar influence.

          On the other hand, it seems possible to predict palatal vowel harmony from a few other phonological traits: the presence of “umlaut vowels” such as /ä ö ü/; a “front-heavy” distinctive load for vowel contrasts; and the lack of unstressed vowel reduction to [ə]. Proto-Uralic clearly had at least the first two, so maybe we do not need contact with a vowel harmony language specifically to end up with vowel harmony in various Uralic daughter branches. E.g. even if unstressed *ə was “full-vocalized” in Finnic due to contact with non-harmonic IE languages, this would still seem to predict harmony to emerge at this time at the latest.

          Compare also vowel harmony in dialects of Catalan. This provides a very clear example of vowel harmony developing in a language family & region where it was previously absent; and it also seems to have followed a roughly equivalent typological path, in turning up right on the border of Ibero-Romance (full-vocalic) and Gallo-Romance (vowel-reducing).

          • David Marjanović says:

            Wikipedia has one paragraph on vowel harmony in Catalan, calling the phenomenon specifically Valencian: when the stressed vowel is /ɔ/ or /ɛ/, it is copied over all unstressed vowels, at least when those are |o| or |a| (only 3 examples are given in total).

            Clicking through to “Vowel harmony” brings up a list of other Ibero-Romance languages, but the few for which any further description is available in their Wikipedia articles seem to have umlaut instead. This paper (http://personales.uniovi.es/c/document_library/get_file?uuid=ac9f49ea-eba1-4acc-b780-2dc9196f34ec&groupId=48843) describes (in the middle) a bunch of Asturian dialects, where unstressed final /u/ and sometimes the rare /i/ raise stressed |a e o| to /e i u/ or /o i u/ depending on the dialect, and a “Gallego-Leonese” dialect where unstressed /i/ and conditionally /u/ raise following stressed |a| to /ɛ/.

            I wonder if the Valencian vowel harmony results in some way from the combination of a syllable-based rhythm with strong but not too strong reduction of unstressed vowels that Catalan generally seems to have. French is also syllable-timed, but has reduced the unstressed vowels to such an extent that they can’t bear copied qualities anymore.

            So perhaps the Proto-Uralic vowel system developed from something simpler (**/e a o i ɨ u/?) like this:
            1) Initial stress;
            2) Frontness-based umlaut creating extra front vowels, enlarging the phonetic vowel inventory of stressed syllables;
            3) Mergers among unstressed vowels phonemicizing the new contrasts among stressed vowels;
            4) Redistribution of the 3 or 4 remaining unstressed vowel phonemes according to features of the stressed ones in the same word.

            Proto-Samoyedic was in at least some contact with Proto-Turkic (though I actually rather suspect the opposite direction of influence here: Turkic shifting from pharyngeal to palatal vowel harmony due to Uralic contact).

            Oh, that makes sense.

          • Blasius B. Blasebalg says:

            I assume earlier word-final *-w would have alternated with an inflectional stem in *-wə- before endings of the shape *-C at least …

            Oh yes, that makes a lot of sense. As you imply by your examples, this would apply to (most or) all consonants allowed as absolute auslaut, presumably including m n t next to w j (and, by your example, l, and one variety of s?). This makes for a very plausible and consistent model of a (proto-)language.

            By the way, this might open an avenue towards closer thoughts about vowel harmony: Why need the permissible vowel combinations in words of the shape CVC(C)V be the same as in CVC(C)VC? Well, in most (all?) other languages with vowel harmony _and_ final consonants they are; but this is definitely a point of attack in particular if there are different restrictions on the second vowel anyway (perhaps additionally depending on the final consonant). However, I am only idly speculating here; I don’t have a novel rule set or a particular lexeme in mind.

            Compare also vowel harmony in dialects of Catalan.

            I didn’t know about this, this is very intriguing. Thank you for pointing it out!
            So I tried to read up a bit about it. A few things came to my mind:

            a) You point out that in the Catalonia region, reducing and non-reducing Romance varieties come into contact. While this is correct, the situation is still not analogous to that of Finnish because Catalan is highly reducing, and that seems to apply also for the harmony dialects (reduction is sometimes cancelled due to harmony, while your argument for Finnic seems to start with the lifting of ə to i).
            b) As far as I understand, Catalan vowel harmony only affects part of a word, for instance the first syllable in a four-syllable word is never affected. This is perfectly common for several African versions of vowel harmony; however, this puts it at quite a distance from the Northern Eurasian style. Therefore, the systems are not really comparable.
            c) Moreover, Catalan harmony doesn’t use _sets_ of fitting vowels (such as a o u) but some vowels work as attractors, assimilating everything in their reach (“o and only o”). Again, this is not the Eurasian (“Uraltaic”) fashion. I feel uneasy with transferring insights from this to such a different style.

            Nevertheless, it is a very interesting example of how vowel harmony can arise, and I was craving such an example. Unfortunately, it is quite remote from the systems in Uralic languages.

            Harmony in Mordvinic & Mari is probably secondary

            Oh, really? It has never occurred to me that the existence of vowel harmony (as opposed to the details of the system) might not be original in a Uralic language. But it is plausible that if you have all the necessary equipment (such as ü and ï), and perhaps many around you do it, you eventually join in. Of course, this is also consistent with a trajectory off from vowel harmony and back.

            On the other hand, it seems possible to predict palatal vowel harmony from a few other phonological traits: the presence of “umlaut vowels” such as /ä ö ü/; a “front-heavy” distinctive load for vowel contrasts; and the lack of unstressed vowel reduction to [ə].

            Wow. If this is really corroborated, it is certainly an important theorem, even if the conditions are quite strict. I have quite a few issues with that statement, on several levels:

            1) Modern French is a counterexample, isn’t it? It has /o/ and /ö/, /u/ and /ü/, /a/ and /ä/, the difference is semantic (“ou”/“où” vs. “eu”, “au(x)”/“eau” vs. “eux”, “a”/“à” vs. “aie”). And while historically French has been highly reducing, at least according to one standard, reduced (end) vowels are always omitted, which restricts ə to few environments (like ‘revenir’, where it is hard to omit the two e’s). However, I’d argue that ə in French is, synchronically, its own vowel quality, not just the reduction of something else. In particular, it is the standard pronunciation of the letter E, unthinkable in many other languages; more generally, it can be stressed under special circumstances without being lifted to ɛ or e (such as this), and finally, it is not really neutral in sound, but has a certain ö-coloring.
            So when is French vowel harmony scheduled to begin? ;-)

            2) The statement focusses on the phonetic realization of harmony; it concerns palatal harmony, but not ATR harmony. This might be necessary if there is indeed such a mechanism, but seems unnatural from a typological point of view. For instance, there is no interaction of this theory with Khalkha which has ATR + rounding harmony and reduced vowels.
            The first item I would like to learn about a vowel harmony system is its dimension: It makes a lot of difference if the system is one-dimensional (such as in Finnish, Korean, or Nandi) or two-dimensional (with rounding harmony, such as in Hungarian and “Core Altaic”). The exact realization of the first dimension (e.g. palatal vs. ATR) is secondary, in particular because retaining contrast systems is more important for understanding than exact imitation. As far as I understand, the ATR harmony in Khalkha corresponds to palatal harmony in other Mongolic languages.
            (This is a point where Catalan seems weird even in a global context: It has more than two classes in only one dimension.)

            3) The proposition doesn’t name cause and effect. I understand that this is a typological statement, and languages have several options to reach a “fitting” state. So under this theory, a language fulfilling the three conditions but without vowel harmony could either introduce it, or otherwise start reducing vowels, or collapsing the vowel system. However, vowels like ö and ü mostly occur in languages with vowel harmony, which strongly suggests the converse argument: These vowels exist in many languages _because_ of vowel harmony (not the other way around).

            4) Finally, it doesn’t explain Catalan harmony, since Catalan doesn’t have either ö or ü. (But then, Catalan harmony is not really front vs. back, but e vs. o vs. rest).

            So perhaps the Proto-Uralic vowel system developed from something simpler (**/e a o i ɨ u/?) like this:

            This seems like a plausible way to reach Uralic-style vowel harmony in general (I’m not saying I’m buying it for Proto-Uralic specifically).

            • j. says:

              That’s a lot of food for thought… A few initial points:

              1) Why need the permissible vowel combinations in words of the shape CVC(C)V be the same as in CVC(C)VC? Well, in most (all?) other languages with vowel harmony and final consonants they are

              Counterexamples to this are actually not hard to find, e.g. Meadow Mari, which has unalternating /ə/ inside words, but which at the end of a word turns into harmonizing [e ø o]. (These are distinct from regular /e ø o/ in not attracting stress, and I would suggest continuing to analyze them as phonologically /ə/.)

              Similar stuff goes on also in at least Moksha (/ə/ → [a ~ ä] word-finally) and some dialects of Ludic (/a ~ ä/ → [ə] word-finally in stems with 3+ moras).

              2) It has never occurred to me that the existence of vowel harmony might not be original in a Uralic language.

              For an even better example, there’s Southeastern Udmurt, which innovated vowel harmony and “umlaut vowels” /ä ö ü ə̈/ roughly in tandem (though /ü/ is partly also from Proto-Udmurt central labial *ʉ). Also, interesting coincidence: I today ran into this presentation by Fejes László which summarizes some of these systems in brief. (He has informative presentations on the vowel harmony systems of Hungarian, Finnish and Nganasan as well.)

              3) So when is French vowel harmony scheduled to begin?

              Probably around the time French shifts from final to initial stress.

              If I wanted to make this prediction into a more official release, I should definitely unpack “front-heavy” into something clearer… I do not mean vowel frontness, I mean word-level prosody. “Initial stress only” would be too strict, failing to capture languages like Erzya, Meadow Mari or Eastern Khanty, but we do need some kind of prominent left edges that can trigger new harmonic domains.

              4) However, vowels like ö and ü mostly occur in languages with vowel harmony, which strongly suggests the converse argument: These vowels exist in many languages because of vowel harmony

              No, this doesn’t seem to check out very well. Vowel harmony by definition never introduces, or actively affects at all, root vowel qualities. Maybe harmony can introduce a new quality first only in suffixes, followed by its introduction in roots through other means, but I don’t know of any clear cases of that either. (This would be one possible hypothesis for Southern Finnic õ and for Northern Finnic short ö, but in these cases harmony quite clearly already existed by that point.)

              For the origin of Uralic “umlaut vowels”, I have a hypothesis different both from this and from David’s. There will likely be a blog post sometime this fall, but if you want a hint, it hinges on the lack of **ö.

  3. David Marjanović says:

    I like it. For what little that’s worth, I like all of it. :-)

    the typical sound correspondence Mansi *ľ ~ Khanty *j (< PU *ď; the intermediates are not obvious, but that question is irrelevant for now)

    I’d rather say there’s an embarrassment of riches here. Of course [lʲ] and [ʎ] can turn into [j] directly; [ɟ] has merged into [j] in most (but not all) of Inuktitut.

    The notorious Khanty “ablaut” system […] has for a while now been explained as being instead a partly morphologized system of former umlaut.

    That has parallels in Upper German and especially Luxemburgish, as explained mit deutscher Gründlichkeit in this paper.

    • j. says:

      I’d rather say there’s an embarrassment of riches here. Of course [lʲ] and [ʎ] can turn into [j] directly;

      Sure. There would be no trouble if we were reasonably sure that a palatal lateral is the oldest stage. There’s also a correspondence between Mansi *ľ and Khanty *Ľ (a similar case to *L: > /ľ/ ~ /ɬ´/ ~ /ť/ depending on the dialect) though, which is what Honti reconstructs as original Ob-Ugric *ľ. For *ľ ~ *j he uses in the book the traditional notation *δ´, and you might be able to recall that elsewhere he has suggested *[ɬʲ] as the phonetic value (ditto *δ = [ɬ]).

      • David Marjanović says:

        Thanks for the reminder!

        [j] > [lʲ] is not completely unheard of, e.g. in the Baumwipfel-Regel of most Slavic languages (as in Serbian Skoplje for Macedonian Skopje), but no such conditions apply here, right?

        • j. says:

          Correct: *j remains as /j/ in Mansi (and elsewhere in Uralic) just fine. The similar hypothesis that maybe Khanty *j versus *Ľ is a conditional split has also crossed my mind, but it doesn’t seem to have any viability either. One near-minimal pair against this would be *sooɣəj ‘clay’ with *j < *δ´ (cf. above) versus *waɣəĽ ‘side river’.

