Proto-Uralic *ŋx?

My earlier post ‘Swan’ in Uralic alluded to the possibility of reconstructing Proto-Uralic *x also in positions where it has not previously been considered to occur, particularly by reanalyzing some clusters with *k in them. This is not an idle throwaway idea: I have several other specific hypotheses about this already under development.

One of the more common PU consonant clusters is *ŋk (by my count in the top three). However, there does not appear to be substantial imbalance within the nasal-stop clusters: *nt is about equally common, and *mp, *nč, *ńć [1] not too rare either. So I do not think the instances of *ŋk need meddling with. I would however consider a slightly different reanalysis: perhaps some instances of the Proto-Uralic plain velar nasal should be reinterpreted as *ŋx.

I have three main reasons to suspect this:

  • the assumed distinctive sound change *ŋ > *ŋk in the Ugric languages;
  • the strong correlation of PU *ŋ with *ə-stems;
  • some comparative evidence from the Permic languages.

Ugric evidence, in typological light

It should be clear, I believe, that *ŋ > *ŋk is an “against-the-grain” sound change. Rather more often we can instead find the lenition/cluster simplification development *ŋk / *ŋg > /ŋ/ (thus e.g. in most Germanic languages; in various Indo-Aryan languages such as Bengali, Nepali and Sindhi; in Insular Celtic thru nasal mutation; or in Finnish thru consonant gradation). [2]

The opposite fortition development is surely not impossible, but I’m used to seeing it mainly in the context of a language banning /ŋ/ from its phonology. After such a change, it will be possible to re-map [ŋk] (or [ŋg]) as either a cluster /nk/ (/ng/) or as a prenasalized phoneme /ⁿk/ (/ⁿg/). One example of this within the Uralic languages is the treatment of Swedish loanwords in earlier Finnish, up until about the 20th century: as final, or even word-medial unalternating, /ŋ/ was long not allowed, words like batong ‘baguette’, maräng ‘meringue’, salong ‘salon’ have been loaned as patonki, marenki, salonki (most of these of ultimately French origin, and thus in the process showing impressive unpacking of a single nasal vowel into an entire four-phoneme sequence). Another well-known example is the stereotypical Russian or more generally East European accent of English, which replaces final /ŋ/ with [ŋk].

The treatment of *ŋ in the Ugric languages is however rather a split, so simplification of the phoneme system is not available as a motive for assumed fortition. Distinct /ŋ/ remains in Khanty to this day (as in PU *suŋə > PKh *ɬoŋ > e.g. Obdorsk /lŏŋ/ ‘summer’), and I think that also the vocalization to /w/ ~ /j/ in Mansi is probably quite late. (The full story will be best left for another time, but some vowel shifts typical in the vicinity of original glides seem to be absent in words that originally had *ŋ; while a few words show variable treatment in the different Mansi dialects.)

Reconstructing instead *ŋx > *ŋk for the “epenthesis” cases would additionally allow tying this development together with the Ugric merger of *-k- and *-x- as *-ɣ-. These seemingly go in opposite directions (the former to a stop, the latter to a spirant), but perhaps in the latter case we should separate the spirantization from the merger per se: first *x > *k regardless of position, only later *-k- > *-ɣ-?
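The ordering suggested here can be illustrated with a toy rule cascade. This is only a minimal sketch under simplifying assumptions: the vowel inventory, the two regex rules and the helper name `apply_rules` are made up for illustration, *jäŋxə is a hypothetical pre-form for ‘ice’ under the *ŋx reading, and the real Ugric reflexes involve many further changes that are not modelled.

```python
import re
import unicodedata

# Assumed toy vowel inventory, just enough for the example forms below.
VOWELS = unicodedata.normalize("NFC", "aäeëiouüə")

# Ordered rules (hypothetical simplification of the scenario in the text):
#   1. *x > *k in every position
#   2. *k > *ɣ only between vowels
RULES = [
    (r"x", "k"),
    (rf"(?<=[{VOWELS}])k(?=[{VOWELS}])", "ɣ"),
]

def apply_rules(form, rules):
    """Apply each (pattern, replacement) pair once, in the given order."""
    form = unicodedata.normalize("NFC", form)
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form

# *suxə- 'to row', *jokə 'river', hypothetical *jäŋxə 'ice', *woŋkə 'hole'
for form in ["suxə", "jokə", "jäŋxə", "woŋkə"]:
    print(f"*{form} > *{apply_rules(form, RULES)}")
```

With this ordering, the stop in *ŋk (old or new) is shielded by the preceding nasal, while original *-k- and *-x- fall together between vowels. Reversing the two rules would instead leave *-x- as a stop, which is one way to see why the relative chronology matters.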

Phonetics tangent

At this point I will also remind readers that the notation “*-x-”, first introduced by Janhunen, is not intended to stand for a velar fricative (which in traditional UPA transcription is instead χ); it stands merely for a consonant of unknown quality. Before his time, scholars like Setälä or Collinder, and indeed still the UEW, employed *-ɣ- (traditional UPA γ). A value [x] might still be possible, but other options could be e.g. [g], [q], or even [k] (in which case it would be “plain” *-k- that was rather something else).

Since the distinction between PU *-k- and *-x- can only be substantially traced in Finnic, as *-k- versus zero, it seems that this has been earlier researchers’ main line of evidence for determining how to reconstruct *x. And this indeed suggests that early Finnic had some kind of a “weak” value like [ɣ] or [ɰ] as the reflex of *x. But it’s entirely possible that this already represents a secondary development particular to Finnic! Already right next door, Samic instead reflects *-x- uniformly as *-k- (cf. e.g. Pite Sami tuohkat ‘to bring’ ~ Fi. tuoda ‘id.’), which will be difficult to derive from an especially weak starting value.

Old explanations have proposed generalized consonant gradation in Samic, but I do not find this plausible either: -g- [ɣ] as the weak grade of -hk- < *-k- is restricted to the central dialects and looks more like areal Finnish influence than Proto-Samic inheritance. It seems contrived to first assume *ɣ : *ɣ generalized to *k : *ɣ, then this being again levelled to *k : *k in several varieties, instead of just *x > *k directly in all positions. [3] Also perhaps worth noting — if starting from *x as indeed [x], this and the development *ś > *ć could be considered the same process: the fortition of [+high] fricatives?

The *-k- / *-x- contrast also appears to surface in a tiny number of words in Mordvinic as a contrast in backness (/v/ versus /j/); but as also numerous other consonants are lenited medially to semivowels in Mordvinic (*-p- > /v/, *-ŋ- > /v/ ~ /j/, in some cases *-m- > /v/), this evidence is difficult to project back to any particular PU values.

It does not even seem impossible that the *x / *k and likewise *ŋx / *ŋk contrasts never existed in Ugric, and that there would instead have been a conditioned split in the more western languages. This approach was explored already long ago by Erkki Itkonen, who starting in the late 40s proposed to adjust reconstructions like *suxə- ‘to row’ to *suukə-; with loss of *k after long vowels then assumed in Finnic. [4] But as the idea of Finnic long vowels dating already back to Proto-Finno-Ugric has turned out untenable, and as it in any case clearly cannot apply to *ŋ-clusters, for now I will not speculate further on what kind of conditioning could be assumed instead. Still, the points in the next section may suggest some hints in this direction…

Stem type considerations

It has long been known that traditional PU *-x- only seems to be reconstructible before 2nd syllable *ə. The diagnostic Finnic CVV-stems do not ever appear to have Samic/Mordvinic/Samoyedic cognates [5] that would demand a reconstruction *CVxA. This is evidently also not solely due to Finnish stem contraction being entirely limited to *ə-stems, as there are still a number of examples of *oo or *öö deriving from earlier *uwa/*uŋa or *üwä/*üŋä (one of the better-known examples being *voo ‘flow’ < PU *uwa).

I have not seen it noted, though, that also most cases of *-ŋ- in Proto-Uralic occur in *ə-stems. In the best-reconstructible vocabulary, the bare ratio appears to be 3 to 19:

  • *A-stems: *aŋa- ‘to open’, *čaŋa- ‘to hit’, *müŋä ‘backside’
  • *ə-stems: *oŋə ‘mouth’, *jäŋə ‘ice’, *kaŋərə ‘bow’, *kuŋə ‘moon’, *loŋə- ‘to throw’, *peŋərä ‘wheel’ (or *piŋärä? [6]), *piŋə ‘tooth’, *poŋə ‘breast’, *päŋə ‘head’, *püŋə ‘hazelhen’, *soŋə- ‘to enter’, *soŋə- ‘to wish’, *suŋə ‘summer’, *säŋə ‘air’, *śiŋə ‘support beam’, *šiŋərə ‘mouse’, *šuŋə ‘ghost’, *tüŋə ‘base’, *wiŋə ‘end’

By contrast, the distribution is almost even for the other PU nasals (*m, *n, *ń), perhaps slightly in favor of *A-stems. Examples with *m that appear reliably reconstructible to Proto-Uralic to me (are regularly reflected at least in one West Uralic and one East Uralic branch) show a ratio of 10 to 8:

  • *A-stems: *emä ‘mother’, *čama ‘direct’, *jama- ‘to be sick’, *kama ‘peel, skin’, *kuma ‘turned over’, *kämä ‘hard’, *d₂ümä ‘glue’, *ńoma ‘hare’, *oma ‘old’, *śuma ‘cap’
  • *ə-stems: *(ń)imə- ‘to suck’, *d₂ëmə ‘bird cherry’, *jëmə ‘gruel’ (> Mo. *jam, P. *jum, possibly partly Smy. *jä¹m), *komɜ ‘hollow’, *lumə ‘snow’, *lämə ‘broth’, *nimə ‘name’, *śëmə ‘fish scales’

Examples with *n show a ratio of 7-9 to 9:

  • *A-stems: *enä ‘big’, *ëna ‘mother-in-law’, *ona ‘short’, *kana ‘armpit’, *muna ‘egg’, *puna ‘hair’, *puna- ‘to plait’; perhaps #mona- ‘to say’, *śona ‘sleigh’
  • *ə-stems: *änə ‘sound’, *kanə- ‘to carry’, *menə- ‘to go’, *monə ‘many’, *panə- ‘to put’, *sënə ‘vein, sinew’, #śinə ‘coal’, *tonə- ‘to know’, *wenə- ‘to stretch’

And examples with *ń show a ratio of 6 to 3:

  • *A-stems: *ańa ‘older female relative’, *kuńa- ‘to blink’, *küńä ‘elbow’, *läńä ‘soft’, *mińä ‘daughter-in-law’, *pańa- ‘to press’
  • *ə-stems: *ëńə ‘tame’, *peńə ‘spoon’, *puńə- ‘to twist’

If we separate out in particular those examples with Ugric consistently pointing to *ŋk, the situation actually gets slightly more balanced: in the case of A-stems, ‘to hit’ and ‘backside’ remain, while under ə-stems, ‘ice’, ‘bow’, ‘to throw’, ‘tooth’, ‘hazelhen’, ‘air’, ‘mouse’ and ‘end’ remain. Still a 1 : 4 discrepancy, though.
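For a quick side-by-side view, the ratios quoted above can be turned into ə-stem shares. The following is a minimal sketch: the counts are simply transcribed from the lists in this section, while the dictionary layout, the labels and the helper name `schwa_share` are my own illustrative choices, and no significance testing is attempted.

```python
# (A-stem count, ə-stem count) per nasal, transcribed from the lists above.
COUNTS = {
    "ŋ (all)": (3, 19),
    "ŋ (2 + 8 words singled out by the Ugric *ŋk criterion)": (2, 8),
    "m": (10, 8),
    "n (secure A-stems only)": (7, 9),
    "n (incl. the two doubtful A-stems)": (9, 9),
    "ń": (6, 3),
}

def schwa_share(a_stems, schwa_stems):
    """Fraction of ə-stems among all stems counted for one consonant."""
    return schwa_stems / (a_stems + schwa_stems)

for label, (a_stems, schwa_stems) in COUNTS.items():
    share = schwa_share(a_stems, schwa_stems)
    print(f"*{label}: {a_stems} : {schwa_stems}, ə-stem share {share:.0%}")
```

The samples are small, so the percentages should be read as a rough orientation only, not as evidence on their own.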

This all is relevant if we consider Janhunen’s proposal that PU *x has come about thru a pre-Uralic conditional development specifically in *ə-stems. The details for this too are best found in his paper The primary laryngeal in Uralic and beyond, published in 2007 in SUST 253 (Pekka Sammallahti’s Festschrift), as cited already last time.

In particular he suggests consonant-stem formations as the key (though he still does not spell the process out too explicitly): a root like *mëxə ‘earth’ might have had the locative *mëx-na and the ablative *mëx-ta, which could be from pre-Proto-Uralic *mëQna, *mëQta, thru the lenition of some other consonant *Q in syllable-final position. After this, the vowel-stem forms would also have to have been generalized from *mëQə to *mëxə.

Janhunen proposes that this pre-Uralic *Q = *k. This seems unlikely to me however, since there are both several PU roots of the shape *CVkə (e.g. *kokə- ‘to check traps’, *lukə- ‘to count’, *jokə or *jëkə ‘river’) and instances of the cluster *kt (e.g. *ëkta- ‘to hang up e.g. a net’, *täktä ‘bone’, *toktə or *tëktə ‘loon’). In his article, he also notes that “laryngeals” in the world’s languages frequently derive also from other sources such as *s or *p. Interestingly, it so happens that in Proto-Uralic, roots of the shape *CVsə and *CVpə are also remarkably rare. The only examples that I find reliable seem to be *kosə- ‘to cough’, *jepəkä ‘owl’, and even if also including Finno-Permic roots, *jäsən(ə) ‘joint’. [7] The first might be simply a newer onomatopoetic innovation; the two latter are trisyllabic roots where there cannot have been vowel-stem/consonant-stem alternation between our target consonant and inflectional endings.

This particular approach, even if we widen our reach to also *p and *s as potential pre-PU sources of *x, doesn’t seem to work for explaining *-ŋxə- as coming from pre-PU *-NQə- ~ *-ŋx- though, since there are now some difficult-to-dispense-with counterexamples. If we went with *NQ = *mp, it will be difficult to explain *lämpə ‘warmth’ (not **läŋxə); if we went with *NQ = *ŋk, similarly e.g. *woŋkə ‘hole’ (not **woŋxə) will be a problem; and *NQ = *ns seems to be ruled out by the absence of any examples of *-ns- at all, including in *A-stems.

It’s additionally not at all clear to me how far back the characteristically Finnic consonant-stem alternation pattern *CV(C)Ci, *CV(C)CE-CCV ~ *CV(C)C-CV (as in Finnish partitives: viisi ‘five’, lumi ‘snow’ : viit-tä, lun-ta) really goes. There are some residues of this in Samic, but elsewhere the commonplace total loss of 2nd syllable *ə, especially after light syllables, gets in the way of analysis. Some early derivatives also look like they were originally based on a vowel stem, not a consonant stem. One telling case is Samoyedic *korå ‘bull reindeer’, which is clearly ultimately a derivative of PU *kojə ‘male’, and cognate to e.g. Fi. koiras ‘male’. Starting from PU *koj-ra would however predict PSmy **kåjrå — while starting from *kojə-ra will indeed predict PSmy *korå. (PU *o regularly remains only in roots of the shape *CoCə; *-jə- is regularly lost, as in PU *ujə- > PSmy *u- ‘to swim’. Contrast e.g. PU *ojwa > PSmy *åjwå ‘head’.)

I regardless think it’s probable that second-syllable *ə has somehow conditioned the rise of PU *x, even though for now we cannot identify what from, precisely. And even if assuming that some instances of western Uralic *-ŋ- are from *-ŋx- won’t explain the abundance of *-ŋə-roots in their entirety, it certainly won’t hurt either.

Permic evidence

This is probably the most comparatively interesting line of investigation. *ŋ shows a split development also in the Permic languages, being reflected as either *ŋ (> varyingly /m n ń/ in most varieties), or lost entirely. As I’ve alluded to already in my post on the treatment of *ŋ in Ugric two years ago, it appears that the Ugric contrast between *ŋ and *ŋk correlates with this, to an extent.

Group 1, with Permic *ŋ ~ Ugric *ŋ:

  • Udm. /ćińɨ/, Komi /ćuń/ ‘finger’ ~ Kh. *ćoŋən ‘knuckle’
  • ‘mouth’: Udm. /ɨm/, Komi /vom/, /əm/ ~ Kh. *ooŋ
  • ‘birch bark vessel’: Udm. /ľaŋes/, Komi /ľanəs/ ~ Kh. *jeŋəL
  • ‘tree stump’: Udm. /diŋ/, Komi /din/ ~ Hung. : töv-
  • ? ‘strawberry’: Udm. /emedź/, Komi /əmidź/ ~ Kh. *-ääńć, in compounds
    (if the latter is < *-ŋVć — but I would not rule out the Permic words also being fossilized compounds with a 2nd component from *äńśɜ, since *ŋ > m in an illabial environment is not really regular at all)

Group 2, with Permic zero ~ Ugric *ŋk:

  • ‘ice’: Udm. /jɨ/, Komi /jə/ ~ Hung. jég, Ms. *jääŋk, Kh. *jööŋk
  • ‘tree stump’: Udm. /jal/ ~ Kh. *jöŋkəL
  • Komi /mɨś/ ‘after’ ~ Hung. mëg ‘and’, mögé ‘behind’, etc. [8]
  • ‘larch’: Komi /ńia/ ~ Kh. *ńääŋk
  • ‘mouse’: Udm., Komi /šɨr/ ~ Hung. egér, Ms. *täŋkər, Kh. *ɬööŋkər

This pattern might seem unexpected: it’s my *ŋx that tends to develop to zero, not plain *ŋ. Possibly, in group 2, *ŋx first developed in Permic into a voiced plosive/affricate equivalent of *x, which was then lenited and lost; e.g. *[ɴq] > *ɢ > *ʁ > ∅, or *[ŋx] > *gɣ > *ɣ > ∅?

Though if things were exactly this clean, the correspondence would probably have been noticed already. There are also cases where the Ugric evidence is inconsistent:

  • Komi /ɨń/ (< *ïŋ?) ‘flame’ ~ Kh. *jääŋəL- ‘to roast’; — but Hung. ég- (< *-ŋk-) ‘to burn’
  • Udm. /pum/, Komi /pon/ ‘end’ ~ Hung. fej ‘head’, fő ‘main’; — but Ms. *pääŋk, Kh. *pööŋk ‘head’
  • Udm. /vand-/, Komi /vundɨ-/ (< *-ŋV-ta-) ‘to cut’ ~ Northern Kh. *waaŋ- ‘to hew’; — but Hung. vág- ‘to cut’, Ms. *waaŋk- ‘to hit’, Southern Kh. /waŋx-/; — and even Eastern Kh. /waaɣ-/?!

or outright contradictory evidence, with Permic *ŋ ~ Ugric *ŋk:

  • ‘tooth’: Udm., Komi /piń/ ~ Hung. fog, Ms. *päŋk, Kh. *pööŋk
  • Udm. /čɨŋ/, Komi /čɨn/ ‘smoke’ ~ Ms. *šeeŋkʷ ‘fog’ [9]
  • Komi /sɨnəd/ ‘air, smoke’ ~ Hung. ég ‘heaven’
  • ? Udm. /šońer/ ‘straight’, Komi /šań/ ‘good’ ~ Hung. igen ‘yes’
    (rather dubious; semantics are divergent, and this probably had Proto-Permic *ń, not *ŋ)

Regardless, there appears to be a total absence of cases with Permic zero ~ Ugric *ŋ, so I don’t think we can call the Permic and Ugric splits fully independent.

Some of the exceptions, especially in the former category, could involve secondary velar suffixes on the Ugric side (*päŋə ‘end’ → *päŋ-kä ‘head’? [10]), but stretching this explanation to all cases would be forced. Probably there’s something more complicated yet going on here. One hypothesis to investigate might be that *ŋx > ∅ is only a conditional development in Permic.

An overall dearth of data is also a problem. The first category only contains one particularly good etymology with cognates widespread across Uralic (‘mouth’); the second, three (‘ice’, ‘mouse’, ‘behind’); the third, one (‘head’); the fourth, one (‘tooth’). Accounting for some etymologies with spottier distribution but at least some other good-looking cognates (‘tree stump’, ‘birch bark vessel’, ‘air’), the count rises to 3 : 3 : 1 : 2, with only a narrow and probably statistically insignificant majority for the correspondence pattern I suggest.
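The tally can be reproduced mechanically. The sketch below is only illustrative: the tuple layout, the labels "zero" and "mixed", and the helper name `tally` are my own, and placing ‘tree stump’ in the first category for the extended count is inferred from the totals given above (the gloss also appears in group 2).

```python
from collections import Counter

# (gloss, Permic reflex, Ugric reflex, widespread across Uralic?)
ETYMA = [
    ("mouth", "ŋ", "ŋ", True),
    ("tree stump", "ŋ", "ŋ", False),         # category inferred from the totals
    ("birch bark vessel", "ŋ", "ŋ", False),
    ("ice", "zero", "ŋk", True),
    ("mouse", "zero", "ŋk", True),
    ("behind", "zero", "ŋk", True),
    ("head", "ŋ", "mixed", True),            # Ugric evidence inconsistent
    ("tooth", "ŋ", "ŋk", True),
    ("air", "ŋ", "ŋk", False),
]

# The four correspondence categories, in the order discussed in the text.
PATTERNS = [("ŋ", "ŋ"), ("zero", "ŋk"), ("ŋ", "mixed"), ("ŋ", "ŋk")]

def tally(etyma, widespread_only):
    """Count etyma per (Permic, Ugric) pattern, optionally only widespread ones."""
    return Counter(
        (permic, ugric)
        for _gloss, permic, ugric, widespread in etyma
        if widespread or not widespread_only
    )

core = tally(ETYMA, widespread_only=True)
extended = tally(ETYMA, widespread_only=False)
print("core:    ", " : ".join(str(core[p]) for p in PATTERNS))
print("extended:", " : ".join(str(extended[p]) for p in PATTERNS))
print("Permic zero ~ Ugric *ŋ:", extended[("zero", "ŋ")])
```

Adding or reassigning a single etymology visibly shifts the balance, which is another way of saying that the data set is too small to settle the matter.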

Further investigation is clearly required on several fronts, and I’m not yet fully attached to the idea of a cluster *ŋx. But for now, I conclude at least that reconstructing a single PU *ŋ behind both Ugric *ŋk and Ugric *ŋ can definitely be questioned, and other possibilities should be explored as well.

[1] I still often tend to transcribe the last one as *ńś following traditional approaches. It seems likely that phonetically the affricate value is more original though, but this ties into the thorny question of to what extent did PU have a contrast between *ś and *ć at all? There’s only any substantial evidence for a contrast initially and before *k, and even here different languages tend to point to different consonants.
[2] The common phonological constraint (see e.g. WALS) against word-initial /ŋ/ is surely also in large part due to this development trajectory. For any language with a CVC or CVCC maximal syllable template, and no /ŋ/ in its consonant system, the most likely pathway of developing /ŋ/ will be thru cluster simplification of some sort; which however will not be able to create any word-initial instances all by itself.
[3] Finno-Ugric studies have mostly long since shed Setälä’s infamously unfalsifiable early-1900s “theory” of all-encompassing consonant gradation in Proto-Finno-Ugric + massive levelling in all attested languages, but for some reason rudiments of this approach seem to have in Samic studies lingered until fairly late. As late as 1981, Mikko Korhonen’s handbook Johdatus lapin kielen historiaan still attempts to explain numerous regular sound changes that have no explicit relationship to gradation (e.g. *tk > Southern & Lule Sami rhk, Northern Sami ŧˈk : ŧk) as “generalized weak grades”.
[4] Itkonen, Erkki (1949): Beiträge zur Geschichte der einsilbigen Wortstämme im Finnischen. Finnisch-Ugrische Forschungen 30: 1–54.
[5] Cognates in the other languages could probably not be used to rule out a reconstruction as an *ə-stem.
[6] I’m following UEW in positing *-eŋə-, but it’s possible that this is a bad idea: the reconstruction seems to have been put together mainly by reference to Finnic, and there mainly by appeal to analogy with *söö- < *sewə- ‘to eat’. However, *püwärä < *piwärä < *piŋärä would work as the pre-Finnic proto-form just as well. The *ä in Mansi *päɣärt- (? *päŋärt-) may also point in this direction, as *e-ə usually yields *i. (By contrast I’m not putting heavy stock on the second-syllable vowel, which could well be secondary; cf. e.g. Southern Mansi /kaal-/ ~ /kalaa-/ ‘to die’ < PMs *kaal(a)- < PU *kalə-.)
[7] *kowsə ‘spruce’ needs to be reconstructed with a consonant cluster and cannot work as a counterexample.
[8] Mansi *mänt ‘along’, if it belongs here, does not seem like an exception; this can be either due to early *ŋt > *nt, or due to late *ŋkt > *nt.
[9] Khanty *čüüɣ ‘fog’ is also listed here by UEW, but Ante Aikio’s etymology that derives this from PU *čäkə (~ Samic *cēkë) seems preferable to me. Although I still wonder how Finnic *häkä ‘carbon monoxide’ fits into the picture.
[10] This latter derivative seems to exist in Samic at least: *pāŋkē ‘reindeer’s headgear’. Older comparisons linking the word to e.g. Finnic *panka ‘handle’ don’t seem very convincing to me.

2 comments on “Proto-Uralic *ŋx?”
  1. Sartmoulou says:

    I was wondering what do you think of this proposition that Consonant Gradation was fortition rather than lenition:
    Admittedly the author doesn’t appear to be a specialist in Uralic comparative linguistics, but I feel like it could explain the otherwise unattested Samic reflex of *x as a plosive.

    • j. says:

      Thank you for reminding me of this paper — I’ve seen it long ago already, but I seem to have lost the copy since then. I mostly accept the main thesis of the paper: that foot balancing has been the phonetic process thru which gradation has originally developed. One point that could be added is that Finnish is already known to have levelled half-long consonants to singletons (in the weak grade of geminate stops), which would seem to naturally lead to the levelling of any earlier phonetic alternation in single non-stop consonants as well.

      Gordon however appears to treat the traditionally assumed Finno-Samic clade as a given. If this were incorrect, his thesis would have to be either amended also at least for Mordvinic, or recast in terms of areal processes. Gradation in Nganasan will require accounting for as well; and some word on the gradation-less Southern Sami, Veps and Livonian wouldn’t hurt either. It seems certain to me that gradation has only become phonologized separately in each of the three branches (in particular, in Samoyedic it seems to postdate several overarching sound changes in the consonant system such as *Ck > *C, *Cː > *C, *s > *t), and it might be a good idea to refine his argument to one according to which the rise of gradation is predicted already from the particular type of trochaic stress found in Uralic.

      This however seems to leave *x > *k as either too early to possibly be gradation-related; or, if we want to consider gradation early and *x > *k areal across Samic: unmotivated in Southern Sami, which appears to have lost gradation already very early on. (Though it does have its own share of interesting prosodic developments, and I am not sure how far back they go in turn.)

