PIE verb roots, for the people

Last fall I blogged about a possible project on charting the distribution of reconstructed Proto-Indo-European terms in the descendant languages. Some discussion on here focused on the likely unreliability of the data, sourced for my initial survey from a conveniently available but unreferenced Wiktionary appendix.

This was not a choice out of ignorance as much as out of availability. To my knowledge, no public database of reasonably up-to-date etymological Indo-European data is currently available anywhere.

There is no reason though for us to resign ourselves to unequal access to information, with easily found free data being of poor quality vs. “proper” data being locked away in exorbitantly expensive dead-tree-format publications. Data and theories, per se, are uncopyrightable, after all.

I am therefore happy to announce having digitized a list of PIE verb roots, as recorded in the LIV + in its online Addenda und Corrigenda. [1] A basic version is available at the English Wiktionary. You may also be interested in taking a look at the fully tabulated data, in spreadsheet form. The notes in my master file on word derivation and distribution are sketchy at best though, and will require further work to fill in. [2]

While this file is probably necessarily public domain, if anyone reading ends up using or referencing it somewhere, I would appreciate a shoutout or similar.

As for actual analysis, at this point the data mainly allows a look at root structure. I might as well note in this post some basic facts that stick out.

For starters, the usual stop phonation constraints (against **D-D, **T-Dʰ, **Dʰ-T) surface reliably. A more interesting related pattern emerges too: I’ve sometimes seen it suspected that the unusual PIE cluster *wr- could come from earlier *br-, therefore tying together with the lack of stem-initial *b-. (Not a lack altogether: at least in the preliminary data, *b still occurs often enough in stem-final position.) However, if this was assumed, we would end up with quite a large number of pre-PIE stems of the shape *b-D; 5 of the 12 roots with *wr- show a stem-final voiced stop; as in *wreg- ‘einer Spur folgen’. So either we’d need to also assume the reconstructible voicing constraints to have emerged only later; or to fine-tune this hypothesis to some kind of a chainshift like *bʰ- > *w-, *b- > *bʰ-.

I would be content to abandon the idea though and to instead assume that most cases of *wr- have rather arisen either thru the reduction of a 1st syllable of earlier roots (in PIE-internal terms ≈ as zero-grade derivatives of some root shaped *(C)wer-, *Cewr-), or thru some Schwebeablaut-ish metathesis process.
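The argument about *wr- can be made concrete with a minimal Python sketch of how the phonation constraints might be checked against a segmented root. The segment inventories, the function names and the pre-segmented input format are assumptions made for illustration, not part of the LIV data; *breg- and *bʰreg- are the hypothetical pre-PIE shapes discussed above, not attested reconstructions. The sketch also assumes e-grade roots with at most one relevant stop on each side of the vowel.

```python
# Stop series of the traditional PIE reconstruction (illustrative inventories).
VOICELESS = {"p", "t", "ḱ", "k", "kʷ"}        # *T
VOICED = {"b", "d", "ǵ", "g", "gʷ"}           # *D
ASPIRATED = {"bʰ", "dʰ", "ǵʰ", "gʰ", "gʷʰ"}   # *Dʰ

# Onset-coda combinations that the constraints rule out.
FORBIDDEN = {("D", "D"), ("T", "Dʰ"), ("Dʰ", "T")}


def stop_class(segment):
    """Return "T", "D" or "Dʰ" for a stop, and None for anything else."""
    if segment in VOICELESS:
        return "T"
    if segment in VOICED:
        return "D"
    if segment in ASPIRATED:
        return "Dʰ"
    return None


def phonation_profile(segments):
    """Classify the first stop before and the first stop after the root vowel *e."""
    vowel = segments.index("e")  # assumes an e-grade root
    onset = [stop_class(s) for s in segments[:vowel]]
    coda = [stop_class(s) for s in segments[vowel + 1:]]
    first = next((c for c in onset if c), None)
    last = next((c for c in coda if c), None)
    return first, last


def violates_constraints(segments):
    """True if the root shows one of the forbidden stop combinations."""
    return phonation_profile(segments) in FORBIDDEN


wreg = ["w", "r", "e", "g"]     # *wreg- as reconstructed
breg = ["b", "r", "e", "g"]     # hypothetical pre-PIE *breg-
bhreg = ["bʰ", "r", "e", "g"]   # hypothetical pre-PIE *bʰreg- (chain-shift variant)

for root in (wreg, breg, bhreg):
    print("".join(root), phonation_profile(root), violates_constraints(root))
```

Only the *b- variant ends up as a forbidden **D-D root, which is why the plain *br- > *wr- idea would need either late voicing constraints or the chain-shift reformulation.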

There is more interesting stuff going on with resonants. I do not recall seeing this discussed in the context of PIE root structure anywhere before (which of course could be ignorance on my behalf), but several non-trivial constraints on their distribution are apparent. Here are some quick observations on this topic:

  1. No roots — or perhaps better: “sonorant cores” of a shape **-R₁eR₁- occur. This is a fairly trivial application of the universal principle of Similar Place Avoidance, though.
  2. No cores of a shape **-ler-, **-rel- occur either. Again, this is fairly simple to understand as similar consonant avoidance.
  3. The core **-nel- is also absent: this seems less expected, but may have the same motivation as the above. It could also be an accidental gap, though, as onset *n- is relatively rare altogether, and *-len- is well attested. Perhaps it is rather the abundance of *-ney- and *-new- roots that should be questioned.
  4. *m in the onset does not appear to quite count as a sonorant. There are just about no roots beginning with a cluster *Tm-, where *T would be a stop consonant (the lone example is *dʰmeH- ‘blasen’). We do find *sm-, *Hm-, but then again, *sT- and *HT- are possible just as well.
    This also lines up well with how a few cases of *mR- occur as well. Historically, they seem likely to be mostly “zero-grade clusters” again; but this etymological explanation does not suffice to explain the absence of other sonorant-sonorant clusters such as **nR-, **lR-.
  5. Sonorant cores of a shape *-yeR- seem unexpectedly rare altogether. No examples with **-yel-, **-yer-, **-yen- occur at all, and only a single example of *-yem-.
  6. Conversely, even when looking at roots with stem-final obstruents only, onset *-y- is curiously common preceding a stem-final back consonant (velar, laryngeal or *w): 29 cases out of 33, or 88%, show this environment! I wonder if we could assume that such roots reflect some specific pre-PIE front vowel, which was diphthongized to *ye before back consonants. It would likely have to be separate from the source of PIE *-ey- though, which does not seem to have any aversion against occurring before velars and laryngeals.
  7. Initial *h₂w- appears to be more common than all other laryngeal + glide clusters altogether, and it is also quite common stem-finally (i.e. as *-h₂w-, not *-wh₂-!). I wonder if this should be assumed to represent an earlier single phoneme such as *[ħʷ], created even further back from the ancestor of *h₂ by the same processes that led to the rise of the PIE labiovelar series?

I could extend my discussion to onset and stem-final consonant clusters as well, but they do not seem to show anything especially interesting for me to raise up just yet.

[1] Two corrections on reconstruction remain mysterious to me: an alleged removal of a root **meyH- ‘lang werden’ (the two roots I’ve recorded with this shape do not seem to have such a meaning), and the adjustment of a root *kelh₁- to *k¹elh₁- (no such root occurs in the original data; although the root *kel- ‘antreiben’ is adjusted to *kelh₁- in another correction).
[2] I have at the moment no recollection what the column labeled “st” signifies, but I am leaving it in for possible further elaboration.
edit: On re-checking the data, apparently this indicates the number of branches with verbal reflexes given by LIV in the running text. However, footnotes often list nominal derivations, and closer checking also shows that some entries even list a few additional uncertain verbal reflexes in footnotes… meaning that this will not quite be an actual measure of the distribution of the reflexes. Perhaps I will remove this in later editions.


A note on the Mitian Argument

An article to have caught my attention tonight: Mikael Parkvall (2008), Which parts of language are the most stable?, Sprachtypologie und Universalienforschung 61/3.

The main thrust of the paper is to define a statistical measure of the “arealness” or “geneticness” of a particular linguistic feature. This can be accomplished with fairly elementary calculations, once given a large dataset (the author uses, not especially surprisingly, WALS). Typologists will likely find the exercise illustrative, both in its general array of eyeball-able results, and in demonstrating how even the simplest bit of math can go a long way. [1]

One result stands out to me: among the features found the most strongly genetic, at #3 stands “M-T pronouns” — i.e. the likes of Uralic *minä, *tinä, and their suggested distant relatives in Indo-European, Yukaghir, Turkic, Mongolic, etc. (families that, taken together, form a subset of the Nostratic macrofamily hypothesis known as “Mitian”). Parkvall does not fail to notice this result either.

This may still require a number of caveats. WALS does not pack a very large number of etymological data sets, and is more geared towards features that can instead illuminate areal patterns. And, perhaps as a warning, the #1 most genetic feature on the list turns out to be “presence of phonemic clicks”.

As people who dabble in linguistic classification most probably know, click consonants have traditionally been held as a defining marker of an alleged “Khoisan” language family of southern Africa, first proposed by notorious “lumper” Joe Greenberg. However, putting together more conventional evidence for this grouping has over the years proven near-impossible, and these days conservative analyses instead seem to have settled on distinguishing some 3-4 separate families (the larger units with some acceptance being Khoe, Tuu, and Ju-ǂHoan) in place of unified Khoisan.

(An additional point, if you look closely at the math behind the stats, is that the highly genetic assessment of clicks gets a slice of its homogeneity score not just from the high homogeneity of the “Khoisan” families in their presence of clicks; but also from the complete homogeneity of all non-African language families in their absence of clicks. This argument can be expected to equally apply to any other trait that is truly a single-family or single-geographical-area idiosyncrasy, rather than one found sporadically around the world.)
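The effect can be illustrated with a toy calculation. The sketch below is not Parkvall's actual formula: it assumes a deliberately simple homogeneity score (the share of a family's languages that carry the family's most common value, averaged over families), and both datasets are invented for the purpose.

```python
from collections import Counter


def family_homogeneity(values):
    """Share of a family's languages that have the family's most common value."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def mean_homogeneity(families):
    """Average the per-family homogeneity over all families."""
    scores = [family_homogeneity(values) for values in families.values()]
    return sum(scores) / len(scores)


# Invented toy data: True = feature present in a language, False = absent.
single_family_trait = {
    "Family A": [True, True, True, True],
    "Family B": [False, False, False, False],
    "Family C": [False, False, False, False],
    "Family D": [False, False, False, False],
}
scattered_trait = {
    "Family A": [True, False, True, False],
    "Family B": [True, False, False, False],
    "Family C": [False, True, False, True],
    "Family D": [False, False, True, False],
}

print(mean_homogeneity(single_family_trait))
print(mean_homogeneity(scattered_trait))
```

In the first dataset, three of the four perfect per-family scores come from families that simply lack the feature, so under this kind of measure a single-family trait looks maximally "genetic" largely by default.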

Regardless, we see “Mitianness” still squarely beating out various common tell-tale signs of established-family genetic relatedness, such as the presence of ejectives; sex-based noun gender systems; or polysynthesis.

At some point in the future, once we have an “etymological WALS” at our disposal, it would moreover be interesting to repeat this experiment with a few other lexical variables. E.g. how do numerals or body parts stack against pronouns in genetic classification? What are the stablest kinship terms? How good a job does the Swadesh list really do? Are there any interesting surprises to be found in words for abstract concepts? Do old and universal enough cultural concepts (think “pottery”, “hunting technology”) behave as if they were core vocabulary? Etc, etc, time will tell.

[1] Of course, something like 90% of the time, “the simplest bit of maths” seems to be all that we have yet in linguistics. This is surely great news for people who are not professionals, but who want to follow linguistics arguments along from home; or for the career plans of people like myself, who know enough undergrad-level maths to craft a couple other elementary mathematical tools for testing this or that hypothesis, if necessary. On the other hand, it is a less than promising sign about the overall quantitative reliability of our field in general, so far…


On *ü in Mari vs. Proto-Uralic

It is always a low note of sorts when a scientific dispute gets resolved by quietly shifting consensus (e.g. due to proponents of one side passing away) rather than by actual discussion.

One of these seems to be the status of Proto-Uralic *ü. In literature up to about the mid-1900s, various skeptical viewpoints can be found on whether a contrast between *i and *ü should be reconstructed or not. They dwindle away in later times however, with the modern researcher only really encountering any trace of the issue when perusing the UEW, which still provides proto-forms with *ü only as an alternative to proto-forms with *i. So far I have regardless been unable to locate any turning point source that argues in detail in favor of establishing *ü after all.

For sure, all major overviews of comparative Uralic vocalism (Steinitz 1944, Collinder 1960, Sammallahti 1988) still reconstruct contrastive front rounded *ü (or, in the case of Steinitz, largely equivalent reduced *ö̆), and give what they see as the regular later development in most individual languages. It is thus fairly simple to reverse-engineer a rough argument for in which cases to reconstruct *ü. Altogether, especially the following three contrasts appear to be relatively robust and in etymological correspondence to each other:

  • Finnic *i : *ü
  • Hungarian ë : ö
  • Khanty *e : *ö (perhaps rather *[ɪ] : *[ʏ])

Also the *i : *ɨ contrast in Permic correlates well with this (though *ɨ can also derive from PU *u and *ä).

Numerous further conditional developments, including also indirect traces in several Uralic languages that lack front rounded vowels, have also been identified. Collating these in one place would probably amount to an almost full answer to old skeptical viewpoints, which mostly have focused on the possibility that the contrasts seemingly pointing to *i : *ü have separately developed in each language.

I think one subgroup remains an open problem though. A phonetically equivalent contrast also appears in Mari, between *ĭ (> generally /ə/, in a couple of dialects /ɪ/ or /i/) and *ü̆ (> Hill Mari /ə̈ ~ ʏ/, Meadow Mari /y/). But this particular contrast seems to do a poor job at matching with the Proto-Uralic *i : *ü contrast, as could be reconstructed on the basis of the other languages. While reflexes with “correct” labiality seem to be in the lead, an abundance of counterexamples is also apparent: [1]

  • PU *i > Ma *ĭ: 14 cases
    *ićä ‘father’ > *ĭćä ‘older brother’, *kičək > *kĭčək ‘fresh snow’, *kirä- > *kĭre- ‘to hit’, *kiśkə- > *kĭške- ‘to throw’, *minä > *mĭńə ‘I’, *ńičkä- > *jĭčke- ‘to pluck’, *pićlä > *pĭćle ‘rowan’, *pilwə > *pĭl ‘cloud’, *pištä- > *pĭšte- ‘to put’, *pitä- > *pĭće- ‘to hold’, *śikšta (← II) > *šĭštə ‘beeswax’, *śilmä > *šĭnćä ‘eye’, *tinä > *tĭńə ‘thou’, *wittə > *wĭć ‘5’
  • PU *i > Ma *ü̆: 6 cases
    *kiwə > *kü ‘stone’, *piŋə > *pü ‘tooth’, *nimə > *lü̆m ‘name’, *śixələ > *šülə ‘hedgehog’, *šikšna (← Baltic) > *šü̆štə ‘strap’, *sitV- ‘to bind’ > *šüðəš ‘bind’
  • PU *ü > Ma *ĭ: 9 cases
    *küjə > *kĭškə ‘snake’, *külmä > *kĭlmə ‘cold’, *küńärä > *kĭńer ‘elbow’, *kütkə- > *kĭćke- ‘to harness’, *mükkä > *mĭk ‘mute’, *ńüktä- > *ńĭktä- ‘to pluck’, *süjə > *šĭjä ‘year ring’, *sükəśə > *šĭžə ‘autumn’, *śüklä (← Turkic) > *šĭɣəľə ‘wart’
  • PU *ü > Ma *ü̆: 12 cases
    *d₂ümä > *lü̆mə ‘glue’, *künčə > *kü̆č ‘nail’, *künčä- > *kü̆nče- ‘to dig’, *küsV > *kü̆žɣə ‘thick’, *kütV > *kü̆ðäl ‘middle’, *sülə > *šü̆lə ‘fathom’, *süskV- > *šü̆škä- ‘to cram’, *śüd₁ə > *šü ‘coal’, *śülkə > *šüwəl ‘spit’, *türə > *tü̆rəś ‘full’, [2] *tüŋə > *tü̆ŋ ‘base’, *wülä > *wü̆l- ‘over’

PU *e also mostly yields Ma *ĭ or *ü̆, again split fairly evenly.

  • PU *e > Ma *ĭ: 15 cases
    *e- > *ĭ- ‘negative verb’, *elä- > *ĭle- ‘to live’, *eštə- ‘to be in time’ > *ĭšte- ‘to do’, *jećə > *ĭške ‘self’, *jekä > *i ‘year’, *keltä- > *kĭlðe- ‘to bind’, *kenčV- > *kĭčälä- ‘to search’, *neljä > *nĭl ‘4’, *le- > *liä- ‘to be’, *leštə > *lĭštäš ‘leaf’, *peljä > *pĭləkš ‘ear’, *penä > *pi ‘dog’, *pesä > *pĭžäkš ‘nest’, *repäś (← II) > *rĭwəž ‘fox’, *śerV > *sĭr ‘character, nature’
  • PU *e > Ma *ü̆: 12 cases
    *jetV > *jü̆t ‘night’, *kejə- > *küä- ‘to boil’, *kerə > *kü̆r ‘bast’, *pečä > *pü̆nčə ‘pine’, *pečkV- > *pü̆čkä- ‘to cut’, *sesar (← IE) > *šü̆žar ‘sister’, *śečä > *čü̆čə ‘uncle’, *śepä > *šü ‘neck’, *tejnəš (← II) > *tü̆əž ‘pregnant’, *terä (← II) > *tü̆r ‘blade’, *werə > *wü̆r ‘blood’, *wetə > *wü̆t ‘water’

I have included here cases with Proto-Mari *i and *ü only in stems of the shape CV(V-), where the appearance of “full” rather than “reduced” vowels is regular. Some other examples exist as well though, such as *ik ‘one’ (< *ü?), *üpš ‘smell’ (< *i?).

Existing literature does not seem to tackle the issue, and often I get the feeling that authors essentially try to sweep the problem under the carpet. Sammallahti leaves the history of Mari vocalism untreated. Collinder offers, for the cases with *e > *ü̆, only the slightly ad hoc rule that this development occurs “in the vicinity of *w and *r”, while he does not comment on the cases with *i > *ü̆ or *ü > *ĭ. Steinitz’ approach posits a late development *ĭ > *ü̆ again in the vicinity of labial consonants (and raises the possibility that it applies only to Meadow Mari and not even Proto-Mari), but leaves the other cases untreated.

I have not seen any specialized studies that would have fared better either. E. Itkonen in his major 1954 article on the history of Mari and Permic vocalism even explicitly notes that labiality assimilations that he posits next to *w, *p, *r cannot be considered regular. Contrast indeed e.g. ‘blood’ (*we- > *wü̆-) vs. ‘five’ (*wi- > *wĭ-), ‘tooth’ (*pi- > *pü-) vs. ‘cloud’ (*pi- > *pĭ-), ‘blade’ (*-er- > *-ü̆r) vs. ‘to hit’ (*-ir- > *-ĭr-). — Also, since when is *r a labial consonant anyway?

I suspect that already the basic assumptions underlying earlier research on this are incorrect. Instead of the developments *i > *ü̆ and *ü > *ĭ being some kind of exception cases to be explained away, the old skeptic contingent has been right this time: the contrast between Proto-Mari *ĭ and *ü̆ is unrelated to the contrast between Proto-Uralic *i and *ü. Rather, PU *i, *ü and *e merged in the early history of Mari, and this merged phoneme (I will mark it simply as *i) later secondarily split into *i > *ĭ and *ü > *ü̆ again — without regard for its PU origins.

The best single conditioning factor instead appears to be stem type:

  • *i-ä > *ĭ: 23 cases
    *elä- > *ĭle-, *ićä > *ĭćä, *jekä > *i, *külmä > *kĭlmə, *keltä- > *kĭlðe-, *küńärä > *kĭńer, *kirä- > *kĭre-, *minä > *mĭńə, *mükkä > *mĭk, *neljä > *nĭl, *ńičkä- > *jĭčke-, *ńüktä- > *ńĭktä-, *pićlä > *pĭćle, *peljä > *pĭləkš, *penä > *pi, *pesä > *pĭžäkš, *pištä- > *pĭšte-, *pitä- > *pĭće-, *repäś > *rĭwəž, *śüklä > *šĭɣəľə, *śikšta > *šĭštə, *śilmä > *šĭnćä, *tinä > *tĭńə
  • *i-ä > *ü̆: 9 cases
    *d₂ümä > *lü̆mə, *künčä- > *kü̆nče-, *pečä > *pü̆nčə, *sesar > *šü̆žar, *śečä > *čü̆čə, *śepä > *šü, *šikšna > *šü̆štə, *terä > *tü̆r, *wülä > *wü̆l-
  • *i-ə > *ĭ: 11 cases
    *eštə- > *ĭšte-, *jećə > *ĭške, *kičək > *kĭčək, *küjə > *kĭškə, *kiśkə- > *kĭške-, *kütkə- > *kĭćke-, *leštə > *lĭštäš, *pilwə > *pĭl, *süjə > *šĭjä, *sükəśə > *šĭžə, *wittə > *wĭć
  • *i-ə > *ü̆: 15 cases
    *kejə- > *küä-, *künčə > *kü̆č, *kerə > *kü̆r, *kiwə > *kü, *nimə > *lü̆m, *piŋə > *pü, *sülə > *šü̆lə, *śüd₁ə > *šü, *śülkə > *šü̆wəl, *śixələ > *šülə, *tejnəš > *tüəž, *türə > *tü̆rəś, *tüŋə > *tü̆ŋ, *werə > *wü̆r, *wetə > *wü̆t
  • unclear/inapplicable > *ĭ: 4 cases
    *e- > *ĭ-, *kenčV- > *kĭčälä-, *le- > *liä-, *śerV > *sĭr
  • unclear > *ü̆: 6 cases
    *jetV > *jü̆t, *kütV > *kü̆ðäl, *küsV > *kü̆žɣə, *pečkV- > *pü̆čkä-, *süskV- > *šü̆škä-, *sitV- > *šüðəš
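The stem-type rule behind this grouping can be written down as a small Python sketch. It is an illustration under stated assumptions rather than a full treatment: the function names are hypothetical, the stem type is read off mechanically as the second vowel of the PU form, and *a is grouped with *ä as in the lists above.

```python
import unicodedata


def nfc(text):
    """Normalize to composed characters so that e.g. ä is a single code point."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aäeëioöuüəV"))   # "V" marks an unspecified vowel
OPEN_STEM_VOWELS = set(nfc("aä"))  # the "*i-ä" stem type
REDUCED_STEM_VOWELS = {"ə"}        # the "*i-ə" stem type

UNROUNDED = nfc("*ĭ")
ROUNDED = nfc("*ü̆")


def stem_vowel(pu_form):
    """Return the second vowel of a PU form, or None if there is only one."""
    vowels = [ch for ch in nfc(pu_form) if ch in VOWELS]
    return vowels[1] if len(vowels) > 1 else None


def predicted_reflex(pu_form):
    """Predict the Proto-Mari first-syllable vowel from the PU stem type."""
    vowel = stem_vowel(pu_form)
    if vowel in OPEN_STEM_VOWELS:
        return UNROUNDED
    if vowel in REDUCED_STEM_VOWELS:
        return ROUNDED
    return None  # unclear or inapplicable stem type


for form in ("*elä-", "*küńärä", "*werə", "*jetV", "*e-"):
    print(form, predicted_reflex(form))
```

Forms such as *sesar or *kičək are the interesting ones here, since the prediction and the attested Proto-Mari vowel disagree.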

The raw accuracy of the maintenance hypothesis (*i > *ĭ, *ü > *ü̆) seems to be 26 cases predicted correctly out of 41 ≈ 63.4% (worse if we also wanted to presume *e > *ĭ). Assuming the typical reflexation to be *i-ä > *ĭ, *i-ə > *ü̆ instead reaches up to 38 correctly predicted out of 58 ≈ 65.5%. Which is so far only marginally better… But there is room for fine-tuning here as well.
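For anyone who wants to re-check the arithmetic, here is a short Python sketch. The tallies are the case counts from the lists above; the variable names and the ASCII labels for stem types and reflexes are only illustrative.

```python
# Maintenance hypothesis: PU labiality is preserved (*i > *ĭ, *ü > *ü̆).
maintenance_hits = 26     # *i > *ĭ plus *ü > *ü̆
maintenance_misses = 15   # 6 cases of *i > *ü̆ plus 9 cases of *ü > *ĭ

# Stem-type hypothesis, leaving out the unclear/inapplicable stems.
stem_type_counts = {
    ("A-stem", "unrounded"): 23,
    ("A-stem", "rounded"): 9,
    ("schwa-stem", "unrounded"): 11,
    ("schwa-stem", "rounded"): 15,
}


def accuracy(hits, total):
    """Share of cases that a hypothesis predicts correctly."""
    return hits / total


maintenance_accuracy = accuracy(
    maintenance_hits, maintenance_hits + maintenance_misses
)

# Under the stem-type hypothesis, A-stems should be unrounded
# and schwa-stems should be rounded.
stem_hits = (
    stem_type_counts[("A-stem", "unrounded")]
    + stem_type_counts[("schwa-stem", "rounded")]
)
stem_total = sum(stem_type_counts.values())
stem_type_accuracy = accuracy(stem_hits, stem_total)

print(f"maintenance: {maintenance_accuracy:.1%}")
print(f"stem type:   {stem_type_accuracy:.1%}")
```

Reassigning individual words to a different pre-Mari stem type, as suggested in the following paragraphs, only requires moving counts between the dictionary entries.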

Some of the apparent exceptions in verb roots can be readily interpreted to indicate a shift of stem type in pre-Mari. *ĭšte- ‘to do’, *kĭške- ‘to throw’ and *kĭćke- ‘to harness’ (in red above) show 2nd syllable *e, which normally corresponds with PU *A-stem verbs; thus I would reconstruct pre-Mari *ist-ä-, *kiśk-ä- and *kitk-ä-. Here *-ä- is probably some kind of a transitivizing suffix, well known in Mari (the classic example is probably /koða-/ ‘to stay’ : /koð-e-/ ‘to leave’) and probably dating to earlier times already (reconstructible in a small number of PU doublets such as *künčə ‘nail’ ~ *künč-ä- ‘to plough/dig’; *ipsə ‘smell’ ~ *ips-ä- ‘to smell’). We could also take the final *-e, rather rare in nominals, of *ĭške ‘self’ as grounds to reconstruct pre-Mari *(j)iś-kä.

Similarly, *pü̆čkä- ‘to cut’, *šü̆škä- ‘to cram’ (in blue above) show 2nd syllable *ä, which normally corresponds with PU *ə-stems; and therefore I would reconstruct pre-Mari *pičkə-, *siskə-. The former thus turns out better comparable with Mordvinic *pečkə- ‘to cut’ than with Samic *peackē- ‘to cut (off)’ (< *pečk-ä-), and the latter with Samic *sëskë- ‘to rub against’ than with Fi. sysä-, Es. süska- ‘to push into’.

(This on the other hand creates new problems for *kĭčälä- ‘to search’, *liä- ‘to be’, *ńĭktä- ‘to pluck’, which now start pointing to earlier *ə-stems…)

I would also take *kü̆žɣə ‘thick’ (also in blue) as pointing to earlier *kizəgV < *küsəkV (akin to Proto-Samic *kësëkV > Northern Sami gassat etc.), rather than the bare root *küsä that most sources report. Perhaps even *kĭškə ‘snake’ should be taken as pointing to PU *küjəwä (> Erzya /kijov/, Hung. kígyó, Smy. *kiwä) > pre-Mari *kiwä(-skV) rather than the bare root *küjə (> PF *küü, Udm. /kɨj/ [3]).

Nominal derivation phenomena could lie behind some of the other exceptions as well, though due to the non-maintenance of the PU stem vowel contrasts in Mari nominals, this will have to be more speculative. For example, Finnic *kidek ‘snowflake’ has a number of parallel derivatives etc. in the descendant languages, and the original root may well have been *kičä rather than *kičə. It would be also possible to assume PU *kičäk, and date the development *-Ak > *-Ek (as seen in cases such as Fi. jauha- ‘to grind’ ~ jauhe ‘powder’; jättä- ‘to leave behind’ ~ jäte ‘trash’) as inner-Finnic.

Consonant environment conditioning does not need to be ruled out entirely either. E.g. *šü ‘neck’ could be taken back to pre-Mari *siw(ä), and *šĭjä ‘year ring’ to pre-Mari *sijə, with the natural developments *iw > *ü̆ and *ij > *ĭ bleeding the usual stem type conditioning. (This provides also another possible line of explanation for ‘snake’.) The latter rule could be even generalized slightly to also capture *wĭć ‘5’.

The phonetics of this hypothesis do not have to be left arbitrary either: a kind of palatal umlaut mechanism seems to work. The root structure *i-ä > *ĭ(-e) remains consistently front-vocalic and illabial; while the root structure *i-ə would probably have been first retracted to something like *[ɨ]-[ə]. After this, I would suppose central *ɨ was labialized to [ʉ], and then re-fronted > [y] > [ʏ]. This development appears internally unmotivated (it could possibly be attributed to areal influence from Turkic) — but it has a good precedent in the fact that Mari is the only Uralic language with a front rounded reflex of PU *ë, for which we must then reconstruct the exactly parallel development [ɤ~ɜ] > [ɵ] > [ø] > [y].

Later vowel harmony between /a ~ ä/, as attested in Hill Mari (but not Meadow Mari) was likely not yet in effect by this stage. This appears to be shown by the straggling cases of Proto-Mari *ĭ-ä: where *ĭ is further reduced and retracted to /ə/ in Hill Mari, the stem vowel surfaces as /a/, not as /ä/. Cf. e.g. /kəčala-/ ‘to search’, /ńəkta-/ ‘to skin’, /šəja/ ‘year ring’.

[1] This selection has been datamined from both older and newer literature. Individual referencing would go beyond the purposes of this blog post. Various dubious or difficult-to-reconstruct comparisons have been omitted, including e.g. most cases where some or most other reflexes point to original *ä rather than *e.
[2] To my knowledge, this comparison has not been previously presented, though it seems self-evident. The identity of the “suffix” is unclear to me however.
[3] Even this might derive from the longer form *küjəwä: contrast *süjə > /si/ ‘year ring’. Perhaps thus: *süjə > *süj > *si, but *küjəwä > *küjə > *kɨj?


More on umlaut chronology in Samic

I recently proposed that the fission of Proto-Uralic *ä and *e into more open and more close vowels in Samic, depending on the following second-syllable vowels (“stem type”), should be dated already to the dialectal West Uralic era, given that similar developments appear also in their closest relatives: the Finnic and Mordvinic languages. This diverges in a couple of ways from the views in the main handbooks on the historical development of Samic, i.e. Korhonen (1981): Johdatus lapin kielen historiaan and Sammallahti (1998): The Saami Languages: An Introduction.

One basic disagreement is over absolute chronology. While both Korhonen and Sammallahti (henceforth: K & S) agree that at least the merger of the stem types *e-ə and *i-ə [1] should indeed be dated to the earliest phase of the pre-Proto-Samic era, their treatises begin from the now obsolete “Proto-Finno-Samic”, dated as some half a millennium later, reconstructed with cross-reference mainly to Finnic, and usually also located some 1000 km further west (in the Gulf of Finland area) than my reference point in common West Uralic (around the upper reaches of the Volga). [2]

Another however concerns the overall relative chronology. K & S present the historical phonology of Samic in a highly tiered fashion that makes for some very attractive charts and graphics, with roughly four distinct periods of development:

  • an early phase (K’s “kantalapin I vaihe”, S’s “Pre-Saamic” and “Proto-Saamic 2”) with the loss of several inherited vowel contrasts, and the splitting of this smaller pre-Samic vowel system into several allophones, depending on stem type;
  • a complete revamp of the vowel length system (K’s “kantalapin II vaihe”, S’s “Proto-Saamic 3” in parts), depending on earlier vowel qualities;
  • a restructuring of the system of unstressed vowels (K’s “kantalapin III vaihe”, S’s “Proto-Saamic 3” in parts as well as “Proto-Saamic 4”);
  • late phonetic shifts in the sound values of several stressed vowels (K’s “kantalapin IV vaihe”, S’s “Proto-Saamic 5”).

As I have mentioned in an exchange in the comments section, I am however skeptical of the historical reality of this model. It strikes me as unnaturalistically neat altogether. Only a few of the changes can be explicitly shown to have been in the presented order in relative chronology, and probably most of the distinct “phases” here should be meshed together. Others might even be post-Proto-Samic entirely (though that will be another topic).

In particular I do not think that all Proto-Samic umlaut developments should be considered equally early. The Samic languages are some of the most “umlaut-rich” languages within Uralic, and the individual languages have continued to innovate new changes of this type pretty much as soon as new features arise among the unstressed vowel system. In this context it seems entirely implausible to me that at one point the pre-Proto-Samic speakers would have collectively decided “ok, that’s enough for now, let’s call a 500-year moratorium on umlauts”.

More specifically, while I think that developments *ä-ä > *ȧ-ȧ (> PS *ā-ē) versus *ä-ə > *e-ə (> PS *ē-ë) might be even earlier than has been previously suspected, by contrast I think that the a-umlaut of inherited *e and *o (e.g. PU *pesä >> PS *peasē ‘nest’; PU *kota >> PS *koatē ‘tent, teepee’) must be instead dated to a somewhat later Proto-Samic phase. This is due to some exception cases that appear to be explainable by them having been subject to both umlauts.

Umlaut stacking

It’s been observed already since the earliest reconstruction work on Uralic vocalism that PS *ea fairly often turns up in the Samic languages as a reflex of earlier *ä. Explanations for these cases have varied quite a bit, from considering this the regular reflex of the stem type *ä-ä (this was the opinion of Wolfgang Steinitz), to dismissing all instances as irregular or “sporadic” (thus K & S). Neither extreme is satisfying though, and it would be desirable to identify some conditions for the development. Dating the umlauts of *ä and *e into two different chronological stages seems to offer a lead on this.

If we assume that the pre-Samic dialect of late West Uralic — I will call it “pre-Samic” or “preS” for short — had already raised *ä-ə to *e-ə, as in words like the following:

  • PU *jäŋə [jɛŋə] > preS *jeŋə > PS *jēŋë ‘ice’
  • PU *kälə [kɛlə] > preS *kelə > PS *kēlë ‘tongue’
  • PU *mälkə [mɛlkə] > preS *melkə > PS *mēlkë ‘breast’

— then at this point, a derivational process turning one of these *ə-stem words into an *ä-stem word would allow it to be later subject to a-umlaut just as inherited *e is, yielding PS *ea. There appear to be some clear examples that involve the syncope of *-ə- upon the addition of a derivational suffix: PU *CäCə > preS *CeCə → *CeCə-Cä > *CeCCä > PS *CeaCCē. Some other examples involve a derivational process that leads to a pre-Samic *o-stem (which similarly trigger a-umlaut): preS *CeCə → *CeC-o > PS *CeaCō. [3]

This mechanism appears to explain a reasonable number of the cases of PU *ä yielding PS *ea. Thus far, I have identified seven possible front-vocalic cases (including one somewhat speculative new etymological proposal):

  • *keaćō ‘medium-sized whitefish’ (only in Lule Sami: getjuk) < preS *keć-o
    ← *kećə < PU *käśə(ŋ)
    Cf. Mansi *kääsəŋ, Hungarian keszeg ‘bream’, which both indicate earlier *ä. (Finnish keso ‘white bream’ has also been considered cognate, but is better derived from kesä ‘summer’.)
  • *leapō- (Lule Sami lehpagis ‘nice’, Old Swedish Sami leppotet) < preS *lep-o-
    ← *le(p)pə < PU *lä(p)pə
    Cf. Moksha /ľäpä/ ‘weak’, Mari *lewə ‘warm, mild (of weather)’, Khanty *leepət ‘weak’, which indicate *ä. Finnic *leppedä ‘balmy’ again looks like the odd member out in the cognate set. The similar *leepedä ‘mild’ could be instead compared here just about as easily. [4]
  • *meanō- ‘to become evasive’ < preS *men-o-
    ← *menə- < PU *mänə-
    Cf. Mordvinic *mäńə- ‘to dodge, to get free’, Komi /mɨn-/ ‘to get free’, Hungarian mentes ‘free’, which indicate *ä. The verb *mänə- ‘to get free’ is probably ultimately somehow related to *menə- ‘to go’, but the cognates suggest the two having been distinct already at the PU level. (I additionally wonder if contamination from the former could perhaps explain the irregular vowel in Savonian/Karelian mäne- ‘to go’.)
  • *peajō- ‘to shine’ < preS *pej-o-
  • *peajvē ‘day’ < preS *pejwä < *pejə-wä
    both ← *pejə < PU *päjə ‘bright, shining, etc.’
    The bare root does not appear to unambiguously survive anywhere (perhaps in Komi /bi/ ‘fire’?), but numerous other derivatives generally indicate *ä, e.g. Finnic *päivä ‘day, sun’, Hungarian fehér ‘white’.
  • *pealkē ‘thumb’ < preS *pelkä < *peləkkä < PU *pälə-kkä
    Cf. Mordvinic *päĺka, where the unvoiced cluster *ĺk must be secondary (PU *lk would have yielded **ĺg). Komi /pel ~ pev/ also suggests *ä. The underived root could be identified with *pälə ‘side’, as has been proposed by Janhunen. The messy Finnic words for ‘thumb’, often included here, mostly point to PF *peikala or *peikoi; and they probably need to be kept separate (at best some kind of secondary contamination of the original Uralic word with some other source could be involved).
  • *veakkē ‘help’ < preS *wekkä < *wekə-(k)kä
    Formally, this might be a derivative of PU *wäkə ‘power’ > preS *wekə (> PS *vēkë ‘people’). A semantic intermediate ‘activity with several people, work bee’ could be involved.
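The relative chronology proposed here can be sketched as two ordered rule stages with an optional derivation step in between. The Python below is a deliberately reduced model: it only tracks the first-syllable vowel and the stem vowel, and the function names are assumptions made for illustration.

```python
A_UMLAUT_TRIGGERS = ("ä", "o")  # second-syllable vowels that trigger a-umlaut


def west_uralic_stage(first, stem):
    """Early split of *ä by stem type, dated here to dialectal West Uralic."""
    if first == "ä" and stem == "ä":
        return "ȧ", "ȧ"
    if first == "ä" and stem == "ə":
        return "e", "ə"
    return first, stem


def proto_samic_stage(first, stem):
    """Later a-umlaut of *e, plus the regular Proto-Samic outcomes."""
    if first == "e" and stem in A_UMLAUT_TRIGGERS:
        return "ea", ("ē" if stem == "ä" else "ō")
    if first == "ȧ" and stem == "ȧ":
        return "ā", "ē"
    if first == "e" and stem == "ə":
        return "ē", "ë"
    return first, stem


def derive(first, stem, new_stem=None):
    """Run both stages, optionally changing the stem type in between."""
    first, stem = west_uralic_stage(first, stem)
    if new_stem is not None:
        stem = new_stem  # derivation such as *CeCə-Cä > *CeCCä or *CeC-o
    return proto_samic_stage(first, stem)


print(derive("ä", "ə"))                # type of *kälə > *kēlë
print(derive("ä", "ə", new_stem="ä"))  # type of *pälə-kkä > *pealkē
print(derive("ä", "ə", new_stem="o"))  # type of *käśə > *keć-o > *keaćō
print(derive("ä", "ä"))                # primary *ä-ä, regular lowering
print(derive("e", "ä"))                # type of *pesä > *peasē
```

The point of the ordering is that a derived *ä- or *o-stem only yields *ea if the derivation happens after the first stage has already raised *ä to *e.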

It is however necessary to also assume similar but even earlier syncope in some other old derivatives, which do show regular a-umlaut in Samic.

  • *ńālmē ‘tongue’ < *ńälmä (~ Hungarian nyelv ‘tongue, language’ etc.) ← PU *ńälə- ‘to swallow’
  • *ńālkē ‘tasty’ < *ńälkä ← id.
  • *pāŋkē ‘reindeer’s headgear’ < *päŋkä ← PU *päŋə ‘head’

The first of these words has a very wide distribution, and the bisyllabic form *ńälmä could perhaps be assumed already for PU… though this would get in the way of a partial rule for *m-lenition in Hungarian that I have sketched some years ago.

I can also think of a slightly different mechanism to account for one of the remaining high-profile examples of *ä >> *ea. This is *pealē ‘side; half’. The polysemic meaning suggests that this may have come about as a blend of two originally distinct PU words: the above-mentioned *pälə ‘side’ (> Finnic *peeli, Mordvinic *päľ, Mari *pel), and the evidently closely related but distinct *pälä ‘half’ (> Finnic *pooli, Mordvinic *päľə, Mari *pelə).

The two words also seem to merge in Ugric: compare Hungarian fél : fele- ‘side; half’; Mansi *pääl ‘side; half’; Khanty *peeɭək ‘side; half’. But while this development can be simply due to the loss/reduction of 2nd-syllable vowels, the Samic development would require assuming contamination: the stem vowel seems to continue preS *pȧlȧ ‘half’, while the *e-type 1st syllable vowel seems to continue preS *pelə ‘side’. The two would have led to the creation of a preS “compromise” form *pelä, from which then regularly > PS *pealē.

Finnic parallels

Worth noting is that in the case of ‘day’, a similarly exceptional development is also found in Finnic. PF *päivä ‘day, sun’ (and not **paivi) has likewise escaped the early lowering/backing of *ä-ä, perhaps for the same reasons too: contraction from *päjə-wä taking place only after the a-umlaut of primary *ä-ä.

This pattern seems to extend further: among the remaining cases with *ä > PS *ea, Finnic cognates usually have *ä-ä as well. At least five cases can be identified that have correspondences in the more eastern Uralic languages:

  • ‘lichen’: PS *jeakēlē ~ PF *jäkälä (~ Permic)
  • ‘paw’: PS *keapēlē ~ PF *käpälä (probably related to *käppä ‘paw’ > Finnic, Mordvinic)
  • ‘bog’: PS *jeaŋkē ~ Fi. jänkä (~ Permic, Mansi, Khanty)
  • ‘flap, cover’: PS *leappē ~ PF *läppä (~ Mari, Permic, Hung., Mansi)
  • ‘smoke hole’: PS *reappēnē ~ PF *räppänä (~ Permic)

While this same correspondence is also common enough in loanwords (PS *(h)earkē ← PF *härkä ‘bull’; PS *kearnē ← PF *kärnä ‘crust’; both originally from Baltic), and this approach has in the past been applied to ‘bog’ (S → Fi) and ‘paw’, ‘flap’ (F → S) as well, nothing seems to outright require considering these words later than the pre-Samic / pre-Finnic period. If *ä-ä [ɛ-a] had in both groups been lowered to *ȧ-ȧ [a-a] by then, new lexical innovations of the time could reintroduce also a new, secondary *ɛ-ä in pre-Finnic (*jɛkälä ‘lichen’, etc.); while in pre-Samic, only *e-ä would have been available.

Conveniently enough, there is also one word of this type for which early loaning in the West Uralic period is assured: PS *keavrē ~ PF *käkrä ‘bent’, which probably derives from Indo-Iranian *čakra- ‘wheel’ (or from a slightly earlier *ḱɛkra-). [5]

To be sure, I still generally hold that if two competing etymologies are available for a word, then all other things being equal, the more recent explanation should be preferred. But this is only a probabilistic rule-of-thumb. So while several of the words here (and also many of the more numerous similar cases yet that are restricted to Samic & Finnic) probably have indeed been loaned between Finnic and Samic at a later date, I would not rule out the possibility of some of them still going back to different parallel preS and preF sound substitutions in the late West Uralic era.

For now I’m still sketching out the situation with back vowels. In particular it’s not clear to me how the raising PU *ë-ə and *aj(C)ə > preS *a-ə > PS *ō-ë should be dated: this is attested from numerous Germanic loanwords, and thus could be newer than *ä-ə > *e-ə. It may well be the same change as preS *a-a > PS *ō-ē (likewise attested from Germanic loanwords); and thus not triggered by stem vowels at all.

[1] In their view the result of this merger would not have been quite [i], but a near-close vowel they mark as *ḙ. I would suggest the sound value [ɪ] for this (similarly [ʊ] for their *o̭), reflecting the common tendency of close short vowels to reduce and centralize. Initially this probably would not have had any phonological significance though, so I will continue to use *i and *u for the early pre-Samic and early pre-Mordvinic era.
[2] “Common” rather than “proto”: while West Uralic at least seems like a defensible subgrouping to me (unlike its traditionally assumed kin like “Proto-Finno-Volgaic”, “Proto-Finno-Permic”, etc.), the common innovations are not many, and it remains effectively only a dialect of Proto-Uralic itself. This being the case, an accurate picture of West Uralic can only be gained by starting from Proto-Uralic and “reconstructing upwards”, not by presuming the existence of the group and attempting to compare Samic/Finnic/Mordvinic in isolation (a method that has traditionally generated rather Finnocentric models, further muddled by conflicting evidence from areal later-diffused vocabulary). It would also be premature to rule out entirely the possibility of WU being an “areal-genetic” group of dialects after all, since non-exact parallels for a few of the characteristic innovations (e.g. *ë- > *a, *åĆ > *aĆ, *-d₂- > *-d₁-) can be found in Mari, Permic and even Hungarian as well.
[3] It does not seem clear to me if these cases should be assumed to involve the suffixation of a consonantal suffix such as *-w and a later development *-əw > *-o, or simply the addition of *-o as a suffixal element right away, but this does not really affect their validity. If the former though, then this has some implications for the history of the PS stem type *ā-ō; they could not descend from *ä-o at the West Uralic level, but would have to go back to preS *ȧ-ȧ < PU *ä-ä, with the labial suffix only as an incidental addition.
[4] The irregular vowel correspondences in the Finnic words could perhaps be accounted for by assuming contamination from the Germanic loanword *leevä ‘slight; temperate’. This is one of the very old Germanic loans in Finnic that shows *ē → *ee. While both sides appear to point to a mid vowel [eː], I believe this is illusory. PIE *e was probably closer to [ɛ], and the eventual lowering and backing of *ē to *ā in Northwest Germanic suggests that even an intermediate [æː] existed at one point; as is also shown by the existence of a couple of loanwords in Finnic that have *ē → *ä (e.g. PGmc *wēgaz ‘lever, scales’ → PF *väkä ‘hook’). Pre-PGmc *klēwas ‘lukewarm’ was thus probably loaned as pre-Finnic *lääwä, later raised to PF *leevä together with inherited words like PU *lämə > preF *läämi > PF *leemi ‘broth’. At this period it could be assumed that pre-F *läppətä ‘mild’ was adjusted to *lääpətä, on the model of *lääwä; with later raising then giving PF *leepedä. — Even slightly earlier *leeppedä could perhaps be assumed, with PF *leepedä and *leppedä representing two ways of naturalizing the overheavy syllable structure.
[5] Another highly similar word family exists as well: PF *käprä ‘rolled up’; PS *kēpr-ë- ~ PF *käpr-i-stä- ‘to roll up’. As has been proposed by Katz, in principle this might represent an earlier parallel loan with PIE *kʷ still retained on the loangiving side, substituted by pre-S / pre-F *p. Dating *l > *r in Indo-Iranian as already this early seems unlikely however, and I suppose a more probable explanation would be that this is Uralic-internal descriptive variation. Note also a number of obviously secondary formations in Finnish such as käkkyrä, käppyrä ‘curved thing’.

Posted in Etymology, Reconstruction

The phonetic vagueness of laryngeal theory

While I continue to be strictly speaking Not An Indo-Europeanist, I regularly keep reading about comparative Indo-European research just as well. Including not only matters with immediate relevance to Uralic studies, but also the usual controversy honeypots: interpretations of the stop system (glottalic? aspiration where? how many velar series? etc.); and interpretations of the vowel system in relation to ablaut and laryngeal theory. They seem to often form an important “frontier” of sorts in the development of fine-grained historical phonology reconstruction methodologies, if only due to the large amount of attention they receive.

This doesn’t imply I would be particularly impressed with the average state of the field.

In the case of the last-mentioned, one thing that I see come up a lot is that given a certain degree of uncertainty over the original realizations of the laryngeals, almost everyone seems to be still treating them at least to some extent as dei ex machina, exempt from phonetically meaningful sound changes.

One particular repeat offender seems to be the interaction of laryngeals with syllabic resonants. Consider e.g. the following list of sound developments given by Peter Schrijver (2015), Pruners and trainers of the Celtic family tree:

  • *CRHjV > *CRījV (laryngeals vocalize to *ī between consonant+resonant and a palatal glide)
  • *R̥DC > *RaDC (word-initial syllabic resonants vocalize to resonant + *a before a voiced unaspirated stop + another consonant)
  • *HR̥C > *aRC (syllabic resonants vocalize to *a + resonant after a word-initial laryngeal — including voiced unaspirated stops)
  • *CR̥HV > *CaRV (syllabic resonants vocalize to *a + resonant before laryngeal + vowel)
  • *CR̥HT > *CRaT (syllabic resonants vocalize to resonant + *a before laryngeal + voiceless stop)
  • *CR̥HC > *CRāC (syllabic resonants vocalize to resonant + *ā before laryngeal + other consonant)
  • *N̥ > *aN (remaining syllabic nasals vocalize to *a + nasal)
  • *R̥ > *aR, *Ri (remaining syllabic liquids vocalize to *a + liquid or liquid + *i)

This is pretty much abstract symbol algebra. At best these can be called sound correspondences between Proto-Indo-European and Proto-Celtic. To suggest that a laryngeal or a syllabic resonant would directly change to or excrete *ī in the first case, but *ā in the sixth, is just about equivalent to claiming “a sound change” *dw > *erk- for Armenian. In reality, developments like these surely must have been composed of several stages.

Of course Schrijver is doing only an overview of Celtic historical phonology, and I would predict that some of the primary sources go into more detail. But it strikes me as an overall problem if there is little interest in IE studies in unpacking these kinds of sound correspondences. Nowhere have I seen even fairly in-depth introductions to laryngeal theory attempt to explain these kinds of developments using the normal tools and frameworks of historical sound change.

It’s not very difficult to see how some elementary order could be imposed on this kind of a mess. We could note that there is e.g. tons of *a-insertion going on (and I could add the change *CHC > *CaC, which Schrijver skips over, probably on account of being analyzable as even earlier than Italo-Celtic). It seems likely there has been a single main epenthesis process, followed by diversification in different environments, rather than numerous near-identical epentheses. Additionally, the epenthesis seems likely to have been not quite to *a, given some reflexes as *i.

So for the sake of an example, suppose e.g. that early on, all syllabic resonants first break to *əRə. From such a starting point, most of the more complex developments here will be explainable with what are reasonably natural phonetic developments:

  • *R̥DC “>” *RaDC will be simply the loss of word-initial *ə: *əRəDC- > *RəDC- > *RaDC-.
  • *HR̥C “>” *aRC will be explainable as the blocking of the previous change due to a preceding laryngeal, followed by loss of the second schwa: *HəRəC- > *HəRC- (**HRəC) >> *aRC-.
  • *CR̥HV “>” *CaRV will be explainable as the loss of a schwa from an open syllable before a full vowel: *CəRəHV > *CəRHV-. It is not clear if the first schwa would be better assumed to have remained due to schwa lowering to *a intervening (> *CaRHV- > *CaRV-), or due to the laryngeal remaining long enough that the loss of schwa from open syllables was no longer operational (> *CəRV- > *CaRV-).
  • *CR̥HC “>” *CRāC appears to show that the second schwa will now remain in a closed syllable, leading to the loss of the first one instead: *CəRəHC- > *CRəHC-. The compensatory loss of laryngeals may have then kicked in around this time: *CRəHC- > *CRə̄C- > *CRāC-.
  • *CR̥HT “>” *CRaT might diverge from the previous due to any number of reasons. One is that medial voiceless *-T- was likely pronounced longer than its voiced counterparts, and could have induced a shortening *ə̄ > *ə.
  • *CRHjV “>” *CRījV (where we probably expect a syllabic resonant in the input?) could be routed thru e.g. a metathesis *Hj > *iH: thus first *CəRəHjV- > *CəRəiHV-. Then assume a monophthongization *əi > *ī, and loss of the first schwa, now found before a full vowel: *CəRəiHV- > *CəRīHV- > *CRīHV-. Finally, suppose loss of the stray laryngeal, and epenthesis of *j as a hiatus filler to acquire *CRījV-, as required.
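
For the programmatically inclined, the cascade above can be mocked up as an ordered list of rewrite rules over the abstract templates. This is only a sketch under several assumptions of my own: the exact rule ordering, the symbol classes CONS and FULL_V, a trailing V appended to every template so that syllable structure is defined, a syllabic resonant supplied in the input of the last template, and the notation əː for long schwa. None of this comes from Schrijver.

```python
import re

# Symbols, written as escapes to avoid Unicode normalization surprises.
SCHWA = "\u0259"               # ə
LONG_SCHWA = "\u0259\u02d0"    # əː (assumed notation for long schwa)
A_LONG = "\u0101"              # ā
I_LONG = "\u012b"              # ī
SYLLABIC_R = "R\u0325"         # R with ring below: syllabic resonant
CONS = "CDTHRj"                # template symbols counted as consonants
FULL_V = "Va" + A_LONG + I_LONG  # full (non-schwa) vowels

# Ordered (label, regex, replacement) triples. The ordering is an assumption.
RULES = [
    ("breaking", SYLLABIC_R, SCHWA + "R" + SCHWA),
    ("metathesis Hj > iH", "Hj", "iH"),
    ("monophthongization", SCHWA + "i", I_LONG),
    ("word-initial schwa loss", "^" + SCHWA, ""),
    ("schwa loss before C + full vowel", f"{SCHWA}(?=[{CONS}][{FULL_V}])", ""),
    ("first schwa loss before C + schwa", f"{SCHWA}(?=[{CONS}]{SCHWA})", ""),
    ("compensatory lengthening", f"{SCHWA}H(?=[{CONS}])", LONG_SCHWA),
    ("shortening before voiceless stop", f"{LONG_SCHWA}(?=T)", SCHWA),
    ("laryngeal loss", "H", ""),
    ("hiatus-filling j", f"{I_LONG}(?=[{FULL_V}])", I_LONG + "j"),
    ("long schwa lowering", LONG_SCHWA, A_LONG),
    ("schwa lowering", SCHWA, "a"),
]


def derive(template):
    """Apply every rule once, in order; return the list of distinct stages."""
    stages = [template]
    for _label, pattern, replacement in RULES:
        new_form = re.sub(pattern, replacement, stages[-1])
        if new_form != stages[-1]:
            stages.append(new_form)
    return stages


# The six templates discussed above, each with an assumed following vowel.
TEMPLATES = [
    SYLLABIC_R + "DCV",
    "H" + SYLLABIC_R + "CV",
    "C" + SYLLABIC_R + "HV",
    "C" + SYLLABIC_R + "HCV",
    "C" + SYLLABIC_R + "HTV",
    "C" + SYLLABIC_R + "HjV",
]

for template in TEMPLATES:
    print(" > ".join("*" + stage for stage in derive(template)))
```

The point of the exercise is merely that a single breaking rule plus a handful of ordinary-looking losses and lengthenings is enough to reach all six outcomes; a different ordering or different conditioning could of course do the same job.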

This is but a quick drabble, and I don’t mean to claim that this would be an accurate view of the actual history. But I would like to see more IEists take a stab at developing an analysis of the finer details of laryngeal theory that at least works more like this second set of sound changes.

I’ve already seen some promising work on syllabification in PIE that posits schwa epenthesis already as an original phonological process, but it seems certain that such research could be also linked to numerous branch-specific historical developments.

My hunch is moreover that this line of inquiry could end up going much further. To my knowledge, even counting barely attested ancient epigraphic languages, no IE language retains any direct evidence of syllabic nasals, or of the phonetically mysterious “syllabic laryngeals”. And if it were to turn out that phonetic vowels can be assumed to have been there all along: what exactly will be the benefits of an analysis that claims *[əH] or *[əN] to really have been phonologically plain */H/ or */N/?

As far as I can tell, a lot about this hangs on the urge to group Indo-European ablaut alternations into neater patterns. And I won’t oppose that investigation — but I get the feeling that its proponents fail to show proper respect for the distinction between internal and comparative reconstruction. Alternations along the lines of *sek- : *sk-, *semk- : *sm̥k- certainly have a greater algebraic consistency, but it’s less clear to me if they could be presumed for PIE itself.

(Similarly it’s interesting how numerous introductions to PIE or some individual IE branch will outline laryngeal coloring as an “early sound change”, but neither outline the slightest amount of evidence for dating it as post-PIE, nor clearly assert that the assumed sound changes are pre-PIE, derived by internal reconstruction rather than by comparative evidence.)

So I could ask…: why would we even assume that the stage *s[ə]mk- is the innovation here? Cross-linguistically, the loss of reduced vowels is far more common than their insertion. Yet IE studies instead outline an amazing cornucopia of early epenthesis processes. Another look at the field also reveals several theories about the rise of zero grades from pre-PIE vowel reduction. Still for some reason it seems to have remained overwhelmingly difficult for scholars to put 2 and 2 together and to conclude that many of these “epentheses” are probably archaisms rather than innovations.

Posted in Commentary, Methodology

Early a-umlaut in West Uralic?

In a footnote to my previous post I passingly speculated that Finnic *ä-backing: *ä-ä > *a-ə (> late Proto-Finnic *a-i : *a-ë-) should perhaps be split in two phases: stem vowel reduction leading to a split from *ä-ə as an earlier stage, completion of the 1st-syllable vowel backing as a later stage.

I have already gathered some other evidence for this particular chronology, from the analysis of some forthcoming examples. But if I were to suppose for early Finnic an intermediate vowel *ȧ in these words, how should the situation be analyzed phonetically (or for that matter, phonologically)?

My initial thought was to posit a central vowel *ȧ (IPA [ä]). But this would have contrasted with both front *ä (IPA [æ]) and back *a (IPA [ɑ]); e.g. *särkə ‘roach’ : *sȧrńə (< *särńä) ‘ash tree’ : *śarwə ‘horn’. Such a crowded low vowel inventory is highly rare in the world’s languages.

But since I was also speculating that this *ȧ still induced front vowel harmony, perhaps a better alternative will be to reconstruct this as a fully open front vowel (IPA [a]). Contrasts between this and near-open [æ] are also rare, but this situation seems to be well salvageable by replacing the latter with an open-mid vowel *ɛ instead.

PU *ä in fact shows mid reflexes in most Uralic languages:

  • In Samic, *ä-ə yields *ē-ë (though *ä-ä still yields *ā-ē).
  • Erzya merges *ä with Proto-Mordvinic *e (from PU *i, *e) as /e/.
  • I’ve seen [ɛ] rather than [æ] reported for some dialects of Moksha, though I don’t have a clear picture on the exact distribution of this.
  • Mari reflects *ä as *e, which normally remains /e/ in all varieties.
  • Permic reflects *ä most often as a vowel that has been reconstructed as Proto-Permic *ɛ, which in turn yields Komi /ɤ/, Udmurt /o ~ e/. PP *e > Komi /e/, Udm. /o ~ e/ is also common. [1] Some cases show Proto-Permic *a > Udm., Komi /a/, but they’re rarer and tend to involve messier data. I suspect this last vowel was in origin a rare conditional allophone at best, later strongly reinforced by loanwords from various sources.
  • Hungarian reflects *ä as /ɛ/ ~ /eː/, the latter from Old Hungarian *ɛː.
  • Far Eastern, Southern and Northern Khanty reflect *ä as tense /e/ (conventional Proto-Khanty *ee).
  • All Samoyedic languages show a change *ä > *e. This looks like it would have to be dated as later than *e > *i (which does not apply to Nganasan), but the resulting “Late Samoyedic” *e is generally indeed realized closer to /ɛ/ than /e/.

Aside from Finnic, the only languages uniformly in favor of an open value are Mansi (*ä > *ää) and Surgut Khanty (*ä > reduced /ä̆/). The idea of an original open *ä thus starts to look like yet another Finnocentrism of Proto-Uralic reconstruction.

Suppose we consider Finnic and Ob-Ugric outvoted, and adjust the PU vowel system ever so slightly by reconstructing original *i *e *ɛ rather than *i *e *ä. This vowel-height inventory is well attestable from the world’s languages, and can also be encoded phonologically identically, with *ɛ as simply a [+open] vowel.

After the initial stage of *ä-backing in early Finnic, the inventory would be extended to four heights *i *e *ɛ *ȧ: a rarer setup, but again still quite well attestable (e.g. in English). To get from here to the attested Finnic setup, a counterclockwise mini-chain shift is required: *ȧ > *a [ɑ], *ɛ > *ä [æ]. The phonological makeup of this four-height system looks a bit more precarious, and may require assuming a feature like [+tense] making a fleeting appearance.

This all also has some unexpected synergy with the development of back open vowels in Western Uralic. I have already a good while ago outlined a defense of the following model:

  • Proto-Uralic had labial *å [ɒ] in the first syllable, illabial *a [ɑ] in the 2nd syllable.
  • In Proto-West Uralic, illabial *a in the first syllable arose thru three innovations:
    • *ë > *a in all positions (*sënə > *sanə ‘sinew’, *mëksa > *maksa ‘liver’)
    • *å-a > *a-a (*kåla > *kala ‘fish’)
    • in palatal environments, *å > *a (*wåjə > *wajə ‘butter’)
  • Remaining cases of *å later merged with *o in Samic and Mordvinic, with *a in Finnic (*śårwə > pre-S and pre-Mo *śorwa, pre-F *śarwə ‘horn’).
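
As a quick sanity check, the model above can be run as a small rule cascade over the example words. This is a rough sketch only: the function names, the vowel inventory in VOWELS, and the set of palatal consonants in PALATALS are illustrative assumptions of mine, and “palatal environment” is read narrowly as “directly before a palatal consonant”.

```python
import re
import unicodedata

AO = "\u00e5"        # å, labial open back vowel
E_BACK = "\u00eb"    # ë
SCHWA = "\u0259"     # ə
VOWELS = "a\u00e5\u00eb\u0259oeiu\u00e4\u00fc"   # assumed vowel symbols
PALATALS = "j\u015b\u0144\u0107"                 # j ś ń ć: assumed, incomplete


def west_uralic(pu_form):
    """Apply the three innovations that create first-syllable illabial *a."""
    form = unicodedata.normalize("NFC", pu_form)
    form = form.replace(E_BACK, "a")                     # *ë > *a everywhere
    form = re.sub(f"{AO}(?=[^{VOWELS}]+a)", "a", form)   # *å-a > *a-a
    form = re.sub(f"{AO}(?=[{PALATALS}])", "a", form)    # *å > *a before a palatal
    return form


def pre_finnic(wu_form):
    """Remaining *å merges with *a."""
    return wu_form.replace(AO, "a")


def pre_samic_mordvinic(wu_form):
    """Remaining *å merges with *o; the sequence *å-ə comes out as *o-a."""
    form = re.sub(f"{AO}([^{VOWELS}]+){SCHWA}", r"o\1a", wu_form)
    return form.replace(AO, "o")


# Example words from the list above (PU forms).
for pu in ["s\u00ebn\u0259", "m\u00ebksa", "k\u00e5la",
           "w\u00e5j\u0259", "\u015b\u00e5rw\u0259"]:
    wu = west_uralic(pu)
    print(f"*{pu} > WU *{wu} > pre-F *{pre_finnic(wu)}, "
          f"pre-S/Mo *{pre_samic_mordvinic(wu)}")
```

Only ‘horn’ still has *å after the West Uralic stage, so it is the only example where the two branch functions give different results.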

Assume now that the first point holds mutatis mutandis also in the case of front vowels: the PU vowel structure I mark as *ä-ä was not phonetically a fully harmonic setup either, but instead phonetically *[ɛ]-[a]. [2] This provides a great motivation for height assimilation to *[a]-[a]. Such a change could perhaps be assumed to have been common to pre-Finnic and pre-Samic, and also substantially demystifies the phonetic motivation for Finnic *ä-backing. (Regardless, it will still have to remain unclear why, on the Finnic side, the stem vowel was concurrently reduced to *ə; much like it remains unclear why *å-ə in Samic and Mordvinic yields *o-a rather than *o-ə.)

Some further similarities:

  • *ɛ-ȧ > *ȧ-ȧ is exactly parallel to *å-a > *a-a: a kind of sub-phonemic a-umlaut.
  • The Finnic shift *ɛ(-ə) > *ä(-ə) is closely parallel to *å(-ə) > *a(-ə): both constitute a shift of non-cardinal vowels towards more cardinal values (though the former change is sub-phonemic, the latter an actual merger).
  • The Samic shift *ɛ(-ə) > *e(-ə) is also closely parallel to *å(-ə) > *o(-a): both constitute a reduction of openness contrasts through raising. The former will have to be later than the merger of *e-ə with *i-ə — but as this change is shared with Mordvinic, dating it as quite early does not seem problematic to me. It may have begun e.g. as a push chain in early Samic, with the second merger then spreading to Mordvinic. (Indeed, perhaps also to Mari, where *e and *i seem to have identical reflexes across the line.)

Finally, one further interesting corollary of this model is probably that the split of *ä-ə and *ä-ä in Samic will end up being earlier than the a-umlaut of *e and *o to eventual ea and oa. This chronology will go quite well together with some other hypotheses of mine under work as well.

[1] As the two have seemingly identical outcomes in Udmurt, I suspect that their split might even be post-Proto-Permic.
[2] It would be also possible to reconstruct non-vowel-harmonic *ä-a = *[ɛ]-[ä], *å-a = *[ɒ]-[ä]. Despite vowel harmony being clearly reconstructible for both Proto-Finnic and Proto-Samoyedic, and at least probable for many of the branches in-between, I do not currently have a firm opinion on if vowel harmony existed in PU. There seem to be a number of indications that it could be late Turkic influence in at least (Hill?) Mari, Hungarian and Southern Mansi — but, on the other hand, all three have clearly been subject to reduction and loss of unstressed syllables, which could have already early on eliminated inherited vowel harmony (as also has happened e.g. in Livonian, standard Estonian, and dialects of Veps).

Posted in Reconstruction

Etymology squib: -kko

Assigning meanings to Finnish derivational suffixes can be a pain. Plenty of them show a fairly scattershot selection of meanings. One example is -kko (-kkö); in modern Finnish, following Hakulinen in SKRK (54.15, 56.8 §§), six main functions can be identified:

  1. Generic diminutive derivatives. E.g. hauli ‘shot’ → haulikko ‘shotgun’, kesä ‘summer’ → kesakko ‘freckle’, kahdeksan ‘eight’ → kahdeksikko ‘figure eight’, kolista ‘to clatter’ → kolikko ‘coin’, lampi ‘pond’ → lammikko ‘puddle’, neljä ‘four’ → nelikko ‘quartet’, suu ‘mouth’ → suukko ‘kiss’; irregularly: veli ‘brother’ → veikko ‘brother mine, pal’
  2. Names of beings. E.g. elää ‘to live’ → elikko ‘critter’, emä ‘animal mother’ → emakko ‘sow’, hiiri ‘mouse’ → hiirakko ‘gray horse’, Savo → savakot ‘Savonian settlers in Ingria’. Some loanwords have been adopted into this group too: kriitikko ‘critic’.
  3. Names of actions. Examples are few, but include rynnätä ‘to rush’ → rynnäkkö ‘charge’, yltää ‘to reach’ → ylläkkö ‘assault’ (cf. yllättää ‘to surprise’).
  4. Areal-collective nouns, especially of plants. E.g. kataja ‘juniper’ → katajikko ‘juniper patch’, kuusi ‘spruce’ → kuusikko ‘spruce woods’, mänty ‘pine’ → männikkö ‘pine woods’, ruoho ‘grass’ → ruohikko ‘lawn’
  5. Local nouns. E.g. kivi ‘rock’ → kivikko ‘rocky area’, hieta ‘sand’ → hietikko ‘sandy area’, jäätää ‘to be freezing’ → jäätikkö ‘glacier’, pyhä ‘sacred’ → pyhäkkö ‘shrine’, ruoko ‘reed’ → ruovikko ‘reed bed’
  6. Modern coinages for quantitative terms. E.g. aste ‘grade’ → asteikko ‘scale’, moni ‘many’ → monikko ‘plural’, yksi ‘one’ → yksikkö ‘unit’

A look across the dialects of Finnish, as well as other Finnic languages, however reveals that at least this seemingly very polysemous suffix has not undergone a spontaneous semantic explosion somewhere along the line: it is instead of heterogeneous origin. Groups 1 thru 3 derive from Proto-Finnic *-kkoi, in turn from earlier *-kka-j (though some individual, presumably fairly new words can fail to show evidence for a diphthong in key varieties). Groups 4 and 5 (and arguably in an indirect way 6, I suppose) meanwhile derive from Proto-Finnic *-kko.

This duality is to an extent still visible in Finnish as well, in at least two facts of morphotax. Firstly: the latter suffix generally attaches to nouns’ plural stems (kivikko, not ˣkivekko), the former also singular stems (cases like lammikko occur, but so do cases like emakko).

The second point is subtler (and arguably starts bridging into reconstruction): in words with comparanda outside Finnish itself, only the latter suffix appears to have a front-vocalic variant -kkö, while the former is just about confined to back-vocalic use. This is attributable to the pre-Proto-Finnic diphthong split (another rather specific sound change that I think might deserve a more specific name): that in front-vocalic and labial back-vocalic environment, pre-PF *-Aj has yielded PF *-ei > Fi. -i, not -o(i). And hence we can identify as the original front-vocalic counterparts of the first two groups rather derivatives in -kki, including hypocoristic names such as mieliä ‘to desire’ → Mielikki, talvi ‘winter’ → Talvikki; [1] names of beings such as lempi ‘love’ (or lempiä ‘to love’?) → lemmikki ‘pet’, suosia ‘to favor’ → suosikki ‘favorite’; or action names (actionyms?) such as hävitä ‘to go lost’ → hävikki ‘loss (of goods)’, viipyä ‘to be late’ → viivykki ‘delay’. This suffix by contrast has no local use at all.

The etymology of collective *-kko has remained unclear, to my knowledge. Hakulinen suggests that this would be still originally a single etymological group, and that the precedent of some *-kka-derivatives (not even *-kkoi-derivatives!) from location roots such as perä ‘rear’ → peru ‘rear part’ → perukka ‘back end’ could have motivated a shift of the suffix from a loosely diminutive meaning to a local one. However, this does not explain at all the phonological contrast between *-kkoi and *-kko.

I think a more natural source can be suggested: extraction from the noun joukko (< PF *joukko) ‘group’. Semantically the connection seems self-evident, e.g. kuusijoukko ‘group of spruces’ = kuusikko. (Why the suffix is so firmly used for trees and other plants in particular remains mysterious to me, though.) *joukko is moreover among the oldest overheavy (CVVCCV) underived word roots in Finnic, a fact that seems like it could have further enabled a perception as containing a “root” *jou- [2] and a suffix *-kko.

Another question to ponder could be if the *-j-element that is usually indicated before this suffix is perhaps neither the plural oblique stem marker *-j-, nor even the nominal combining-form suffix *-j- (as in cases like lehmä ‘cow’ → lehmipoika ‘cowboy’); but rather continues the first syllable of *joukko? This could perhaps explain the fact that it appears not only where phonetically expected (*kuusə-j-kko > kuusikko), but also can oust stem vowels that ought to remain (*mäntü-j-kko > männikkö, despite the plural stem of ‘pine’ being mäntyi-).

There might also be one bisyllabic local noun that has been formed with this suffix, but hasn’t usually been identified as such: loukko ‘hole, den’. Earlier, e.g. in UEW, this has been compared with e.g. Mari *lŭk ‘corner’, Hungarian lyuk ‘hole’, but I suppose a better analysis will be segmentation as lou-kko. The root lou- then appears to be identifiable with lovi ‘cleft’. This seems to be supported by how loukko refers not so much to something like a mole or badger’s burrow, as much as to a weasel or fox’s den in a rocky area: a “lovikko” full of nooks and crannies. The possibility of this connection is suggested already in SSA, but apparently only as a passing editorial comment.

[0] This blog post brought to you by Göran Karlsson, whose former copy of oi- ja ei-nominit Länsi-Uudenmaan murteissa (Pekka Lehtimäki, PhD thesis, 1972) I have today picked up from University of Helsinki’s unofficial recycling point for Finno-Ugric academic literature, and which has inspired me to take a new look at some facets of this etymology (the main gist of it I’ve come up with some time ago already). — Perhaps also by his son Fred Karlsson, Uni of H’s professor emeritus who I suspect is responsible for leaving the book around free to a good home. Göran instead worked in Åbo Akademi, and if Wikipedia is to be trusted on this, retired already in 1980 and died in 2003, and I doubt the book would have arrived to Uni of H. already a minimum of a dozen years ago.
[1] Talvikki though is formed from a back-vocalic and illabial root. I wonder though if this phenomenon could reflect, rather than the generalization of the front allomorph to a couple of derivatives, original front-vocalism, given that PF *talvi < PU *tälwä. The stem vowel shift *-A > *-ə in this root shape must have taken place before the split of *-Aj, but perhaps backing still had not been completed, and the words remained front-vocalic by the time of the diphthong split? The development would then have been e.g. pseudo-PU *tälwä-j-kkä-j > *tȧlwə-j-kkä-j > *tȧlwikkei > PF *talvikkëi. Even disharmonic PF *talvikkei does not seem entirely out of the question.
[2] I have earlier entertained the idea that this could maybe be identified with a weak grade stem of the pronoun joku ‘some(one)’, and thus joukko would be not an old Germanic loanword as is the current understanding, but rather from PF *jogukko ‘collection of some peeps’. But cognates such as Ludian ďouk (not ˣďoguk), Votic jõukku (not ˣjogukku) do not grant this idea support; this would leave the suffix *-kko still unetymologized; and joku does not even seem to actually form any derivatives, due to being a compound of two pronoun roots *jo- (joka, jo-ta ‘who, what’) and *ku- (kuka, ku-ta ‘who, what’), which even still inflect separately (jotakuta, jollekulle etc.).

Posted in Etymology

Notes on Mari stem vowels

Though I often enough blog here about issues of consonantism too, it is clear that the largest challenges remaining in Uralic historical phonology concern vocalism.

Our current standard model of Uralic vowel history is mainly rooted in Samic, Finnic, and Mordvinic (the West Uralic group) on one hand, Samoyedic on the other. The evidence of these languages allows sketching a “canonical” system of eight stressed vowels in the 1st syllable, vs. a two-way contrast in the 2nd syllable. The later development in the other languages has also been surveyed in good enough detail to tell that the system probably is not going to need any fundamental uprooting. Perhaps we’ll eventually end up adding some further unstressed vowels; perhaps we can identify a ninth stressed vowel phoneme. But at least the former kind of updates will probably end up being based on these same key languages all the same. [1] Unstressed vowels are almost always lost in Permic, Hungarian, Mansi and Khanty, so positing new ones without any direct evidence would be quite questionable.

Mari is however a curious intermediate case. The original trochaic *CV(C)CV stem structure of Proto-Uralic is still partly preserved, though in a more reduced shape than in Mordvinic. But the development of stem vowels seems to diverge according to their parts of speech.

Verb roots in Mari are vocalic without exception, dividing into two stem classes: e-verbs (e.g. *ĭle- ‘to live’, *kånde- ‘to carry’, *pĭšte- ‘to put’) and a/ä-verbs (e.g. *kola- ‘to hear’, *lektä- ‘to leave’, *nelä- ‘to swallow’, *tola- ‘to come’). This distinction quite neatly corresponds to the West Uralic distinction between *a-verbs and *ə-verbs (cf. e.g. the Finnish cognates of the above verbs: elä-, kanta-, pistä-; kuule-, lähte-, niele-, tule-). There are still a number of exceptions; for many of them I could outline some lines of explanation, but in any case they don’t seem to rock the big picture.

As Mari /e/ in initial syllables regularly reflects PU *ä, it seems necessary to assume that inherited open stem vowels first merged as *ä, and were regularly raised after this. This would be quite similar to Samic, where the distinction between *ä and *a was similarly lost in the 2nd syllable, and the merged sound was in neutral environments eventually raised to *-ē.

The lowering of *ə to *a/*ä is not as trivial to understand, as Mari *a and *ä in initial syllables have no regular origin. Perhaps this is an additional piece of evidence that PU *ə was indeed a vowel quality that did not occur in the 1st syllable? The shift *[ə] > *a / *ä would be itself simple enough.

Nominal roots (including besides nouns also adjectival and numeral roots) are a different story. Almost no full-vowel-stem nominals occur in Mari, recent loanwords aside. The main types are instead consonantal and *ə-stems. Both types show only a simple reduced vowel /ə/ between the root and inflectional suffixes. In the latter stem type, this remains in the nominative; and is written е, ӧ, о in the orthography of Meadow Mari, though unlike actual /e ö o/ it remains unstressed. [2]

Vexingly, this distinction does not appear to correlate at all to the *a : *ə distinction recoverable from the West Uralic material. And unlike Mordvinic (where a class of consonant stems has emerged by loss of *-ə after single consonants and velar + sibilant clusters), the consonant environment does not seem to explain the duality, either. Final vowels can be either lost or retained after both heavy and light syllables; and this does not change if we look at the situation in Proto-Mari rather than Proto-Uralic. This adds up to a full set of no fewer than 12 different stem type correspondences between Mari and standard-issue Proto-Uralic:

  1. Light *a-nominal to vocalic stem:
    e.g. *kota > *kuðə ‘house’, *muna > *mŭnə ‘egg’, *śečä > *čü̆čə ‘uncle’
  2. Light *ə-nominal to vocalic stem:
    e.g. *kaśə > *kužə ‘long’, *ńëlə > *nülə ‘arrow’, *sülə > *šü̆lə ‘fathom’
  3. Heavy *a-nominal to light vocalic stem:
    e.g. *aška > *ošə ‘white’, *mërja > *mürə ‘berry’, *tälwä > *telə ‘winter’
  4. Heavy *a-nominal to heavy vocalic stem:
    e.g. *külmä > *kĭlmə ‘frozen’, *sonta > *šondə ‘dung’, *täštä > *tištə ‘sign’
  5. Heavy *ə-nominal to light vocalic stem:
    e.g. *ëppə > *owə ‘father-in-law’, *këččə > *kåčə ‘bitter’, *läwlə > *lelə ‘heavy’, *tammə > *tumə ‘oak’
  6. Heavy *ə-nominal to heavy vocalic stem:
    e.g. *oŋkə > *oŋgə ‘fishing hook’, *kośkə > *kåškə ‘rapids’, *wartə > *wŭrðə ‘shaft’
  7. Light *a-nominal to consonant stem:
    e.g. *ora > *ur ‘squirrel’, *kala > *kol ‘fish’, *pata > *påt ‘pot’
  8. Light *ə-nominal to consonant stem:
    e.g. *kätə > *kit ‘hand’, *lomə > *lŭm ‘snow’, *sënə > *šün ‘sinew’, *werə > *wü̆r ‘blood’, *wetə > *wü̆t ‘water’
  9. Heavy *a-nominal to light consonant stem:
    e.g. *ojwa > *wuj ‘head’, *jalka > *jål ‘foot’, *neljä > *nĭl ‘4’
  10. Heavy *a-nominal to heavy consonant stem:
    e.g. *oksa > *ukš ‘branch’, *lupsa > *lŭpš ‘dew’, *mëksa > *mokš ‘liver’
  11. Heavy *ə-nominal to light consonant stem:
    e.g. *ëptə > *üp ‘hair’, *künčə > *kü̆č ‘nail’, *pučkə > *pŭč ‘hollow stem, tube’, *śarwə > *šur ‘horn’
  12. Heavy *ə-nominal to heavy consonant stem:
    e.g. *mekšə > *mükš ‘bee’, *soksə > *šukš ‘worm’
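
The twelve types are simply the combinations of three features: the PU stem vowel (*a or *ə), the root weight (light roots stay light, heavy roots come out either light or heavy), and the Mari stem type (vocalic or consonant). A small Python sketch that enumerates them in the order of the list above; the tuple layout and the function name are only illustrative.

```python
def stem_types():
    """Enumerate (PU weight, PU stem vowel, Mari weight, Mari stem type)."""
    types = []
    for mari_type in ("vocalic", "consonant"):
        for pu_weight in ("light", "heavy"):
            for pu_vowel in ("a", "ə"):
                # Light roots never come out heavy; heavy roots can do either.
                if pu_weight == "light":
                    mari_weights = ("light",)
                else:
                    mari_weights = ("light", "heavy")
                for mari_weight in mari_weights:
                    types.append((pu_weight, pu_vowel, mari_weight, mari_type))
    return types

for number, (pu_weight, pu_vowel, mari_weight, mari_type) in enumerate(stem_types(), 1):
    print(f"{number:2}. {pu_weight} *{pu_vowel}-nominal > {mari_weight} {mari_type} stem")
```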

I get the feeling that this mess cannot (and shouldn’t) be resolved starting from just the canonical PU root structure and designing sound changes fine-tuned for exact vowel and consonant environments. E.g. supposing that *ə remains after *l, as in ‘arrow’ and ‘fathom’, will make it difficult to explain the consonant stem in ‘fish’. Probably at least one stem type distinction has been retained here that does not systematically survive in West Uralic.

This doesn’t mean that there couldn’t still be minor conditional sound laws involved, of course; e.g. heavy consonant stems seem to involve only plosive + *š clusters, and probably a similar conditional loss of *ə has occurred here as it did in Mordvinic. (Altho there are still words like *kukšə ‘dry’, *upšə ‘hat’ to be found as well.)

On the other hand: the fact that only nominals are this much of a mess suggests another avenue of explanation. Probably some parts of the situation can be cleaned up by distinguishing inherited words from loanwords. Loans are typically nominals, and this can easily lead to a larger number or proportion of unetymological root shapes appearing in them. Consider e.g. Baltic *kerta → Finnish kerta, Mordvinic *kirda ‘time, instance’. If we naively equated the distribution of this word with its age, we might end up reconstructing a common West Uralic proto-form *kertä. But the expected reflexes of this should rather be *ä/*ə-stem forms: ˣkertä, **kiŕďə.

Verbs by contrast are somewhat less likely to be loaned. Modern Finnish makes a particularly striking example: in underived verb roots the etymological vowel combinations /e-ä/, /i-ä/ remain still more numerous than the loanword combinations /e-a/, /i-a/, although in nominals the battle has been lost ages ago (perhaps already in Proto-Finnic).

My next step in untangling this issue would probably be to tabulate how 1) widespread Uralic roots, 2) areal possibly-Uralic roots, and 3) known loanwords of various age are distributed in Mari between the 12 classes. Preliminarily, it seems that at least type #2 (*CVCə > *CVCə) is numerically much overshadowed by type #8 (*CVCə > *CVC). And here at least *nülə could be suspected of being a family-internal loanword from some direction, since this actually has an unexpectedly specific sense ‘arrowhead made of bone’, while the neutral Mari word for ‘arrow’ is instead *pikš. It would have to be a very old loan though, since it shows the expected proto-Mari sound changes *ń- > *n-, *ë-ə >  *ü, *ü > *[ö] / _R. [3]
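
Such a tabulation could be sketched roughly as follows in Python. The stem type numbers come from the list above; the layer labels are placeholders rather than etymological judgments (only *nülə is flagged, following the suspicion just mentioned), and the function name is hypothetical.

```python
from collections import Counter

# (Mari reflex, stem type 1-12 from the list above, etymological layer).
# Layers are placeholders: real labels would come from etymological work.
ENTRIES = [
    ("*kuðə", 1, "unassigned"),
    ("*nülə", 2, "suspected loan"),
    ("*šü̆lə", 2, "unassigned"),
    ("*kit", 8, "unassigned"),
    ("*lŭm", 8, "unassigned"),
    ("*wü̆t", 8, "unassigned"),
]

def crosstab(entries):
    """Count entries per (layer, stem type) cell."""
    return Counter((layer, stem_type) for _, stem_type, layer in entries)

table = crosstab(ENTRIES)
for (layer, stem_type), count in sorted(table.items()):
    print(f"{layer:15} type {stem_type:2}: {count}")
```

With real layer labels filled in, any stem type dominated by loans or by areal vocabulary should stand out directly in the table.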

Additionally, a second bisyllabic nominal stem type might have to be set up for Proto-Mari, for words where Hill Mari has a consonant stem but Meadow Mari has a vowel stem. It does not seem immediately clear if this correspondence can be always derived from an original vowel stem by apocope. Examples of this correspondence among the words mentioned above include ‘fathom’ (-lə₂), ‘berry’ (-rə₂), ‘oak’ (-mə₂) and ‘hat’ (-pšə₂); but not ‘house’ (-ðə₁), ‘egg’ (-nə₁), ‘uncle’ (-čə₁), ‘father-in-law’ (-wə₁), ‘bitter’ (-čə₁), ‘long’ (-žə₁), ‘dry’ (-kšə₁). This could add some extra resolution as well.

Finally, I’ll note that Mari also allows monosyllabic nominal stems. These regularly reflect roots with earlier medial semivowels or spirants, regardless of the original stem type: e.g. *kiwə > *kü ‘stone’, *luka > *luɣa > *lu ‘10’, *śüd₁ə > *šü ‘coal’, *täjə > *ti ‘louse’. [4] But, interestingly, and further highlighting the stark split in stem type behavior between verbal and nominal roots in Mari, there are no monosyllabic verbs to go along with these. Candidates for monosyllabicity end up as bisyllabic CV.V stems instead, again with exactly the expected stem vowel. E.g. *jëxə- > *jü.ä- ‘to drink’, *kajwa- > *ko.e- ‘to dig’. Does this perhaps indicate that monosyllabic nouns should be considered a subtype of consonant-stem nouns, even though no nominals of a shape **CVə seem to occur?

[1] A few good candidates are indeed already provided by two kinship terms:
– PS *nōtōj ‘husband’s sister’ ~ PF *nato ‘spouse’s sister’ ~ PSmy *nåto ‘spouse’s younger sibling’
– PS *kālōj ‘husband’s brother’s wife’ ~ PF *kälü ‘(husband’s) brother’s wife’ ~ PSmy *kälü ‘sister’s husband’
The argument for reconstructing a “kinship suffix” *-w for these (*nataw, *käläw?) appears to be circularly motivated by the belief that PU did not allow any 2nd syllable labial vowels. On the other hand, the unstressed labial vowels in Proto-Samoyedic are a relatively new discovery as well, and before that, words like these could have well been counted among the words that have innovated 2nd syllable labial vowels in Proto-Finnic and Proto-Samic. — On the third hand, I also wonder if the problematic sound correspondences in a third similar word: PS *vivë ~ PF *vävvü ~ PSmy *weŋü ‘son-in-law’ should be attempted to resolve by constructing something like PU #weŋäwə, with Samic *-vë not corresponding to PF *-vvü and PSmy *-ŋü, but instead only to their final labial vowel.
[2] I have seen phonological descriptions of Meadow Mari that attempt to follow the orthography and identify final unstressed е, ӧ, о with /e, ö, o/ (e.g. Eeva Kangasmaa-Minn’s description of Mari in the 1998 reference book Uralic Languages by Routledge), but this seems like a terrible idea to me: it clashes with regular stress assignment on the rightmost full vowel, and requires setting up a rule by which final /ə/ becomes one of the full mid vowels, depending on vowel harmony. I sort of wonder if the analysis lingers out of some kind of attachment to vowel harmony, which this schwa-fortition rule would be the only example of in Meadow Mari.
[3] Another option might be to assume that the final vowel represents some kind of a fossilized derivational element.
[4] It also appears to be the case that just about all of these cases have close tense vowels *i, *ü, *u.


Proto-Uralic *ŋx?

My earlier post ‘Swan’ in Uralic alluded to the possibility of reconstructing Proto-Uralic also *x in positions where it has not previously been considered to occur, particularly by reanalyzing some clusters with *k in them. This is not an idle throwaway idea: I have several other specific hypotheses about this already under development.

One of the more common PU consonant clusters is *ŋk (by my count in the top three). However, there does not appear to be substantial imbalance within the nasal-stop clusters: *nt is about equally common, and *mp, *nč, *ńć [1] not too rare either. So I do not think the instances of *ŋk need meddling with. I would however consider a slightly different reanalysis: perhaps some instances of the Proto-Uralic plain velar nasal should be reinterpreted as *ŋx.

I have three main reasons to suspect this:

  • the assumed distinctive sound change *ŋ > *ŋk in the Ugric languages;
  • the strong correlation of PU *ŋ with *ə-stems;
  • some comparative evidence from the Permic languages.

Ugric evidence, in typological light

It should be clear, I believe, that *ŋ > *ŋk is an “against-the-grain” sound change. Rather more often we can instead find the lenition/cluster simplification development *ŋk / *ŋg > /ŋ/ (thus e.g. in most Germanic languages; in various Indo-Aryan languages such as Bengali, Nepali and Sindhi; in Insular Celtic thru nasal mutation; or in Finnish thru consonant gradation). [2]

The opposite fortition development is surely not impossible, but I’m used to seeing it mainly in the context of a language banning /ŋ/ from its phonology. After such a change, it will be possible to re-map [ŋk] (or [ŋg]) as either a cluster /nk/ (/ng/) or as a prenasalized phoneme /ⁿk/ (/ⁿg/). One example of this within the Uralic languages is provided by Swedish loanwords in earlier Finnish, up ’til about the 20th century: as final, or even word-medial unalternating, /ŋ/ was for a long time not allowed, words like batong ‘baguette’, maräng ‘meringue’, salong ‘salon’ have been loaned as patonki, marenki, salonki (most of these of ultimately French origin, and thus in the process showing impressive unpacking of a single nasal vowel into an entire four-phoneme sequence). Another well-known example is the stereotypical Russian or more generally East European accent of English, which replaces final /ŋ/ with [ŋk].

The treatment of *ŋ in the Ugric languages is however rather a split, so simplification of the phoneme system is not available as a motive for assumed fortition. Distinct /ŋ/ remains in Khanty to this day (as in PU *suŋə > PKh *ɬoŋ > e.g. Obdorsk /lŏŋ/ ‘summer’), and I think that also the vocalization to /w/ ~ /j/ in Mansi is probably quite late. (The full story will be best left for another time, but some vowel shifts typical in the vicinity of original glides seem to be absent in words that originally had *ŋ; while a few words show variable treatment in the different Mansi dialects.)

Reconstructing instead *ŋx > *ŋk for the “epenthesis” cases would additionally allow tying this development together with the Ugric merger of *-k- and *-x- as *-ɣ-. These seemingly go in opposite directions (the former to a stop, the latter to a spirant), but perhaps in the latter case we should separate the spirantization from the merger per se: first *x > *k regardless of position, only later *-k- > *-ɣ-?

Phonetics tangent

At this point I will also note that the notation “*-x-”, first introduced by Janhunen, is not intended to stand for a velar fricative (which in traditional UPA transcription is instead χ); it stands merely for a consonant of unknown quality. Before his time, scholars like Setälä or Collinder, and indeed still the UEW, employed *-ɣ- (traditional UPA γ). A value [x] might be still possible, but other options could be e.g. [g], [q], or even [k] (in which case it would be “plain” *-k- that was rather something else).

Since the distinction between PU *-k- and *-x- can only be substantially traced in Finnic, as *-k- versus zero, it seems that this has been earlier researchers’ main line of evidence for determining how to reconstruct *x. And this indeed suggests that early Finnic had some kind of a “weak” value like [ɣ] or [ɰ] as the reflex of *x. But it’s entirely possible that this already represents a secondary development particular to Finnic! Already right next door, Samic instead reflects *-x- uniformly as *-k- (cf. e.g. Pite Sami tuohkat ‘to bring’ ~ Fi. tuoda ‘id.’), which will be difficult to derive from an especially weak starting value.

Old explanations have proposed generalized consonant gradation in Samic, but I do not find this plausible either: -g- [ɣ] as the weak grade of -hk- < *-k- is restricted to the central dialects and looks more like areal Finnish influence than Proto-Samic inheritance. It seems contrived to first assume *ɣ : *ɣ generalized to *k : *ɣ, then this being again levelled to *k : *k in several varieties, instead of just *x > *k directly in all positions. [3] Also perhaps worth noting — if starting from *x as indeed [x], this and the development *ś > *ć could be considered the same process: the fortition of [+high] fricatives?

The *-k- / *-x- contrast also appears to surface in a tiny number of words in Mordvinic as a contrast in backness (/v/ versus /j/); but as also numerous other consonants are lenited medially to semivowels in Mordvinic (*-p- > /v/, *-ŋ- > /v/ ~ /j/, in some cases *-m- > /v/), this evidence is difficult to project back to any particular PU values.

It does not even seem impossible that the *x / *k and likewise *ŋx / *ŋk contrasts never existed in Ugric, and that there would instead have been a conditioned split in the more western languages. This approach was explored already long ago by Erkki Itkonen, who starting in the late 40s proposed to adjust reconstructions like *suxə- ‘to row’ to *suukə-; with loss of *k after long vowels then assumed in Finnic. [4] But as the idea of Finnic long vowels dating already back to Proto-Finno-Ugric has turned out untenable, and as it in any case clearly cannot apply to *ŋ-clusters, for now I will not speculate further on what kind of conditioning could be assumed instead. Still, the points in the next section may suggest some hints in this direction…

Stem type considerations

It’s been known for long that traditional PU *-x- only seems to be reconstructible before 2nd syllable *ə. The diagnostic Finnic CVV-stems do not ever appear to have Samic/Mordvinic/Samoyedic cognates [5] that would demand a reconstruction *CVxA. This is evidently also not solely due to Finnish stem contraction being entirely limited to *ə-stems, as there still are a number of examples of *oo or *öö deriving from earlier *uwa/*uŋa or *üwä/*üŋä (one of the better-known examples being *voo ‘flow’ < PU *uwa).

I have not seen it noted, though, that also most cases of *-ŋ- in Proto-Uralic occur in *ə-stems. In the best-reconstructible vocabulary, the bare ratio appears to be 3 to 19:

  • *A-stems: *aŋa- ‘to open’, *čaŋa- ‘to hit’, *müŋä ‘backside’
  • *ə-stems: *oŋə ‘mouth’, *jäŋə ‘ice’, *kaŋərə ‘bow’, *kuŋə ‘moon’, *loŋə- ‘to throw’, *peŋərä ‘wheel’ (or *piŋärä? [6]), *piŋə ‘tooth’, *poŋə ‘breast’, *päŋə ‘head’, *püŋə ‘hazelhen’, *soŋə- ‘to enter’, *soŋə- ‘to wish’, *suŋə ‘summer’, *säŋə ‘air’, *śiŋə ‘support beam’, *šiŋərə ‘mouse’, *šuŋə ‘ghost’, *tüŋə ‘base’, *wiŋə ‘end’

By contrast, the distribution is almost even for the other PU nasals (*m, *n, *ń), perhaps slightly in favor of *A-stems. Examples with *m that appear reliably reconstructible to Proto-Uralic to me (are regularly reflected at least in one West Uralic and one East Uralic branch) show a ratio of 10 to 8:

  • *A-stems: *emä ‘mother’, *čama ‘direct’, *jama- ‘to be sick’, *kama ‘peel, skin’, *kuma ‘turned over’, *kämä ‘hard’, *d₂ümä ‘glue’, *ńoma ‘hare’, *oma ‘old’, *śuma ‘cap’
  • *ə-stems: *(ń)imə- ‘to suck’, *d₂ëmə ‘bird cherry’, *jëmə ‘gruel’ (> Mo. *jam, P. *jum, possibly partly Smy. *jä¹m), *komɜ ‘hollow’, *lumə ‘snow’, *lämə ‘broth’, *nimə ‘name’, *śëmə ‘fish scales’

Examples with *n show a ratio of 7-9 to 9:

  • *A-stems: *enä ‘big’, *ëna ‘mother-in-law’, *ona ‘short’, *kana ‘armpit’, *muna ‘egg’, *puna ‘hair’, *puna- ‘to plait’; perhaps #mona- ‘to say’, *śona ‘sleigh’
  • *ə-stems: *änə ‘sound’, *kanə- ‘to carry’, *menə- ‘to go’, *monə ‘many’, *panə- ‘to put’, *sënə ‘vein, sinew’, #śinə ‘coal’, *tonə- ‘to know’, *wenə- ‘to stretch’

And examples with *ń show a ratio of 6 to 3:

  • *A-stems: *ańa ‘older female relative’, *kuńa- ‘to blink’, *küńä ‘elbow’, *läńä ‘soft’, *mińä ‘daughter-in-law’, *pańa- ‘to press’
  • *ə-stems: *ëńə ‘tame’, *peńə ‘spoon’, *puńə- ‘to twist’

If we separate out in particular those examples with Ugric consistently pointing to *ŋk, the situation actually gets slightly more balanced: in the case of A-stems, ‘to hit’ and ‘backside’ remain, while under ə-stems, ‘ice’, ‘bow’, ‘to throw’, ‘tooth’, ‘hazelhen’, ‘air’, ‘mouse’ and ‘end’ remain. Still a 1 : 4 discrepancy, though.
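
For a quick side-by-side comparison, the ratios above can be turned into shares of *ə-stems. A minimal Python sketch, using only the counts given in the lists above; the function name is hypothetical, and no significance testing is attempted.

```python
# (*A-stems, *ə-stems) per medial nasal, as counted in the lists above.
COUNTS = {
    "ŋ": (3, 19),
    "m": (10, 8),
    "n": (7, 9),  # 9 : 9 if the two uncertain *A-stems are included
    "ń": (6, 3),
    "ŋ, only roots with Ugric *ŋk": (2, 8),
}

def schwa_share(a_stems, schwa_stems):
    """Proportion of *ə-stems among all roots with the given consonant."""
    return schwa_stems / (a_stems + schwa_stems)

for nasal, (a_stems, schwa_stems) in COUNTS.items():
    share = schwa_share(a_stems, schwa_stems)
    print(f"*{nasal}: {a_stems} : {schwa_stems}, share of *ə-stems {share:.0%}")
```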

This all is relevant if we consider Janhunen’s proposal that PU *x has come about thru a pre-Uralic conditional development specifically in *ə-stems. The details for this too are best found in his paper The primary laryngeal in Uralic and beyond, published in 2007 in SUST 253 (Pekka Sammallahti’s Festschrift), as cited already last time.

In particular he suggests consonant-stem formations as the key (though he still does not spell the process out too explicitly): a root like *mëxə ‘earth’ might have had the locative *mëx-na and the ablative *mëx-ta, which could be from pre-Proto-Uralic *mëQna, *mëQta, thru the lenition of some other consonant *Q in syllable-final position. After this, the vowel-stem forms would also have to have been generalized from *mëQə to *mëxə.

Janhunen proposes that this pre-Uralic *Q = *k. This seems unlikely to me however, since there are both several PU roots of the shape *CVkə (e.g. *kokə- ‘to check traps’, *lukə- ‘to count’, *jokə or *jëkə ‘river’) and instances of the cluster *kt (e.g. *ëkta- ‘to hang up e.g. a net’, *täktä ‘bone’, *toktə or *tëktə ‘loon’). In his article, he also notes that “laryngeals” in the world’s languages frequently derive also from other sources such as *s or *p. Interestingly, it so happens that in Proto-Uralic, roots of the shape *CVsə and *CVpə are also remarkably rare. The only examples that I find reliable seem to be *kosə- ‘to cough’, *jepəkä ‘owl’, and even if also including Finno-Permic roots, *jäsən(ə) ‘joint’. [7] The first might be simply a newer onomatopoetic innovation; the two latter are trisyllabic roots where there cannot have been vowel-stem/consonant-stem alternation between our target consonant and inflectional endings.

This particular approach, even if we widen our reach to also *p and *s as potential pre-PU sources of *x, doesn’t seem to work for explaining *-ŋxə- as coming from pre-PU *-NQə- ~ *-ŋx- though, since there are now some difficult-to-dispense-with counterexamples. If we went with *NQ = *mp, it will be difficult to explain *lämpə ‘warmth’ (not **läŋxə); if we went with *NQ = *ŋk, similarly e.g. *woŋkə ‘hole’ (not **woŋxə) will be a problem; and *NQ = *ns seems to be ruled out by the absence of any examples of *-ns- at all, including in *A-stems.

It’s additionally not at all clear to me how far back the characteristically Finnic consonant-stem alternation pattern *CV(C)Ci, *CV(C)CE-CCV ~ *CV(C)C-CV (as in Finnish partitives: viisi ‘five’, lumi ‘snow’ : viit-tä, lun-ta) really goes. There are some residues of this in Samic, but elsewhere the commonplace total loss of 2nd syllable *ə, especially after light syllables, gets in the way of analysis. Some early derivatives also look like they were originally based on a vowel stem, not a consonant stem. One telling case is Samoyedic *korå ‘bull reindeer’, which is clearly ultimately a derivative of PU *kojə ‘male’, and cognate to e.g. Fi. koiras ‘male’. Starting from PU *koj-ra would however predict PSmy **kåjrå — while starting from *kojə-ra will indeed predict PSmy *korå. (PU *o regularly remains only in roots of the shape *CoCə; *-jə- is regularly lost, as in PU *ujə- > PSmy *u- ‘to swim’. Contrast e.g. PU *ojwa > PSmy *åjwå ‘head’.)

I regardless think it’s probable that second-syllable *ə has somehow conditioned the rise of PU *x, even though for now we cannot identify what from, precisely. And even if assuming that some instances of western Uralic *-ŋ- are from *-ŋx- won’t explain the abundance of *-ŋə-roots in their entirety, it certainly won’t hurt either.

Permic evidence

This is probably the most comparatively interesting line of investigation. *ŋ shows a split development also in the Permic languages, being reflected as either *ŋ (> varyingly /m n ń/ in most varieties), or lost entirely. As I’ve alluded to already in my post on the treatment of *ŋ in Ugric two years ago, it appears that the Ugric contrast between *ŋ and *ŋk correlates with this, to an extent.

Group 1, with Permic *ŋ ~ Ugric *ŋ:

  • Udm. /ćińɨ/, Komi /ćuń/ ‘finger’ ~ Kh. *ćoŋən ‘knuckle’
  • ‘mouth’: Udm. /ɨm/, Komi /vom/, /əm/ ~ Kh. *ooŋ
  • ‘birch bark vessel’: Udm. /ľaŋes/, Komi /ľanəs/ ~ Kh. *jeŋəL
  • ‘tree stump’: Udm. /diŋ/, Komi /din/ ~ Hung. : töv-
  • ? ‘strawberry’: Udm. /emedź/, Komi /əmidź/ ~ Kh. *-ääńć, in compounds
    (if the latter is < *-ŋVć — but I would not rule out the Permic words also being fossilized compounds with a 2nd component from *äńśɜ, since *ŋ > m in an illabial environment is not really regular at all)

Group 2, with Permic zero ~ Ugric *ŋk:

  • ‘ice’: Udm. /jɨ/, Komi /jə/ ~ Hung. jég, Ms. *jääŋk, Kh. *jööŋk
  • ‘tree stump’: Udm. /jal/ ~ Kh. *jöŋkəL
  • Komi /mɨś/ ‘after’ ~ Hung. mëg ‘and’, mögé ‘behind’, etc. [8]
  • ‘larch’: Komi /ńia/ ~ Kh. *ńääŋk
  • ‘mouse’: Udm., Komi /šɨr/ ~ Hung. egér, Ms. *täŋkər, Kh. *ɬööŋkər

This pattern might seem unexpected: it’s my *ŋx that tends to develop to zero, not plain *ŋ. Possibly, in group 2, *ŋx first developed in Permic into a voiced plosive/affricate equivalent of *x, which was then lenited and lost; e.g. *[ɴq] > *ɢ > *ʁ > ∅, or *[ŋx] > *gɣ > *ɣ > ∅?

Though if things were exactly this clean, the correspondence would probably have been noticed already. There are also cases where the Ugric evidence is inconsistent:

  • Komi /ɨń/ (< *ïŋ?) ‘flame’ ~ Kh. *jääŋəL- ‘to roast’; — but Hung. ég- (< *-ŋk-) ‘to burn’
  • Udm. /pum/, Komi /pon/ ‘end’ ~ Hung. fej ‘head’, fő ‘main’; — but Ms. *pääŋk, Kh. *pööŋk ‘head’
  • Udm. /vand-/, Komi /vundɨ-/ (< *-ŋV-ta-) ‘to cut’ ~ Northern Kh. *waaŋ- ‘to hew’; — but Hung. vág- ‘to cut’, Ms. *waaŋk- ‘to hit’, Southern Kh. /waŋx-/; — and even Eastern Kh. /waaɣ-/?!

or outright contradictory evidence, with Permic *ŋ ~ Ugric *ŋk:

  • ‘tooth’: Udm., Komi /piń/ ~ Hung. fog, Ms. *päŋk, Kh. *pööŋk
  • Udm. /čɨŋ/, Komi /čɨn/ ‘smoke’ ~ Ms. *šeeŋkʷ ‘fog’ [9]
  • Komi /sɨnəd/ ‘air, smoke’ ~ Hung. ég ‘heaven’
  • ? Udm. /šońer/ ‘straight’, Komi /šań/ ‘good’ ~ Hung. igen ‘yes’
    (rather dubious; semantics are divergent, and this probably had Proto-Permic *ń, not *ŋ)

Regardless, there appears to be a total absence of cases with Permic zero ~ Ugric *ŋ, so I don’t think we can call the Permic and Ugric splits fully independent.

Some of the exceptions, especially in the former category, could involve secondary velar suffixes on the Ugric side (*päŋə ‘end’ → *päŋ-kä ‘head’? [10]), but stretching this explanation to all cases would be forced. Probably there’s something more complicated yet going on in here. One hypothesis to investigate might be that *ŋx > ∅ is only a conditional development in Permic.

An overall dearth of data is also a problem. The first category only contains one particularly good etymology with cognates widespread across Uralic (‘mouth’); the second, three (‘ice’, ‘mouse’, ‘behind’); the third, one (‘head’); the fourth, one (‘tooth’). Accounting for some etymologies with spottier distribution but at least some other good-looking cognates (‘tree stump’, ‘birch bark vessel’, ‘air’), the count rises to 3 : 3 : 1 : 2, with only a narrow and probably statistically insignificant majority for the correspondence pattern I suggest.
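
The tally can be laid out explicitly. This is a minimal Python sketch with the counts from the paragraph above; the grouping into "supports", "ambiguous" and "contradicts" is an assumption made for this illustration (in particular, counting the inconsistent cases as merely ambiguous), and the names are hypothetical.

```python
# pattern -> (widespread etymologies only, including spottier etymologies)
PATTERNS = {
    "Permic *ŋ ~ Ugric *ŋ": (1, 3),
    "Permic zero ~ Ugric *ŋk": (3, 3),
    "Ugric evidence inconsistent": (1, 1),
    "Permic *ŋ ~ Ugric *ŋk": (1, 2),
}

# Assumed reading of each pattern with respect to the *ŋ / *ŋx hypothesis.
VERDICT = {
    "Permic *ŋ ~ Ugric *ŋ": "supports",
    "Permic zero ~ Ugric *ŋk": "supports",
    "Ugric evidence inconsistent": "ambiguous",
    "Permic *ŋ ~ Ugric *ŋk": "contradicts",
}

def tally(column):
    """Sum the counts per verdict; column 0 = widespread, 1 = extended set."""
    totals = {"supports": 0, "ambiguous": 0, "contradicts": 0}
    for pattern, counts in PATTERNS.items():
        totals[VERDICT[pattern]] += counts[column]
    return totals

print("widespread only:", tally(0))
print("extended set:   ", tally(1))
```

With samples this small, a handful of revised etymologies would be enough to shift the balance in either direction.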

Further investigation is clearly required on several fronts, and I’m not yet fully attached to the idea of a cluster *ŋx. But for now, I conclude at least that reconstructing a single PU *ŋ behind both Ugric *ŋk and Ugric *ŋ can definitely be questioned, and other possibilities should be explored as well.

[1] I still often tend to transcribe the last one as *ńś following traditional approaches. It seems likely that phonetically the affricate value is more original though, but this ties into the thorny question of to what extent PU had a contrast between *ś and *ć at all. There’s only any substantial evidence for a contrast initially and before *k, and even here different languages tend to point to different consonants.
[2] The common phonological constraint (see e.g. WALS) against word-initial /ŋ/ is surely also in large part due to this development trajectory. For any language with a CVC or CVCC maximal syllable template, and no /ŋ/ in its consonant system, the most likely pathway of developing /ŋ/ will be thru cluster simplification of some sort; which however will not be able to create any word-initial instances all by itself.
[3] Finno-Ugric studies have mostly long since shed Setälä’s infamously unfalsifiable early-1900s “theory” of all-encompassing consonant gradation in Proto-Finno-Ugric + massive levelling in all attested languages, but for some reason rudiments of this approach seem to have in Samic studies lingered until fairly late. As late as 1981, Mikko Korhonen’s handbook Johdatus lapin kielen historiaan still attempts to explain numerous regular sound changes that have no explicit relationship to gradation (e.g. *tk > Southern & Lule Sami rhk, Northern Sami ŧˈk : ŧk) as “generalized weak grades”.
[4] Itkonen, Erkki (1949): Beiträge zur Geschichte der einsilbigen Wortstämme im Finnischen. Finnisch-Ugrische Forschungen 30: 1–54.
[5] Cognates in the other languages could probably not be used to rule out a reconstruction as an *ə-stem.
[6] I’m following UEW in positing *-eŋə-, but it’s possible that this is a bad idea: the reconstruction seems to have been put together mainly by reference to Finnic, and there mainly by appeal to analogy with *söö- < *sewə- ‘to eat’. However, *püwärä < *piwärä < *piŋärä would work as the pre-Finnic proto-form just as well. The *ä in Mansi *päɣärt- (? *päŋärt-) may also point in this direction, as *e-ə usually yields *i. (By contrast I’m not putting heavy stock on the second-syllable vowel, which could well be secondary; cf. e.g. Southern Mansi /kaal-/ ~ /kalaa-/ ‘to die’ < PMs *kaal(a)- < PU *kalə-.)
[7] *kowsə ‘spruce’ needs to be reconstructed with a consonant cluster and cannot work as a counterexample.
[8] Mansi *mänt ‘along’, if it belongs here, does not seem like an exception; this can be either due to early *ŋt > *nt, or due to late *ŋkt > *nt.
[9] Khanty *čüüɣ ‘fog’ is also listed here by UEW, but Ante Aikio’s etymology that derives this from PU *čäkə (~ Samic *cēkë) seems preferable to me. Although I still wonder how Finnic *häkä ‘carbon monoxide’ fits into the picture.
[10] This latter derivative seems to exist in Samic at least: *pāŋkē ‘reindeer’s headgear’. Older comparisons linking the word to e.g. Finnic *panka ‘handle’ don’t seem very convincing to me.


Linkday #4: SEC

Studia Etymologica Cracoviensia is one of those journals that kindly makes its back issues freely available online these days, currently up to 2013. Which is old news for many I’m sure… I think I’ve even been linked directly to one of the issues before, but without realizing it’s coming from the journal’s own website.

The journal has no especial Uralic focus (aside from their 2009 memorial issue for Eugen Helimski perhaps), but on the other hand — it’s usually difficult to write a longer etymological paper that doesn’t have any observations with general historical linguistic interest in it.
