Alternations and “alternations”; with data from Finnish

A theoretical device in historical linguistics that I think can easily be abused is the basic morphophonological concept of “alternation”.

To lay some groundwork: an initial issue, on which I may expand more at some point, is that several grades of what is meant by “alternation” in the first place can be distinguished. All of them come with their own behavior, and trying to treat them as equal is a surefire way of going off-track in analysis.

Firstly, some archetypal examples of morphophonological alternation are easy to think of: systematic phenomena like consonant gradation in Finnic and Samic (and the less-known case of Nganasan), or consonant mutation in Celtic. These permeate a language’s lexicon on all levels, including neologisms and other newly gained vocabulary, and are often employed for specific grammatical functions.

It would be an obvious error to treat all morphophonology as having similarly wide-reaching significance, though. In what I would call the second category, even indubitably productive alternation patterns can be far more minor, applying perhaps only to a single morpheme. Consider e.g. the voicing and vocalization alternations in the English past tense morpheme -(e)d and the plural morpheme -(e)s; these require separate accounts to cover all corner cases (zero-suffix pasts like led, trod and found are not quite the same as zero-suffix plurals like fish, sheep and mice), and what similarities they show in their phonological behavior are easily seen to result from general phonetic constraints, not from them sharing abstract “suffix mutation rules”.

Thirdly, it is quite common for particular types of alternations to be at least partly lexicalized. Examples like English teach : taught (< *tǣk-ja- : *tǣx-t) or Finnish niellä ‘to swallow’ ~ nälkä ‘hunger’ (< *ńälə- ~ *ńäl-kä) have obviously long since ceased to be anything but fossilized relicts. This may quite well go also for “marginally productive” alternations that only replicate themselves by analogy. Nobody would claim that PIE ablaut is productive in English, yet this has not stopped people from creating new strong past tense forms like shit : shat (given here the analogy of sit : sat).

Unproductive alternations can also go deeper yet, involving loaning. English is a good source of examples: we can consider e.g. Germanic/Romance doublets, such as stand ~ statue (going all the way back to PIE), or for a slightly younger example, ward ~ guard (the latter originally a Frankish loan in French, and thus linked at the West Germanic level). Cases where both sides are of loan origin are possible as well, e.g. Latin/Greek doublets such as serpent ~ herpetology, Latin/French doublets such as regal ~ royal, or doublets originally derived within Latin, such as cause ~ excuse (← causa ~ *ex-causa). Such alternations might be impossible to identify at sight, and only a deeper knowledge of etymology and language history will end up demonstrating that they in fact go back to a common root. [1]

(There would be also a neurolinguistics blog post to be written on if “productive morpho(phono)logy”, as separate from phonology, (morpho)syntax and lexicon exists as its own phenomenon at all, or if it’s all simply an issue of more or less fossilized analogies — but that’s not my main topic, nor really even particularly within my expertise.)


There is additionally a fourth sense of “alternation”, however, which I think is the least appreciated: language-internal false cognates. Whenever alternations of some sort occur within the paradigm of a single word, it’s usually a good starting idea to suppose some kind of a historical divergence, rather than flat-out suppletion. Whenever two words aren’t directly related though, and only show some degree of semantic and phonetic resemblance, presuming a relationship is far more risky. A comparison of e.g. English beak with peak, though perhaps plausible on the face of it, does not suffice to allow us to infer the existence of “an alternation b ~ p”, except in the banal descriptive sense that these two semantically close-by words really do differ only in their initial consonant. Historically, this similarity appears to be entirely accidental.

I’ve avoided giving too many examples above to steer clear of feeding confirmation bias. While languages typically indeed contain numerous unproductive doublets and marginal alternations, these can be entirely indistinguishable from mere chance similarities. I would consider it methodologically invalid to claim that just because two words show similarity, they should be considered etymologically related through “some” unspecified means. This kind of a conclusion should always require specific confirmation from other comparative data.

Not necessarily language-external comparison, mind you. E.g. if an alternation can be attributed to a sound change in a particular context, it would be expected that the same change has affected other words as well, and therefore created multiple similar doublets. For a specific example, lexical doublets in Finnish of the type sortaa ‘to break down; to oppress’ ~ sorsia ‘to tease’ (the latter appearing to be derived from the former with the frequentative suffix -i-) can be put on a much firmer ground already as soon as compared with the existence of an alternation -t- : -si- also in inflection, as in e.g. kaartaa ‘to move in an arc, to go around’ : past tense kaarsi. [2]

Aside from “pure” chance similarity, another risk involved in doublet-hunting is semantic contamination: similar shape can lead two words to drift toward similar meanings, if given the chance. One cautionary example could be Finnish kastua ‘to become wet’ (also kastaa ‘to dip in’, kaste ‘dew’ etc.) ~ kostua ‘to become moist’ (also kostea ‘moist’), which at first sight appear to be some kind of a related doublet, perhaps comparable to other examples of an “a ~ o alternation” (say, kajo ‘shimmer’ ~ koi ‘dawn’ [3])?

However, we know from historical, dialect and other Finnic data that the original meaning of kostua has been ‘to return, to be returned’ (and compare in Modern Finnish still the expression kostua jostain ‘to benefit from (a scheme or deal)’ [4]), which allows it to be regularly analyzed as a reflexive derivative of the base verb kostaa, whose main meaning nowadays is ‘to avenge’ (< ‘*to return something’), clearly unrelated to moisture or wetness. The meaning ‘to become moist’ seems to have developed through the stage ‘to return to usable condition’, which for traditional leather-based (and, to some extent, wooden) tools and items can well have meant remoisturization after drying. Also relevant is the culinary habit of softening dry preserved bread in broth, brine etc. before consumption. [5] But it seems oddly coincidental that this very specific semantic development would have just accidentally happened to a verb with close similarity to kastua; and it is probably a good idea to analyze this similarity as having outright motivated the semantic development.

In any case, the conclusion still is that there is no morphophonological alternation a ~ o involved here: only two etymologically unrelated word groups, some of whose members have converged in meaning.


Since etymologists are usually mainly concerned with establishing connections between words, not in tearing them down, I would expect that there are people who fail to appreciate just how easy it is for words to show accidental or at least unetymological similarity, though. It is also often difficult or impossible to positively demonstrate that a given similarity definitely is accidental; and even producing calculations on the odds of accidental resemblance will be difficult, given how there are not really any “default hypotheses” about word origins that we could feed into these.

I believe there is however at least one method for demonstrating that accidents indeed happen: we can attempt seeking phonetically unlikely doublets, and see how easy it is to get these together, as compared with doublets that would seem to suggest some other, more phonetically expectable alternation.

Over some years, I have been collecting comparisons of this type from within modern Finnish, taking up cases of any imaginable phonetic variation (generally within the initial CV(C)C unit; anything going on in later syllables is often better considered “merely” morphology). Systematic surveying is difficult, and what I have so far is likely still biased in favor of alternations I have looked more into. Regardless, the results so far are clear: even without giving too much slack for semantics, it is possible to get together at least a few surface doublets for approximately any alternation pair imaginable, while alternations with some actual historical motivation behind them generally generate larger numbers of doublets.

Some examples:

  • a ~ e: lavea ‘wide’ ~ leveä ‘wide’
  • ai ~ ie : taitaa ‘to know (a skill)’ ~ tietää ‘to know (information)’
  • e ~ i: vehnä ‘wheat’ ~ vihne ‘awn’
  • e ~ ää: retikka ‘radish’ ~ räätikkä ‘rutabaga’
  • eu ~ uu: peuhata ‘to frolic, play rough’ ~ puuhata ‘to be busy, work on various small things’
  • h ~ m: houkka ‘fool’ ~ moukka ‘boor’
  • ha ~ e: harha ‘illusion, delusion’ ~ erhe ‘error’
  • ht ~ v: kuihtua ‘to wilt’ ~ kuivua ‘to dry’
  • i ~ ö: itikka ‘mosquito’ ~ ötökkä ‘bug’
  • iu ~ ui: hiukka ‘little bit’ ~ huikka ‘sip’
  • j ~ n: koje ‘machine’ ~ kone ‘machine’
  • k ~ l: äklö ‘sickeningly sweet’ ~ ällö ‘icky’
  • kk ~ pp: tukko ‘wad’ ~ tuppo ‘wad’
  • l ~ s: lingota ‘to sling’ ~ singota ‘to shoot off’
  • m ~ s: karmea ‘terrible’ ~ karsea ‘ghastly’
  • m ~ ∅: muhkea ‘grand, bountiful’ ~ uhkea ‘voluptuous’
  • n ~ ∅: nilja ‘slime’ ~ iljanne ‘slippery ice’
  • o ~ ä: vongata ‘to pester (esp. for sex)’ ~ vängätä ‘to pester (of children)’
  • p ~ r: pöyhkeä ‘snooty’ ~ röyhkeä ‘arrogant’
  • r ~ v: rako ‘cleft’ ~ vako ‘furrow’
  • r ~ ∅: varsa ‘foal’ ~ vasa ‘calf’
  • s ~ t: surma ‘death’ ~ turma ‘ruin, accident’
  • s ~ ∅: kaista ‘stripe, lane’ ~ kaita ‘narrow’
  • sk ~ v: rieska ‘flatbread’ ~ rievä ‘flatbread’
  • t ~ v: tai ‘or’ ~ vai ‘or’
  • uo ~ u: nuokkua ‘to nod off’ ~ nukkua ‘to sleep’
  • ää ~ äy: ääri ‘edge’ ~ äyräs ‘brim’

This is a relatively representative sample, in that more than one of the above examples have demonstrably unrelated origins; more than one are also demonstrably related; several can be suspected to be the product of contamination in some direction; most however have no particular known explanation.
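The mechanical half of this kind of survey can be sketched in a few lines of Python. This is only a rough illustration, not the procedure actually used for the list: the function names `single_edit` and `collect_doublets` are made up here, the comparison works on written characters rather than phonemes (so long vowels and diphthongs are not treated as units), and it looks at whole words rather than only the initial CV(C)C unit. Judging whether a pair is also semantically close remains entirely manual.

```python
from collections import defaultdict
from itertools import combinations


def single_edit(a, b):
    """Return (x, y) if a and b differ by exactly one substituted,
    inserted or deleted character, else None. '∅' marks absence."""
    if a == b:
        return None
    if len(a) == len(b):
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        return diffs[0] if len(diffs) == 1 else None
    if abs(len(a) - len(b)) != 1:
        return None
    a_is_longer = len(a) > len(b)
    longer, shorter = (a, b) if a_is_longer else (b, a)
    for i in range(len(longer)):
        # Try deleting each character of the longer word in turn.
        if longer[:i] + longer[i + 1:] == shorter:
            return (longer[i], "∅") if a_is_longer else ("∅", longer[i])
    return None


def collect_doublets(words):
    """Group all word pairs at edit distance 1 by their surface 'alternation'."""
    groups = defaultdict(list)
    for a, b in combinations(sorted(set(words)), 2):
        edit = single_edit(a, b)
        if edit is not None:
            key = " ~ ".join(sorted(edit))
            groups[key].append((a, b))
    return dict(groups)


if __name__ == "__main__":
    sample = ["rako", "vako", "surma", "turma", "varsa", "vasa", "koje", "kone"]
    for alternation, pairs in sorted(collect_doublets(sample).items()):
        print(alternation, pairs)
```

Run over a full dictionary, a search like this would be expected to return far more formal pairs than anyone would accept as doublets, which is rather the point: the formal side of a “doublet” is cheap, and only the semantic filtering and the etymological follow-up carry any weight.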

You can download the full list here (Unicode encoding; contents only in Finnish so far). In case you run into encoding woes, you can also try accessing this on pastebin. I have indicated a few etymological analyses so far, but most cases await fuller analysis. Further data can surely still be gathered as well. If anyone is interested in collaboration (analysis, adding in references to earlier literature, just adding in new potential doublets, etc.), feel free to get in touch with me.

For now I will not go into what kind of more detailed conclusions could be drawn from the data… though I imagine already simple eyeballing should be enough to highlight some features.

[1] On this topic, I often wonder how much of Latin we could in theory reconstruct from just the abundant loanwords it has left in modern western European languages. Or for that matter, if given no corroboration from the rest of Romance, would we be able to identify this reconstructed Latin as an early stage of French (rather than merely an extinct relative)?
[2] That the past tense of sortaa is typically sorti is then easily accountable as analogical, especially given that other verbs yet may show active vacillation, e.g. soutaa ‘to row’ : past tense souti ~ sousi.
[3] This example, for what it’s worth, is rewindable back to Pre-Finnic *kaja- ‘shine’ ~ *kajə ‘dawn’ with a slightly “weaker” alternation, and we could be dealing with some kind of an original derivation pattern in either direction; but this remains to be confirmed.
[4] I suppose an analysis as ‘to get so excited that you wet your pants’ might be theoretically possible, if we only knew of the modern sense of kostua.
[5] For further discussion of this word family’s history, see Hakulinen, Lauri (1940): Kostea ja kostua. Virittäjä 44.

Posted in Methodology

Problems in Indo-European vocalism, part 1

Looking at Indo-European studies has for a while now been giving me an impression that the usual vowel system reconstruction has unnoticed flaws in it.

They are different issues from the long-running debate on the reconstruction of the stop system, though. The traditional *i *e *a *o *u, easily attestable around the world, surely has nothing wrong in it in terms of synchronic phonology. Adding in the syllabic resonants *m̥ *n̥ *r̥ *l̥ won’t be a major typological problem, either. Rather… weird things start to pile up once we instead survey the development of this vowel system in the IE languages.

For a starting point, let’s consider Anatolian. I claim no particular expertise in the area though, so instead of getting my hands dirty with data, my commentary here follows fairly closely some short overviews by H. Craig Melchert. [1] He ends up positing (in an update to earlier views about a simpler 4+4 system) a vowel system almost identical to PIE for Proto-Anatolian: five short vowels *i *e *a *o *u, their long counterparts including *ā < *eh₂, as well as an unpaired long vowel *ǣ < PIE *eh₁ (early on also a later redacted “*ẹ̄” < PIE *ey). All of these yield their own distinct correspondence sets, and I would not try to claim that we need to merge or split some of these phonemes. But there are some imbalances in how some different contrasts develop.

Melchert does not go into featural phonology, but if we are to trust his transcription, both *e and *o would be mid vowels. Their development tendencies however diverge.

There is one general similarity: most Anatolian languages seem to show a trend of qualitatively simplifying the vowel system, towards plain *i *a *u. This is completed only in Luwian, but elsewhere, too, the mid vowels have a tendency to merge with other stuff.

Short *e often yields *i in some kind of raising contexts: e.g. following *j, or when pretonic (kind of resembling Germanic). In a few other positions, there are conditional developments to *a, such as before *n in Hittite and Lydian. However, by contrast, there seems to be no evidence for a raising development *o > **u. Most Anatolian languages have generally merged *o and *ō into *a and *ā. Melchert only reports three features that allow distinguishing *o and *a:

  • In Hittite, stressed *o in closed syllables yields long /ā/, while *a remains short /a/.
  • In Lydian, stressed /o/ is found next to a labiovelar (either a stop *Kʷ or the glide *w).
  • In Lycian, the general treatment is *o > /æ/ (transcribed e; no comment on what happens to *ō).

If, in a language family elsewhere, we were faced with two correspondence sets — one of them *a ~ *a ~ *a, the other *a/ā ~ *æ ~ *a/o — I would definitely not conclude that we are to reconstruct *a and *o respectively. And I would assume that Melchert, too, only ends up reconstructing a mid vowel *o, because this is what the second Anatolian vowel corresponds to in traditional PIE, not because the reflexes so demand. Even /o/ in Lydian looks like it might represent some kind of an assimilation from the adjacent labiovelars, rather than the preservation of original rounding.

The long vowel situation seems even more worrying. We would definitely expect to see a raising *ō > **ū at least somewhere, at minimum in languages like Lycian or Luwian where *ē > *ī, if these two had made up a similar class of long mid vowels. But apparently we only get /ā/ everywhere. Melchert reports for this contrast but a single distinguishing feature: apparently *dwō- yields /dā-/ in Hittite, versus no such loss of the glide for *dwā-. This seems to me much too iffy grounds for setting up a separate *ō.

Ignoring traditional PIE for the moment and instead reconstructing *a₁ (in place of *a) versus *a₂ (in place of “*o”), there would seem to be more promising options for phonological interpretation available. In terms of height, I’d assume that it was actually the latter that was the more open vowel *[a]. This is fairly directly suggested by the different treatment in Hittite: all other things being equal, more open vowels tend to be realized as longer. No clear evidence seems to exist for a difference in backness; *a₁ remains stable-ish (though I presume there would have been some variation on if a stands for central [a] or back [ɑ]), while *a₂ has both clearly fronted (Lycian) and backed (Lydian) reflexes. This, however, provides another reason to suspect a lower value for *a₂, given that backness contrasts tend to be more labile among lower vowels.

What this seems to leave available for *a₁, then, is some kind of a weaker vowel value still prone to lowering, like [ɐ], [ɜ] or [ə], probably both [-front] and [-round]. It seems a bit curious how this has not been retained as such anywhere, [2] but hardly any more so than the failure of *o to surface consistently anywhere (or any other family-wide “sweep” development, such as *s > *h in early Iranian or *a *ā *u *ū > *o *a *ъ *ɨ etc. in early Slavic).

Compare to this e.g. the development of English short a (Early Modern /a/) and laxed u (Early Modern /ə/): the former has split over the last few centuries into a variety of lengthened (BATH lexical set, father, “tense æ”), fronted (TRAP set) and/or backed (PALM and WATER sets) reflexes, while a new neutral short /a/ is in numerous varieties filled in from earlier †/ə/ > †/ʌ/. Similar vowel histories can be found moreover e.g. in Samic varieties (old *ā > á being more heavily split in allophones etc. versus lowered *ë > â/a remaining more neutral) or in Samoyedic (old *å *a yielding a large variety of reflexes versus *ə, often lowered, remaining more neutral).

Melchert’s most recent work also mentions the recent discovery of a “new” /o/, /ō/ for several Anatolian languages, in earlier work conflated with u, ū. The short version mainly evolves from labiovelar + syllabic resonant, the long version from *aw, *ow, both also from *u next to laryngeals (thus this /ō/ corresponds to late PIE *ū, from *uH; remaining cases of Hittite /ū/ are instead from *ew, or from stressed open-syllable lengthening of *u). These are therefore clearly distinct from traditional PIE *o, *ō. If this new *o could have been in place already in Proto-Anatolian (apparently plausible at least in non-final syllables), it’s all the more reason to not suppose also the simultaneous retention of old *o.


Given that Anatolian retains numerous archaisms, and the possibility of it being the earliest split-off of Indo-European entirely, we can also proceed to ask an important question: would Proto-Anatolian *ɜ *a or traditional PIE *a *o be the more archaic state of affairs? I would end up preferring the former: a chain shift a > ɑ > o, ə > ɜ > a is more typical than the opposite.

As soon as we’ve formed this hypothesis for a “skew triangular” (or perhaps even “square”? [3]) vowel system *i *e *ɜ *a *u for not only Proto-Anatolian, but also Early PIE altogether, there will be numerous immediate implications. I will not go into listing all of these just yet… But to mention one, this will nicely amount to addressing the now and then raised typological objections about the rarity (and possible absence entirely before laryngeal coloring) of traditional PIE *a. In the new system, this turns out to translate into the rarity of the more marked vowel *ɐ, while the proper cardinal open vowel *a is quite frequent indeed.

[1] 1992, “Relative Chronology and Anatolian: The Vowel System”, in Rekonstruktion und Relative Chronologie. Akten der VIII. Fachtagung der indogermanischen Gesellschaft, ed. Robert Beekes;
1993, “Historical Phonology of Anatolian”, Journal of Indo-European Studies 21/3-4;
and 2015, “Hittite Historical Phonology after 100 Years (and after 20 years)”, in . I have not yet seen his 1994 monograph Anatolian Historical Phonology, but the 2015 paper seems to summarize the main points.
[2] In writing at least. It should be kept in mind that epigraphic evidence does not actually constitute phonetic evidence.
[3] Since *e may have well been half-open [ɛ] rather than half-close [e].

Posted in Reconstruction

Another Phonological Relict in South Estonian

Some days ago, I decided to go for a re-reading of Setälä’s classic Yhteissuomalainen äännehistoria (1891) (that’s “Common Finnic Historical Phonology”, for the non-Finnish-reading people in the audience). This proved a good idea, yielding not just confirmation of some issues I had been wondering about, but also various detailed observations new to me that seem to support a theory of mine in the works.

I mean the thesis introduced at the end of my last post: the characteristic Finnic sound change *š > *h did not take place in unitary Proto-Finnic, or even in unitary Core Finnic (following the splitting-off of South Estonian and Livonian) but spread across the Finnic language area even later, after its splitting into dialects entirely.

One of these details appears in the Finnic word for ‘goose’, normally reconstructed as *hanhi (> e.g. Fi. hanhi, Es. hani). We are quite sure that this goes back to earlier *šanši, given that it’s a long-known loanword from PIE *ǵʰans- (most likely thru Baltic); and also given the recent observation that it could be traced back to even earlier *šänšä, allowing treating Erzya /šenže/ ‘duck’ as a “non-native cognate”.

Since the word fails to show up in Samic (or rather, shows up there in an entirely different form *ćōńëk, allegedly from a pre-Germanic alternative formation *ǵʰan-ut- according to an etymology from Koivulehto [1]), we probably still shouldn’t assume loaning into common West Uralic. Another point in favor of this seems to be given by the Finnic sound change *-ńć- ~ *-ńś- > *-ć- ~ *-ś-: that this early denasalization only applies before a palatalized sibilant seems best explained by assuming that the clusters *-nš- and *-ns- had not yet even entered the language by this point (neither of them occurs in material inherited from Proto-Uralic). [2]

Denasalization before sibilants is a fairly natural sound change though. A second round of the same has later taken place again in the southern Finnic area, this time with compensatory lengthening, affecting *-ns- found in innovated Proto-Finnic vocabulary (as in Es. põõsas ~ Fi. pensas < PF *pënsas ‘bush’) or developing thru *-nc- from the assibilation of *-nt- (as in Es. kaas ~ Fi. kansi < *kansi < PF *kanci < *kanti < PU *kamtə ‘lid’). And the interesting fact is: in South Estonian this affects ‘goose’ as well, yielding haah’ instead of the expected ˣhahn’.

You might protest that surely the loss of a nasal should be just as natural before /h/. This is also the mechanism Setälä appeals to. Crucially though: words showing *-nh- of some other origin are not denasalized. As just mentioned in my last post, they instead metathesize, yielding e.g. *tenho > tehn ‘thank’, *vanha > vahn ‘old’ (again, just like other sonorant+h clusters, regardless of if they go back to *-Rš- or not). ‘Goose’ appears to be the only example of this denasalization development. [3] I would not brush off as a coincidence the fact that it is also the only example that can be securely traced back to *-nš-.


This situation might not be obvious, as two other Finnic words with *-nh- have still been proposed to come from *-nš-. Yet newer research appears to have shown by now that neither example holds water.

*vanha ‘old’ is the first case with alleged earlier *-nš-, traditionally compared with Udmurt /vuž/, Komi /važ/, of the same meaning. Komi /a/ would be irregular as a counterpart of Finnic *a, though, and a recent proposal from Mikhail Zhivlov [4] identifies a better etymology for the Permic words: borrowing from Baltic *wetuša- ‘old’ (cf. Lithuanian vetušas). The development *e > /u ~ a/ seems to be regular before a lost medial consonant, as in PU *wetə > Udm. /vu/ ~ K. /va/ ‘water’. [5] A different etymology for Finnic *vanha has been proposed too: borrowing from Germanic *wanhaz ‘bent, crooked, bad’. This seems uncertain due to the semantic difference, but if the Permic connection fails, it appears to be the explanation we will have to default to. LÄGLOS is of the opinion that it would be exactly the existence of Permic cognates that shows this etymology to be unviable, not any formal flaw.

The second is *inhiminen ‘human’, which has been traditionally compared with Mordvinic *inžə ‘guest’. A loan etymology by Koivulehto derives these from PIE √ǵenh₁- ‘to beget’. Disassembling this requires a bit more analysis though. Given that the usual sound substitution for Indo-European *ǵ has been Uralic *j, Koivulehto suggests that the words continue the zero-grade *ǵn̥h₁-, with the sequence *ǵn̥- substituted as *in- (rather than *jVn-). Since we still have /i-/ and not the expected **e- in Mordvinic, the word would then have to have been loaned fairly late — but my soundlaw *je- > *i- for Finnic seems to “get in the way” of this: Koivulehto’s reconstruction could be quite well amended to a common proto-form *jenšä-, derived instead from the IE full grade.

Other considerations still chafe against this analysis. Firstly, Koivulehto also assumes a sound substitution *H → *š, but as has been recently argued by Adam Hyllested, [6] this is likely mistaken, and we should instead assume *H → *h straight away. Most of Koivulehto’s alleged examples are restricted to Finnic, and thus show no direct evidence for *š at all. For a few others, with cognates in e.g. Samic that explicitly point to *š, alternative etymologies have been suggested. If I were doing a more detailed review, I would consider also the possibility that they represent “etymological misnativization”, with IE *H → Finnic *h substituted as *š either in the other Uralic languages involved, or already in an archaic mediating Finnic variety.

Secondly, in Finnic we have no evidence for a bare root **inhä, only for the longer stem *inhimV- (mostly further suffixed with the adjectival/diminutive ending *-inen, but a few forms like Ludian inahmoi could in principle be parallel rather than “suffix-switched” derivatives). This seems to not match at all with the usual patterns of Finnic nominal derivation. We would expect something ending in *-imV- to be either a nominalization (in *-mA-) from a frequentative verb (in *-i-), or a superlative. Instead the Indo-European derived noun *ǵenh₁mn̥ ‘offspring’ (> Latin genimen, Sanskrit janiman, etc.) seems to provide a better morphological match: it even provides half of the ending *-inen, whose presence in the neutral word for ‘human’ is otherwise a bit puzzling. In Mordvinic we see no signs of this though, which would seem to suggest that the ‘guest’ word has a different etymology entirely.

(Thirdly… in South Estonian only Northern-type reflexes inemine ~ inimene seem to be attested, so even if the history here had really been *ǵenh₁- > *jenšV- > *inhV-, it would not affect my analysis of ‘goose’ anyway.)


How late this reanalysis requires pushing *š > *h exactly is not clear. The terminus post quem on show is after the Southern Finnic denasalization (or perhaps concurrently with it: earlier in North Estonian vs. later in South) — but this is itself difficult to date. At minimum this would have to be later than the splitting-off of Northern Finnic, which in principle might however go quite deep into the Proto-Finnic period.

There is some weak evidence for some dialect diversity within the future Estonian area at this time as well. Another minor observation of Setälä’s is that, in a few central Estonian dialects, *Vns > *VVs postdates the diphthongization of original *aa and *ää to /ua/ and /iä/. This won’t have to mean that the entire denasalization development is this late, though: a nasal vowel stage *ṼṼs would make a very believable intermediate, with full loss of nasality only later.

The form haah’ also does not even appear to be common across the entire South Estonian dialect area, but is rather limited to its southernmost fringes. To some extent this probably means that the literary / North Estonian form hani has simply displaced the native form in some parishes… but a very similar distribution also seems to hold for tehn and vahn. In principle it would be possible that also the southwesternmost area of South Estonian had already split off by the time of *š > *h, and that the general Central Finnic soundlaw *nh > *n is the regular development elsewhere in the SE area.


This analysis may also raise a few methodological questions. Is it really legitimate to suppose a development *Vnš > *VVš for pre-South Estonian only on the basis of a single etymology? On one hand, it is clear that granting a blank check for positing single-example sound changes with highly specific conditioning would allow rewriting the historical phonology of any language completely to taste. On the other hand, in this particular case we have some very strong constraints to avoid this failure mode: aside from the bare output (haah’), we can independently establish also all three of the input (*šanši), the specific conditioning environment (loss of *n before a sibilant) and the general phonetic motivation (the articulatory complexity of a nasal-sibilant transition) of the sound change I’m assuming.

Much seems to depend on how we model sound change phonologically. Do changes target, or are they conditioned by atomic phonemes — or by the features of neighboring segments? If the former, then we will be forced to treat *Vns > *VVs and *Vnš > *VVš as two parallel changes that have only incidental similarity; if the latter, then it will become possible to treat them as the one and the same sound change *VnS > *VVS, and to proceed to infer early dialect diversity within the Finnic languages.
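The difference between the two models can be made concrete with a toy rule applier. This is only a minimal sketch under heavy simplifying assumptions: forms are plain strings, the vowel inventory in `VOWELS` and the function name `denasalize` are invented for illustration, and no relative chronology or other sound changes are modeled. The only thing that varies between the two readings is whether the conditioning environment is a single phoneme or the class of sibilants.

```python
import re

# Assumption: a rough vowel inventory, enough for the example forms.
VOWELS = "aeiouyäöë"
# Assumption: the natural class "sibilant" as relevant for this change.
SIBILANTS = "sš"


def denasalize(form, sibilants=SIBILANTS):
    """Apply *VnS > *VVS: drop n before a sibilant, lengthening the vowel.

    Passing a single phoneme as `sibilants` models the phoneme-based
    reading, where *Vns > *VVs and *Vnš > *VVš are separate changes.
    """
    pattern = re.compile(rf"([{VOWELS}])n([{sibilants}])")
    return pattern.sub(r"\1\1\2", form)


if __name__ == "__main__":
    for form in ["kansi", "šanši", "vanha"]:
        print(form, "feature-based:", denasalize(form),
              "phoneme-based (s only):", denasalize(form, sibilants="s"))
```

Under the feature-based rule, *šanši is lengthened along with *kansi, while *vanha is left alone; under the rule restricted to *s, ‘goose’ would need a second, separately posited change.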

[1] I am on the skeptical side though, and would expect anything showing Samic *ć ← PIE *ḱ to have been adopted from a Satem variety.
[2] The same relative dating is similarly suggested by how this sound change seems to extend to Mordvinic as well. None of the textbook examples such as PU *kuńćə ‘urine’ have known reflexes in Mordvinic; but one binary comparison, Erzya /saźi-/ ‘to gain, get’ ~ Permic *sudź- ‘to reach’ seems best reconstructed as *sëńćV-.
— It might be additionally a good idea to assume that the heterorganic clusters *-ŋs- and *-ŋš-, known in one word each (*joŋsə > PF *jousi ‘bow’; *jaŋša- > PF *jauha- ‘to grind’) had already changed to *-xs-, *-xš- in Finnic before the denasalization of *-ńć-.
[3] ‘Thank’ and ‘old’ are moreover actually the only two examples of *-nh- > -hn- that I can get together on a quick search.
[4] I do not know of a more substantial publication on this yet, but an initial release has been in the proceedings of the 2008 conference Языковые контакты в аспекте истории. (My thanks to André Nikulin for the reference.)
[5] Rather than setting up a separate marginal Proto-Permic vowel *å, I would prefer explaining this correspondence as a conditional development in Komi from Proto-Permic *o (normally > Udm. /u/ ~ K. /o/). Finding a phonetically reasonable account of the development regardless remains to be done. A few possibilities that would initially seem plausible are blocked e.g. by how both *-ej- and *-at- still yield the expected /o/ in Komi (cf. /voj/ ‘night’, /śo/ ‘100’).
[6] In a conference paper to be found in his PhD thesis Word Exchange at the Gates of Europe. Again, I do not know of a “more proper” published version.

Posted in Etymology, Reconstruction

A Phonotactic Allewrgy…?

There are, I think, several things off about the current understanding of the treatment of the consonant clusters *wr and *wj in Proto-Finnic.

There are no generally accepted instances of *-wr- in Proto-Uralic (though see below for one proposal), and examples with *-wj- are rare enough that so far none of them happens to have Finnic reflexes (probably the most reliable is *jäwjə ‘beard lichen’, with reflexes in just three branches: Samic + Khanty + Samoyedic). Within the Finnic comparative data, no direct evidence for these clusters appears either.

Cases involving these clusters in Proto-Finnic are therefore solely Indo-European loanwords. In these, two different lines of treatment have been generally accepted.

The first is metathesis to *-jw-, *-rw- > *-iv-, *-rv-. These latter clusters clearly occur in material inherited from pre-Finnic (e.g. PF *kaiva- ‘to dig’ ~ Samoyedic *kajwå ‘spade’; PF *sarvi ‘horn’ ~ Samic *ćoarvē ‘id.’). One classic example of the metathesis of *-wr- has been known for centuries: the word for ‘lake’, PF *järvi ~ PS *jāvrē. [1] Among Baltic loanwords, about three other examples can be found: *karva ‘hair’, *tarvas ‘bull’ and *torvi ‘horn (instrument)’ (~ e.g. Lithuanian gauras, tauras, ‘id.’; Latvian taure ‘id.’). ‘Lake’ has been proposed to be a loan as well, except from an earlier stage of Balto-Slavic, to account for reflexes also in Mordvinic and Mari.

Cases of metathesis of *-wj- are a newer discovery. Germanic *-wj- being continued as Finnic *-iv- was established some decades ago by Koivulehto, [2] with examples such as *laiva ‘ship’ ← Gmc. *flawją ‘id.’ (> e.g. Old Norse fley); *raivat- ‘to clear out, esp. woodland’ ← Gmc. *strawjan- ‘to strew’; *raivo ‘skull’ ← Gmc. *trawją ‘vessel’ (ONo treyja). Examples in loanwords from other sources seem to be rare so far, but one is Koiva, the Estonian name of a river in northern Latvia whose Latvian name is instead Gauja.

The other development is fortition to *-pj-, *-pr-. Both of these are clusters introduced in loanwords in the first place, and examples of this development are generally later loanwords from Germanic. Examples are not too numerous, but they include *hipjä ‘skin’ ← Gmc. *hiwją ‘appearance’ (ONo ); *hapras ‘brittle, weak’ < *šapras ← Gmc. *sawraz ‘filth, dirt’ (ONo saurr); *sapra ‘a type of haystack’ ← Gmc. *stauraz ‘pole’ (ONo staurr); *äpräs ‘bank, steep shore’ ← Gmc. *awriz or *awraz ‘sandbank’ (ONo eyrr, aurr).

I do not aim to question any of these etymological correspondences. However, I find the idea that both developments would have arisen specifically as sound substitutions to avoid the “phonotactically forbidden” clusters *-wj-, *-wr- implausible.


There is one principal problem. While the Proto-Finnic period involved a hefty reduction in the total consonant inventory of the language (loss of palatalized *ć *ś *ń, the “spirants” *d₁ *d₂ *x, the postalveolar affricate *č and the velar nasal *ŋ), it on the other hand brought a clear increase in phonotactic complexity. Some new types of consonant clusters that appear to have been introduced include:

  • stop/affricate + liquid, e.g. *sëpra ‘company’, *atra ‘plough’, *ocra ‘barley’, *nakris ‘turnip’; *täplä ‘spot’, *kakla ‘neck’
    (no *tl though)
  • stop + nasal, e.g. *litna ‘town’, *sakna ‘sauna’;
  • stop + *j, e.g. *kapja ‘hoof’, *patja ‘mattress, pillow’, *acja ‘thing’, *vakja ‘wedge’;
    (also, in native vocabulary, *-tv- < *-d₂w-, in e.g. *patvi ‘tinder’;)
  • fricative + nasal, e.g. *käsnä ‘callus’, *lehmä ‘cow’, *ahnas ‘ferocious’;
  • fricative + semivowel, e.g. *rasva ‘fat’, *ohja ‘guide’, *rahvas ‘people’;
  • nasal + geminate stop, e.g. *temppu ‘trick’, *kontti ‘leg; backpack’, *lonkka ‘hip’;
  • liquid + geminate stop/affricate, e.g. *harppat- ‘to take a long stride’, *kartta- ‘to avoid’, *tarkka ‘acute, accurate’; *hëlppo ‘easy’, *hëltta ~ *helttä ‘cockscomb’, *malcca ‘Atriplex sp.’, *palkka ‘salary’;
    (found through inflection and derivation also in native vocabulary, e.g. *jält-tä, partitive sg. of *jälci ‘cambium’)
  • liquid + affricate, in at least *porcas ‘pig’;
    (found also in native vocabulary through *-Rtə >> *-Rci)
  • liquid + fricative, e.g. *varsa ‘foal’, *vërho ‘drape’, *kulha ‘bowl’;
  • *n + fricative, e.g. *pënsas ‘bush’, *vanha ‘old’;
  • geminate nasal, e.g. *konna ‘toad’;
  • geminate liquid, e.g. *villa ‘wool’.
    (from earlier *ln)

This was not a momentaneous revolution in phonotactics, of course. For a few of these, examples of rather uncertain Uralic derivation have been suggested (e.g. ‘turnip’ has been compared with Mansi *nëër, Khanty *naaɣər ‘pine nut’); others have been introduced in relatively early loanwords and have thus “non-native cognates” elsewhere in Uralic (e.g. Mordvinic *purćəs ‘pig’); others may not have yet been introduced in Proto-Finnic proper, but rather in some of the early Finnic dialects (such as *käsnä, found only in Northern Finnic). None of this rocks the overall picture, though: if loanwords were able to feed in new types of clusters, they were taken up as-is, just about as far as possible.

(The same process has also kept going later on. Even in varieties such as standard Finnish, where there has been no post-Proto-Finnic syncope to generate new native clusters, the ongoing flow of various Indo-European loanwords has by now still introduced loads more novel consonant clusters, such as /-stm-/ in astma, /-ŋ(k)st-/ in gangsteri, /-kstr-/ in ekstra.)

So why would *-wj- and *-wr- have been specifically and stubbornly avoided for centuries? Especially when this general type of cluster, semivowel + sonorant, was able to occur in native vocabulary all along, as is shown by e.g. Fi. läyli ‘heavy’ < PF *läüli < PU *läwlə, or the above-mentioned Fi. kaivaa ‘to dig’ < PF *kaiva- < PU *kajwa-.


I propose that the main part of the solution is that the alleged “metathesis upon substitution” did not quite occur. This was instead a regular sound change, one that merely happened to mainly operate on loanwords.

Some indirect support is provided, I think, by how other examples of continuant cluster metatheses are already known in Finnic, too. These include:

  • *-jh- > -hj- in North Estonian (lahja ‘thin, lean’ ~ Fi. laiha)
  • *-nh-, *-lh-, *-rh- > -hn-, -hl-, -hr- in South Estonian (vahn ‘old’, võhl ‘witch’, kahr ‘bear’ ~ Fi. vanha, velho, karhu; NEs. vana, võlu, karu)
  • *-wh- > -hv- in both NEs. and SEs. (kehv ‘poor’ ~ Fi. köyhä)
  • *-sn- > -ns- in Western Finnish (runsas ‘plentiful’ ~ Livvi ruznaz)

For the metathesis *-wj- > *-jw- in particular there is also an areal parallel from Ter Sami and Lule Sami (you may recall I have already mentioned the Finnic metatheses currently under discussion in that post, too).

Since these metatheses affect only a part of the Finnic (or Samic) languages, sound change seems to be the only explanation available. It is not clear to me why continuant clusters would be particularly prone to metathesis though, and it’s possible that e.g. the connecting factor in the first three changes could be the metathesis of *h specifically. Regardless, it seems rather arbitrary to instead prefer a sound substitution explanation for *-wj- and *-wr-.

A number of the individual words to have been metathesized specifically point towards a sound change rather than a sound substitution, too.

1) For ‘lake’ there are two possible arguments. The first is chronological: *jäwrä could be regularly reconstructed already for Proto-West-Uralic (or for Proto-Finno-Volgaic, if you were to subscribe to such a stage). Alternately, there may be a phonetics argument available. Research in Uralic substrate vocabulary in Western Russia has led to supposing a “Meryan” reflex *jäkr- as well, reflected in lake names with an element ягр- or яхр-. [3] Both phonetic typology, and the proposed early Balto-Slavic etymology of this whole ‘lake’ root (either from *yewH-ro- ‘body of water’ [4]; or from *eǵʰe-ro- ‘lake’? [5]) suggest that the velar element in these has not developed from *w, but is instead an archaism, pointing to *jäkrä or *jäxrä as the earliest shape of the word in Uralic, with lenition to *-wr- at least in pre-Finnic and pre-Samic.

None of this is still completely watertight though, as another possibility yet is that early BSl. *-wHr- was substituted as *-Kr- in pre-Meryan, but as *-wr- at least in pre-Samic. If so, loaning to Proto-Finnic could have happened independently as well. (Mordvinic and Mari only show simple *r and they can swing any way really.)

2) With ‘horn’, the appearance of *o in Finnic may point to *towrə as an earlier form. This ties in with a larger topic: several Baltic as well as some Germanic and Indo-Iranian loanwords in Finnic seemingly still preserve PIE *o; but in a few of the cases we are actually dealing with PIE *a instead, one of these being this particular word (East Baltic *taure is, obviously enough, a derivative of *tauros ‘bull, aurochs’). I’ve prepared a small survey of this matter some time ago, [6] and among other results it turns out that cases with *au → *ou seem to be especially frequent. I suppose that this indicates that Proto-Baltic or Proto-Balto-Slavic had already merged short *a and *o at the time, but that the diphthong *au was at the time realized in some applicable dialect with a labialized first component, roughly [ɒu]. The loanwords with *au → *au would then have to be analyzed as later (as is already the case as well in explanations that appeal to the late retention of PIE *o), or as coming from a different Baltic dialect.

3) The above argument applies almost intact also to Koiva, for which we can likewise posit *Koiva < *Kowja ← *G[ɒu]jā. Here original pre-Balto-Slavic *ou can be suspected as well, though.

4) With ‘ship’, assuming metathesis as a sound law seems to provide a small improvement for the historical phonology of Livonian. In native vocabulary and sufficiently old loanwords, the development of *-Viv- in Livonian is initially *-Vuv-, possibly with monophthongization in modern Courland Livonian (well paralleled by known developments such as *-Vll- > -VVl-, or *-Vlj- > *-Vľľ- > -VVľ-):

However, for ‘ship’ we instead find *laija > lǭja : laij-. This could be explained by the metathesis *-wj- > *-jw- having never happened in Livonian. Thus, just as *-jw- > *-Viv- assimilates to *-Vuv-, also *-wj- > *-Vuj- assimilates to *-Vij-; and the development *lawja > *laiva only holds for the rest of Finnic. [7]

5) Finally, my earlier promised possibly inherited example of *-wr-: *korva ‘ear’.

Older research has taken comparison with Samic *koarvē, approx. ‘prop’ (NS also bealljigoarvi ‘earhole’), Permic *kʷor ‘leaf’, Hungarian dial. harap ‘dry grass’ as grounds to reconstruct PU *korwa ‘blade, leaf’. On semantic grounds, the alleged Samic cognates look like loans from Finnic though. The direct development ‘blade’ > ‘prop’ appears improbable; while the development ‘blade’ > ‘ear’ > ‘handle’ > ‘prop’ (the two last stages are verifiable as polysemic meanings of *korva and its derivatives in Finnic) seems to be etymologically blocked, since Samic still retains the original PU root for ‘ear’, *pealjē < *peljä.

A competing proposal comes from Juha Janhunen, who in ’81 has compared Finnic *korwa with Samoyedic *kåw ‘ear’. In his original opinion, the root here would be approx. *kawə, irregularly labialized in (pre-)Finnic, and extended to a derivative *kow-ra > *korva. Semantically this is clearly better.

I do not find ad hoc labialization in Finnic enticing, though. And there’s also another phonological issue: *kåw is the only Proto-Samoyedic root with a shape *CVw in Janhunen’s reconstruction, while a number of more reliable examples instead point to the regular development being *CVwə > *CV. [8] Etymologically the proposal has its problems as well. Supposing two synonyms for ‘ear’ with complementary distribution (*kawə in Finnic + Samoyedic, *peljä everywhere else in Uralic) might work under a scenario where Finnic and Samoyedic are two early offshoots of Uralic, but seems less likely if they sort into their own respective wider subgroups, West Uralic and East Uralic (as I think is the most probable).

Despite all these issues, this idea might regardless be onto something. I would instead assume that the original root here is *kow-; and that, while it is not retained as such in any Uralic language, a parallel derivative from this, formed already in Proto-Uralic with the common verbalizing suffix *-l(ə)-, is the well-attested verb for ‘to hear’. This has traditionally been reconstructed as the rather Finnocentric *kuule-, but in my opinion thus better: *kow-lə-. [9] Several reflexes seem to indicate *o; these include Mordvinic *kuľə-, Mari *kola-, Mansi *kʷaal-, and, if it has any input from here, Hungarian hall- (though Old Hungarian hadl- would seem to show that this is instead from PU *kontV-lə- ‘to listen’). Permic *kɨl- and Khanty *kɔɔL- are the only reflexes compatible with short-vocalic *kulə-, and they might simply result e.g. from a raising *ow > *u, similar to the development *ow > *uu I assume for Finnic. [10]

It also seems likely to me that the Samoyedic words for ‘ear’ are derived from this root in some fashion, even if probably not as direct inheritance. PU *o > Samoyedic *å is after all the regular development in any environment other than *CoCə. To make progress, I’d suggest that the PSmy reconstruction itself requires adjustment. Janhunen’s monosyllabic *kåw seems to be largely based on Nganasan kou, but this could just as well come from a bisyllabic proto-form such as *kåjå, through the regular loss of post-tonic *-j- and raising of *å (an exact parallel is PU *kaja > PSmy *kåjå > Ng. kou ‘sun’). Given the development *-wj- > *-j- in *jäwjə > *jüjə ‘beard lichen’, I would prefer assuming an agentive derivative *kow-ja > *kåwjå > *kåjå ‘hearer’. Or perhaps a more heavily contracted *kowlə-ja > *kol-ja > *kåljå > *kåjå, for an exact parallel with attested forms like Fi. kuulija? [11] This would even have some strange synergy with the derivation of Smy. *timä < *temä ‘tooth’ from *sewə-mä ‘bite, biting’.


Getting back on track, though. If the metatheses *-wj-, *-wr- > *-jw-, *-rw- took place as regular sound changes in Proto-Finnic times, this will naturally lead to a full absence of *-wj- and *-wr-, as the Finnic comparative data indeed suggests. So far, so good.
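As a minimal sketch of what "regular" means here, the metathesis can be written as a single exceptionless rule in Python. The pre-forms are the ones proposed in the discussion above (with the river name in lower case); the function name and the plain-string encoding are assumptions made for illustration, and later developments such as *w > *v or changes in the stem vowel are deliberately left out.

```python
import re

# Regular sound change: *w + {j, r} > {j, r} + *w.
METATHESIS = re.compile(r"w([jr])")

def metathesize(form):
    """Swap *w with a following *j or *r, wherever the sequence occurs."""
    return METATHESIS.sub(r"\1w", form)

# Pre-forms as proposed in the text above.
pre_forms = {
    "lake": "jäwrä",
    "ship": "lawja",
    "horn (instrument)": "towrə",
    "Koiva": "kowja",
}

post_forms = {gloss: metathesize(form) for gloss, form in pre_forms.items()}

# After an exceptionless change, no *-wj- or *-wr- is left in the lexicon.
remaining = [f for f in post_forms.values() if "wj" in f or "wr" in f]
```

Loanwords entering the language after the rule has stopped operating would not be touched by it, which is exactly why the next question arises.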

However, what from there on? Should we not just as well expect these clusters to be recreated right away by the next few batches of loans, instead of fortition to *-pj-, *-pr-?

At this point I would like to direct attention to the fact that the (Western) Finnish reflexes of these words do not show explicit signs of such a fortition. My example words listed above surface as hauras ‘brittle’, saura ‘haystack’, äyräs ‘bank’, dialectal hiviä ‘skin’ (though Standard Finnish has adopted the fortited form hipiä). This is, though, also the regular Finnish development of *-pj-, *-pr- (cf. *kapja > kavio ‘hoof’, *sëpra > seura ‘company’)… so, as long as we want to route these loans through Proto-Finnic, it will still be preferable to reconstruct e.g. *hapras, *hipjä, in order to regularly account for all reflexes, including also such ones as Northern Karelian hapraš, hipie.

But consider now the possibility that these aren’t loans dating all the way back to Proto-Finnic, but rather loans acquired after its breakup, taken up in the first place in Western Finnish, and mediated from there to the other Finnic varieties. In this case, the appearance of *-pr-, *-pj- could instead be a type of “etymological nativization gone awry”: e.g. the pre-Karelian dialect would by this time still have remained without **-wr-, but it would have had *-pr- as an equivalent of West Finnish *-wr-. This could have motivated adopting the cluster not phonetically, but rather “phonologically”. [12]

This firstly allows us to get rid of the strange back-and-forth phonological development in Finnish: words like hauras would simply preserve the Germanic original’s diphthong altogether. Secondly, this allows for some variation in the reflexes elsewhere in Finnic: if different Finnic dialects had to individually deal with adopting West Finnish *-wr- somehow, some of them could have opted for different strategies in different words. And we indeed find a *-wr- ~ *-pr- vacillation in e.g. Fi. teuras ‘sacrificial animal’, teurastaa ‘to slaughter’ ~ Krl. teuraštoa ‘to slaughter’ | Es. tõbras ‘head of cattle’ ~ Votic tõbras ‘elk’. This lexeme is likely from Germanic *þeuraz ~ *steuraz ‘bull’, but no single PF form can be set up. Instead of assuming two parallel loans (*tëpras ‘head of cattle’, *tëuras ‘sacrificial animal’?), it will be possible to reckon with just a single early Finnish loan *teuras, further adopted in differing ways into Karelian and Southern Finnic.

There is one non-trivial cost as well, though. ‘Brittle’ happens to be one of the words showing the characteristic pan-Finnic sound change *š > *h. If the word regardless spread across Finnic by diffusion from dialect to dialect, it will be now fairly difficult to assume that this sound change occurred in unitary Proto-Finnic; it will instead have to be an “areal-genetic” post-Proto-Finnic development. [13]

I am prepared to defend this dating in detail. *š > *h has already been proposed by multiple researchers to date as later than the split between South Estonian and the rest of Finnic. Drawing it out further yet would not seem outrageous considering what we know of the typical expansion history of this kind of “major”, i.e. phonologically simple but innovative sound changes, while it would also seem to allow the phonological fine-tuning of a handful of other known etymologies. But that will have to be a topic of its own.


In case my analysis here is correct (and I think it at minimum should prompt some kind of a more detailed defense of why there would ever have existed a “metathetic sound substitution”), there is a moral to be learned as well. The Finnic languages are often taken as phonologically archaic; this is undoubtedly the case with regards to several features of their inherited lexicon, most prominently the bisyllabic root structure. However, loanwords have been a consistent source of new phonotactic complexity. It is then to be expected that there have been several layers of “renormalization”: processes that have pushed these new root shapes back in line, towards the native word structure. And this may have occasionally swept a few native words along as well. Such innovations will probably be impossible to identify as long as we only look at the native component of the vocabulary, however.

[1] Although often enough people with preconceptions about the archaicity of Finnic have also assumed that the metathesis was on the Samic side instead — despite how this would have to be irregular: Samic quite well allows *-rv-, as in e.g. ‘horn’.
[2] Essentially singlehandedly in his 1970 article “Suomen laiva-sanasta”.
[3] See e.g. Pauli Rahkonen (2011), “Finno-Ugrian hydronyms of the River Volkhov and Luga catchment areas”.
[4] Most IE cognates seem to point to meanings like ‘river’ or ‘flowing’, but the derivatives in modern Baltic such as Lithuanian jūra ‘sea’, jaura ‘bog’ may have gained this more stationary meaning early on. I wonder if this semantic shift might have originally taken place near the wide and slow-flowing middle parts of the Volga.
[5] Could it be possible for this root, apparently well-attested only in Balto-Slavic, to be a backloan from Uralic…? It would have to be at least old enough to be pre-Satemization, though, and the “epenthetic” thematic vowel seems hard to explain in this fashion as well.
[6] You can find a working version over here; written in Finnish though. Maybe I will post an English summary here at some point.
[7] Dating the assimilation *-jw- > *-ww- very early in pre-Livonian would also work. In this case, newer loanwords could be still subject to the metathesis *-wj- > *-jw-, they would just be later on assimilated in the opposite direction: *-Viv- > *-Vij-. This might indeed be preferable in light of two other data points. The first is *vaiva ‘bother, trouble, ailment’, which yields Liv. vǭja; it is however a Germanic loanword, whose original seems to require reconstruction with *-jw- (given e.g. Old High German wēwa). The other is the known Livonian developments *-Vlv- > *-ll-, *-rv- > *-rr- (e.g. *sarvi > *sarro > sǭra ‘horn’) taken together, which would surely predict that at this same time *-jv- > *-jj- as well (and not > *-vv-).
[8] E.g. *śowə > *so ‘mouth’; *sewə- ‘to eat’ > *te-mä > *timä ‘tooth’.
[9] This would also then disprove the often presented Indo-Uralic comparison with the PIE root for ‘to hear’, *ḱlew-. Instead I believe that better IE comparanda might be √h₂ew- ‘to perceive’ (from which *h₂ōws ‘ear’ is derived); or perhaps *(s)kewh₁- ‘to sense’. (Are these a doublet of some sort?)
[10] For Khanty, another possibility is that this is from earlier *kʷaal-, as in Mansi; this could have come about as a distant assimilation *kVwC- > *kʷVC-. While speculative, this idea is not quite entirely ad hoc: a possible parallel is *käwd₁ə ‘rope’ > Mansi *kʷääləɣ.
[11] It would be remotely within possibility to also suggest starting from *korwa or *kowra, as required by Finnic, combined with an ad hoc loss of *r in this cluster. However, I suspect that Finnic *harva ‘sparse, rare’ may be cognate with Samoyedic *tïrå ‘dry’ (< PU *šërwa; cognates in various other branches for both “sides” of this comparison are known as well), which would allow establishing a rather more natural development: PU *rw > Smy. *r.
[12] This gets perhaps even more phonetically plausible, if we assumed the “cluster series shift” to not have happened immediately from *-wr- to *-pr-, but rather from something like more innovative Western Finnish *-wr- to slightly more conservative Western Finnish *-βr-. This latter cluster would then have had no other option than to be uptaken as *-pr- in pre-Karelian / Ingrian / Estonian / etc. — On the other hand, this would require such a fine-grained Finnish dialect distinction to have indeed existed at the time, which may prove problematic.
[13] One other technically possible but again contrived explanation would be to assume that the word was initially lost from all Finnic varieties except Western Finnish, and that it later staged a return from there.

Posted in Etymology, Reconstruction

PIE verb roots, for the people

Last fall I blogged about a possible project on charting the distribution of reconstructed Proto-Indo-European terms in the descendant languages. Some discussion on here focused on the likely unreliability of the data, sourced for my initial survey from a conveniently available but unreferenced Wiktionary appendix.

This was not a choice out of ignorance as much as out of availability. To my knowledge, no public database of reasonably up-to-date etymological Indo-European data is currently available anywhere.

There is no reason though for us to resign ourselves to unequal access to information, with easily found free data being of poor quality vs. “proper” data being locked away in exorbitantly expensive dead-tree-format publications. Data and theories, per se, are uncopyrightable, after all.

I am therefore happy to announce having digitized a list of PIE verb roots, as recorded in the LIV + in its online Addenda und Corrigenda. [1] A basic version is available at the English Wiktionary. You may also be interested in taking a look at the fully tabulated data, in spreadsheet form. The notes in my master file on word derivation and distribution are sketchy at best though, and will require further work to fill in. [2]

While this file is probably necessarily public domain, if anyone reading ends up using or referencing it somewhere, I would appreciate a shoutout or similar.


As comes to actual analysis, at this point the data mainly allows a look at root structure. I might as well note in this post some basic facts that stick out.

For starters, the usual stop phonation constraints (against **D-D, **T-Dʰ, **Dʰ-T) surface reliably. A more interesting related pattern emerges too: I’ve sometimes seen it suspected that the unusual PIE cluster *wr- could come from earlier *br-, therefore tying together with the lack of stem-initial *b-. (Not a lack altogether: at least in the preliminary data, *b still occurs often enough in stem-final position.) However, if this was assumed, we would end up with quite a large number of pre-PIE stems of the shape *b-D; 5 of the 12 roots with *wr- show a stem-final voiced stop; as in *wreg- ‘einer Spur folgen’. So either we’d need to also assume the reconstructible voicing constraints to have emerged only later; or to fine-tune this hypothesis to some kind of a chainshift like *bʰ- > *w-, *b- > *bʰ-.

I would be content to abandon the idea though and to instead assume that most cases of *wr- have rather arisen either thru the reduction of a 1st syllable of earlier roots (in PIE-internal terms ≈ as zero-grade derivatives of some root shaped *(C)wer-, *Cewr-), or thru some Schwebeablaut-ish metathesis process.
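The clash between the *wr- < *br- idea and the phonation constraints can be checked mechanically. The following Python sketch encodes only the three constraints named above (**D-D, **T-Dʰ, **Dʰ-T) and compares the first and last stop of a root; the tokenizer, the function names and the two hypothetical pre-PIE shapes are assumptions for illustration, not part of the LIV data.

```python
import re
import unicodedata

# Stops: base consonant, optional labialization, optional aspiration.
# \u1e31 is ḱ and \u01f5 is ǵ (written as escapes to avoid encoding issues).
STOP = re.compile("([ptk\u1e31bdg\u01f5])(ʷ?)(ʰ?)")

# Forbidden (root-initial, root-final) phonation pairs.
FORBIDDEN = {("D", "D"), ("T", "Dʰ"), ("Dʰ", "T")}

def phonation(match):
    """Classify a matched stop as T (voiceless), D (voiced) or Dʰ (aspirated)."""
    base, _labial, aspiration = match.groups()
    if aspiration:
        return "Dʰ"
    return "T" if base in "ptk\u1e31" else "D"

def violates(root):
    """True if the first and last stop of the root form a forbidden pair."""
    root = unicodedata.normalize("NFC", root)
    stops = [phonation(m) for m in STOP.finditer(root)]
    if len(stops) < 2:
        return False
    return (stops[0], stops[-1]) in FORBIDDEN

attested = violates("wreg")       # *wreg-: only one stop, nothing to violate
as_b = violates("breg")           # hypothetical pre-PIE *breg-: a **D-D root
as_bh = violates("bʰreg")         # hypothetical chainshift input *bʰreg-
```

On this toy check only the plain *b- version runs into the constraint, which is the reason a chainshift would be needed to rescue the hypothesis.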

There is more interesting stuff going on with resonants. I do not recall seeing this discussed in the context of PIE root structure anywhere before (which of course could be ignorance on my behalf), but several non-trivial constraints on their distribution are apparent. Here are some quick observations on this topic:

  1. No roots — or perhaps better: “sonorant cores” of a shape **-R₁eR₁- occur. This is a fairly trivial application of the universal principle of Similar Place Avoidance, though.
  2. No cores of a shape **-ler-, **-rel- occur either. Again, this is fairly simple to understand as similar consonant avoidance.
  3. The core **-nel- is also absent: this seems less expected, but may have the same motivation as the above. It could also be an accidental gap, though, as onset *n- is relatively rare altogether, and *-len- is well attested. Perhaps it is rather the abundance of *-ney- and *-new- roots that should be questioned.
  4. *m in the onset does not appear to quite count as a sonorant. There are just about no roots beginning with a cluster *Tm-, where *T would be a stop consonant (the lone example is *dʰmeH- ‘blasen’). We do find *sm-, *Hm-, but then again, *sT- and *HT- are possible just as well.
    This also lines up well with how a few cases of *mR- occur as well. Historically, they seem likely to be mostly “zero-grade clusters” again; but this etymological explanation does not suffice to explain the absence of other sonorant-sonorant clusters such as **nR-, **lR-.
  5. Sonorant cores of a shape *-yeR- seem unexpectedly rare altogether. No examples with **-yel-, **-yer-, **-yen- occur at all, and only a single example of *-yem-.
  6. Conversely, even when looking at roots with stem-final obstruents only, onset *-y- is curiously common preceding a stem-final back consonant (velar, laryngeal or *w): 29 cases out of 33, or 88%, show this environment! I wonder if we could assume that such roots reflect some specific pre-PIE front vowel, which was diphthongized to *ye before back consonants. It would likely have to be separate from the source of PIE *-ey- though, which does not seem to have any aversion against occurring before velars and laryngeals.
  7. Initial *h₂w- appears to be more common than all other laryngeal + glide clusters altogether, and it is also quite common stem-finally (i.e. as *-h₂w-, not *-wh₂-!). I wonder if this should be assumed to represent an earlier single phoneme such as *[ħʷ], created even further back from the ancestor of *h₂ by the same processes that led to the rise of the PIE labiovelar series?
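For anyone wanting to re-run this kind of tally on the spreadsheet, the notion of a “sonorant core” is easy to operationalize. The Python sketch below is a rough assumption of how this could be done, not the procedure actually used for the observations: it takes the resonant immediately before and after *e as the core, and labels the gaps described in observations 1, 2, 3 and 5. The example shapes are schematic, with "klen" a made-up stand-in for any root of the type *-len-.

```python
import re

RESONANTS = "mnlryw"
CORE = re.compile(rf"([{RESONANTS}])e([{RESONANTS}])")

def sonorant_core(root):
    """Return the R-e-R core of a root, or None if *e is not flanked by resonants."""
    match = CORE.search(root)
    return match.group(0) if match else None

def gap_reason(core):
    """Name the observation under which a core shape is reported absent, if any."""
    onset, coda = core[0], core[-1]
    if onset == coda:
        return "identical resonants (observation 1)"
    if {onset, coda} == {"l", "r"}:
        return "two liquids (observation 2)"
    if core == "nel":
        return "possibly accidental gap (observation 3)"
    if onset == "y" and coda in "lrn":
        return "unattested *-yeR- type (observation 5)"
    return None

# Schematic shapes only; "klen" is a hypothetical root, not an LIV entry.
core_of_klen = sonorant_core("klen")
no_core = sonorant_core("dʰmeH")
reasons = {shape: gap_reason(shape) for shape in ["len", "ney", "yem", "nel", "ler", "yel"]}
```

Counting how often each core shape comes up over the full list would then be a one-line job with `collections.Counter`.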

I could extend my discussion to onset and stem-final consonant clusters as well, but they do not seem to show anything especially interesting for me to raise up just yet.

[1] Two corrections on reconstruction remain mysterious to me: an alleged removal of a root **meyH- ‘lang werden’ (the two roots I’ve recorded with this shape do not seem to have such a meaning), and the adjustment of a root *kelh₁- to *k¹elh₁- (no such root occurs in the original data; although the root *kel- ‘antreiben’ is adjusted to *kelh₁- in another correction).
[2] I have at the moment no recollection what the column labeled “st” signifies, but I am leaving it in for possible further elaboration.
edit: On re-checking the data, apparently this indicates the number of branches with verbal reflexes given by LIV in the running text. However, footnotes often list nominal derivations, and closer checking also shows that some entries even list a few additional uncertain verbal reflexes in footnotes… meaning that this will be not quite an actual measure of the distribution of the reflexes. Perhaps I will remove this in later editions.

Posted in Uncategorized

A note on the Mitian Argument

An article to have caught my attention tonight: Mikael Parkvall (2008), Which parts of language are the most stable?, Sprachtypologie und Universalienforschung 61/3.

The main momentum of the paper is to define a statistical measure of the “arealness” or “geneticness” of a particular linguistic feature. This can be accomplished with fairly elementary calculations, once given a large dataset (the author uses, not especially surprisingly, WALS). Typologists will likely find the exercise illustrative, both in its general array of eyeball-able results, and in demonstrating how even the simplest bit of math can go a long way. [1]

One result stands out to me: among the features found the most strongly genetic, at #3 stands “M-T pronouns” — i.e. the likes of Uralic *minä, *tinä, and their suggested distant relatives in Indo-European, Yukaghir, Turkic, Mongolic, etc. (families that, taken together, form a subset of the Nostratic macrofamily hypothesis known as “Mitian”). Parkvall does not fail to notice this result either.

This may still require a number of caveats. WALS does not pack a very large number of etymological data sets, and is more geared towards features that can instead illuminate areal patterns. And, perhaps as a warning, the #1 most genetic feature on the list turns out to be “presence of phonemic clicks”.

As people who dabble in linguistic classification most probably know, click consonants have traditionally been held as a defining marker of an alleged “Khoisan” language family of southern Africa, first proposed by notorious “lumper” Joe Greenberg. However, putting together more conventional evidence for this grouping has over the years proven near-impossible, and these days conservative analyses instead seem to have settled on distinguishing some 3-4 separate families (the larger units with some acceptance being Khoe, Tuu, and Ju-ǂHoan) in place of unified Khoisan.

(An additional point, if you look closely at the math behind the stats, is that the highly genetic assessment of clicks gets a slice of its homogeneity score not just from the high homogeneity of the “Khoisan” families in their presence of clicks; but also from the complete homogeneity of all non-African language families in their absence of clicks. This argument can be expected to equally apply to any other trait that is truly a single-family or single-geographical-area idiosyncrasy, rather than one found sporadically around the world.)
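The effect is easy to reproduce with a toy calculation in Python. To be clear, this is not Parkvall's actual formula: the per-family score (the share of the majority value) and the plain averaging over families are stand-in assumptions, and the families F1 to F4 with their 0/1 feature values are invented.

```python
def family_homogeneity(values):
    """Share of the majority value among a family's languages (0.5 to 1.0)."""
    share = sum(values) / len(values)
    return max(share, 1 - share)

def genetic_score(families):
    """Unweighted mean of the per-family homogeneity scores."""
    scores = [family_homogeneity(v) for v in families.values()]
    return sum(scores) / len(scores)

# Invented data: 1 = feature present, 0 = absent, one entry per language.
single_area = {
    "F1": [1, 1, 1, 1],   # the only family with the feature
    "F2": [0, 0, 0, 0],
    "F3": [0, 0, 0, 0],
    "F4": [0, 0, 0, 0],
}
sporadic = {
    "F1": [1, 0, 1, 0],
    "F2": [0, 1, 0, 0],
    "F3": [0, 0, 0, 0],
    "F4": [1, 1, 0, 0],
}

score_single = genetic_score(single_area)
score_sporadic = genetic_score(sporadic)

# How much of the single-area score comes from families that lack the feature?
absent_part = sum(
    family_homogeneity(v) for v in single_area.values() if not any(v)
) / len(single_area)
```

In this toy setup three quarters of the single-area feature's perfect score comes from families that simply lack it, while the sporadic feature scores clearly lower.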

Regardless, we see “Mitianness” still squarely beating out various common tell-tale signs of established-family genetic relatedness, such as the presence of ejectives; sex-based noun gender systems; or polysynthesis.

At some point in the future, once we have an “etymological WALS” at our disposal, it would be moreover interesting to repeat this experiment with a few other lexical variables. E.g. how do numerals or body parts stack against pronouns in genetic classification? What are the stablest kinship terms? How good a job does the Swadesh list really do? Are there any interesting surprises to be found in words for abstract concepts? Do old and universal enough cultural concepts (think “pottery”, “hunting technology”) behave as if they were core vocabulary? Etc, etc, time will tell.

[1] Of course, something like 90% of the time, “the simplest bit of maths” seems to be all that we have yet in linguistics. This is surely great news for people who are not professionals, but who want to follow linguistics arguments along from home; or for the career plans of people like myself, who know enough undergrad-level maths to craft a couple other elementary mathematical tools for testing this or that hypothesis, if necessary. On the other hand, it is a less than promising sign about the overall quantitative reliability of our field in general, so far…

Posted in Commentary

On *ü in Mari vs. Proto-Uralic

It is always a low note of sorts when a scientific dispute gets resolved by quietly shifting consensus (e.g. due to proponents of one side passing away) rather than by actual discussion.

One of these seems to be the status of Proto-Uralic *ü. In literature up to about the mid-1900s, various skeptical viewpoints can be found on whether a contrast between *i and *ü should be reconstructed or not. They dwindle away in later times however, with the modern researcher only really encountering any trace of the issue when perusing the UEW, which still provides proto-forms with *ü only as an alternative to proto-forms with *i. So far I have regardless been unable to locate any turning point source that argues in detail in favor of establishing *ü after all.

For sure, all major overviews of comparative Uralic vocalism (Steinitz 1944, Collinder 1960, Sammallahti 1988) still reconstruct contrastive front rounded *ü (or, in the case of Steinitz, largely equivalent reduced *ö̆), and give what they see as the regular later development in most individual languages. It is thus fairly simple to reverse-engineer a rough argument for in which cases to reconstruct *ü. Altogether, especially the following three contrasts appear to be relatively robust and in etymological correspondence to each other:

  • Finnic *i : *ü
  • Hungarian ë : ö
  • Khanty *e : *ö (perhaps rather *[ɪ] : *[ʏ])

Also the *i : *ɨ contrast in Permic correlates well with this (though *ɨ can also derive from PU *u and *ä).

Numerous further conditional developments, including also indirect traces in several Uralic languages that lack front rounded vowels, have also been identified. Collating these in one place would probably amount to an almost full answer to old skeptical viewpoints, which mostly have focused on the possibility that the contrasts seemingly pointing to *i : *ü have separately developed in each language.


I think one subgroup remains an open problem though. A phonetically equivalent contrast also appears in Mari, between *ĭ (> generally /ə/, in a couple of dialects /ɪ/ or /i/) and *ü̆ (> Hill Mari /ə̈ ~ ʏ/, Meadow Mari /y/). But this particular contrast seems to do a poor job at matching with the Proto-Uralic *i : *ü contrast, as could be reconstructed on the basis of the other languages. While reflexes with “correct” labiality seem to be in the lead, an abundance of counterexamples is also apparent: [1]

  • PU *i > Ma *ĭ: 15 cases
    *ićä ‘father’ > *ĭćä ‘older brother’, *kičək > *kĭčək ‘fresh snow’, *kirä- > *kĭre- ‘to hit’, *kiśkə- > *kĭške- ‘to throw’, *minä > *mĭńə ‘I’, *ńičkä- > *jĭčke- ‘to pluck’, *pićlä > *pĭćle ‘rowan’, *pilwə > *pĭl ‘cloud’, *pištä- > *pĭšte- ‘to put’, *pitä- > *pĭće- ‘to hold’, *śikšta (← II) > *šĭštə ‘beeswax’, *śilmä > *šĭnćä ‘eye’, *tinä > *tĭńə ‘thou’, *wittə > *wĭć ‘5’
  • PU *i > Ma *ü̆: 6 cases
    *kiwə > *kü ‘stone’, *piŋə > *pü ‘tooth’, *nimə > *lü̆m ‘name’, *śixələ > *šülə ‘hedgehog’, *šikšna (← Baltic) > *šü̆štə ‘strap’, *sitV- ‘to bind’ > *šüðəš ‘bind’
  • PU *ü > Ma *ĭ: 9 cases
    *küjə > *kĭškə ‘snake’, *külmä > *kĭlmə ‘cold’, *küńärä > *kĭńer ‘elbow’, *kütkə- > *kĭćke- ‘to harness’, *mükkä > *mĭk ‘mute’, *ńüktä- > *ńĭktä- ‘to pluck’, *süjə > *šĭjä ‘year ring’, *sükəśə > *šĭžə ‘autumn’, *śüklä (← Turkic) > *šĭɣəľə ‘wart’
  • PU *ü > Ma *ü̆: 11 cases
    *d₂ümä > *lü̆mə ‘glue’, *künčə > *kü̆č ‘nail’, *künčä- > *kü̆nče- ‘to dig’, *küsV > *kü̆žɣə ‘thick’, *kütV > *kü̆ðäl ‘middle’, *sülə > *šü̆lə ‘fathom’, *süskV- > *šü̆škä- ‘to cram’, *śüd₁ə > *šü ‘coal’, *śülkə > *šüwəl ‘spit’, *türə > *tü̆rəś ‘full’, [2] *tüŋə > *tü̆ŋ ‘base’, *wülä > *wü̆l- ‘over’

PU *e also mostly yields Ma *ĭ or *ü̆, again split fairly evenly.

  • PU *e > Ma *ĭ: 15 cases
    *e- > *ĭ- ‘negative verb’, *elä- > *ĭle- ‘to live’, *eštə- ‘to be in time’ > *ĭšte- ‘to do’, *jećə > *ĭške ‘self’, *jekä > *i ‘year’, *keltä- > *kĭlðe- ‘to bind’, *kenčV- > *kĭčälä- ‘to search’, *neljä > *nĭl ‘4’, *le- > *liä- ‘to be’, *leštə > *lĭštäš ‘leaf’, *peljä > *pĭləkš ‘ear’, *penä > *pi ‘dog’, *pesä > *pĭžäkš ‘nest’, *repäś (← II) > *rĭwəž ‘fox’, *śerV > *sĭr ‘character, nature’
  • PU *e > Ma *ü̆: 12 cases
    *jetV > *jü̆t ‘night’, *kejə- > *küä- ‘to boil’, *kerə > *kü̆r ‘bast’, *pečä > *pü̆nčə ‘pine’, *pečkV- > *pü̆čkä- ‘to cut’, *sesar (← IE) > *šü̆žar ‘sister’, *śečä > *čü̆čə ‘uncle’, *śepä > *šü ‘neck’, *tejnəš (← II) > *tü̆əž ‘pregnant’, *terä (← II) > *tü̆r ‘blade’, *werə > *wü̆r ‘blood’, *wetə > *wü̆t ‘water’

I have included here cases with Proto-Mari *i and *ü only in stems of the shape CV(V-), where the appearance of “full” rather than “reduced” vowels is regular. Some other examples exist as well though, such as *ik ‘one’ (< *ü?), *üpš ‘smell’ (< *i?).

Existing literature does not seem to tackle the issue, and often I get the feeling that authors essentially try to sweep the problem under the carpet. Sammallahti leaves the history of Mari vocalism untreated. Collinder offers, for the cases with *e > *ü̆, only the slightly ad hoc rule that this development occurs “in the vicinity of *w and *r”, while he does not comment on the cases with *i > *ü̆ or *ü > *ĭ. Steinitz’ approach posits a late development *ĭ > *ü̆ again in the vicinity of labial consonants (and raises the possibility that it applies only to Meadow Mari and not even Proto-Mari), but leaves the other cases untreated.

I have not seen any specialized studies that would have fared better either. E. Itkonen in his major 1954 article on the history of Mari and Permic vocalism even explicitly notes that labiality assimilations that he posits next to *w, *p, *r cannot be considered regular. Contrast indeed e.g. ‘blood’ (*we- > *wü̆-) vs. ‘five’ (*wi- > *wĭ-), ‘tooth’ (*pi- > *pü-) vs. ‘cloud’ (*pi- > *pĭ-), ‘blade’ (*-er- > *-ü̆r) vs. ‘to hit’ (*-ir- > *-ĭr-). — Also, since when is *r a labial consonant anyway?


I suspect that already the basic assumptions underlying earlier research on this are incorrect. Instead of the developments *i > *ü̆ and *ü > *ĭ being some kind of exception cases to be explained away, the old skeptic contingent has been right this time: the contrast between Proto-Mari *ĭ and *ü̆ is unrelated to the contrast between Proto-Uralic *i and *ü. Rather, PU *i, *ü and *e merged in the early history of Mari, and this merged phoneme (I will mark it simply as *i) later secondarily split into *i > *ĭ and *ü > *ü̆ again — without regard for its PU origins.

The best single conditioning factor instead appears to be stem type:

  • *i-ä > *ĭ: 23 cases
    *elä- > *ĭle-, *ićä > *ĭćä, *jekä > *i, *külmä > *kĭlmə, *keltä- > *kĭlðe-, *küńärä > *kĭńer, *kirä- > *kĭre-, *minä > *mĭńə, *mükkä > *mĭk, *neljä > *nĭl, *ńičkä- > *jĭčke-, *ńüktä- > *ńĭktä-, *pićlä > *pĭćle, *peljä > *pĭləkš, *penä > *pi, *pesä > *pĭžäkš, *pištä- > *pĭšte-, *pitä- > *pĭće-, *repäś > *rĭwəž, *śüklä > *šĭɣəľə, *śikšta > *šĭštə, *śilmä > *šĭnćä, *tinä > *tĭńə
  • *i-ä > *ü̆: 9 cases
    *d₂ümä > *lü̆mə, *künčä- > *kü̆nče-, *pečä > *pü̆nčə, *sesar > *šü̆žar, *śečä > *čü̆čə, *śepä > *šü, *šikšna > *šü̆štə, *terä > *tü̆r, *wülä > *wü̆l-
  • *i-ə > *ĭ: 11 cases
    *eštə- > *ĭšte-, *jećə > *ĭške, *kičək > *kĭčək, *küjə > *kĭškə, *kiśkə- > *kĭške-, *kütkə- > *kĭćke-, *leštə > *lĭštäš, *pilwə > *pĭl, *süjə > *šĭjä, *sükəśə > *šĭžə, *wittə > *wĭć
  • *i-ə > *ü̆: 15 cases
    *kejə- > *küä-, *künčə > *kü̆č, *kerə > *kü̆r, *kiwə > *kü, *nimə > *lü̆m, *piŋə > *pü, *sülə > *šü̆lə, *śüd₁ə > *šü, *śülkə > *šü̆wəl, *śixələ > *šülə, *tejnəš > *tüəž, *türə > *tü̆rəś, *tüŋə > *tü̆ŋ, *werə > *wü̆r, *wetə > *wü̆t
  • unclear/inapplicable > *ĭ: 4 cases
    *e- > *ĭ-, *kenčV- > *kĭčälä-, *le- > *liä-, *śerV > *sĭr
  • unclear > *ü̆: 6 cases
    *jetV > *jü̆t, *kütV > *kü̆ðäl, *küsV > *kü̆žɣə, *pečkV- > *pü̆čkä-, *süskV- > *šü̆škä-, *sitV- > *šüðəš

The raw accuracy of the maintenance hypothesis (*i > *ĭ, *ü > *ü̆) seems to be 26 cases predicted correctly out of 41 ≈ 63.4% (worse if we also wanted to presume *e > *ĭ). Assuming the typical reflexation to be *i-ä > *ĭ, *i-ə > *ü̆ instead reaches up to 38 correctly predicted out of 58 ≈ 65.5%. Which is so far only marginally better… But there is room for fine-tuning here as well.
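For anyone who wants to re-check the arithmetic, here is a minimal Python sketch of the comparison. The counts are simply the case totals stated in the bullet headings above; the labels "unrounded" and "rounded" (standing for Mari *ĭ and *ü̆) and the function name `accuracy` are my own shorthand, not anything from the literature.

```python
# Case totals as stated in the bullet headings above.
# "unrounded" stands for Mari *ĭ, "rounded" for Mari *ü̆ (labels chosen for this sketch).
by_pu_vowel = {
    ("i", "unrounded"): 15,
    ("i", "rounded"): 6,
    ("ü", "unrounded"): 9,
    ("ü", "rounded"): 11,
}
by_stem_type = {
    ("i-ä", "unrounded"): 23,
    ("i-ä", "rounded"): 9,
    ("i-ə", "unrounded"): 11,
    ("i-ə", "rounded"): 15,
}


def accuracy(counts, predicted):
    """Return (hits, total, share) for a hypothesis mapping each class to one reflex."""
    total = sum(counts.values())
    hits = sum(n for (cls, reflex), n in counts.items() if predicted[cls] == reflex)
    return hits, total, hits / total


# Maintenance hypothesis: PU *i stays unrounded, PU *ü stays rounded.
maintenance = accuracy(by_pu_vowel, {"i": "unrounded", "ü": "rounded"})
# Stem-type hypothesis: *i-ä gives the unrounded vowel, *i-ə the rounded one.
stem_type = accuracy(by_stem_type, {"i-ä": "unrounded", "i-ə": "rounded"})

for name, (hits, total, share) in [("maintenance", maintenance), ("stem type", stem_type)]:
    print(f"{name}: {hits}/{total} = {share:.1%}")
```

The two denominators differ because the stem-type tally also includes the PU *e cases but leaves out the words whose stem type is unclear, so the two percentages are only roughly comparable.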

Some of the apparent exceptions in verb roots can be readily interpreted to indicate a shift of stem type in pre-Mari. *ĭšte- ‘to do’, *kĭške- ‘to throw’ and *kĭćke- ‘to harness’ show 2nd syllable *e, which normally corresponds with PU *A-stem verbs; thus I would reconstruct pre-Mari *ist-ä-, *kiśk-ä- and *kitk-ä-. Here *-ä- is presumably some kind of a transitivizing suffix, well known in Mari (the classic example being /koða-/ ‘to stay’ : /koð-e-/ ‘to leave’) and probably dating to earlier times already (reconstructible in a small number of PU doublets such as *künčə ‘nail’ ~ *künč-ä- ‘to plough/dig’; *ipsə ‘smell’ ~ *ips-ä- ‘to smell’). We could also take the final *-e, rather rare in nominals, of *ĭške ‘self’ as grounds to reconstruct pre-Mari *(j)iś-kä.

Similarly, *pü̆čkä- ‘to cut’, *šü̆škä- ‘to cram’ show 2nd syllable *ä, which normally corresponds with PU *ə-stems; and therefore I would reconstruct pre-Mari *pičkə-, *siskə-. The former thus turns out better comparable with Mordvinic *pečkə- ‘to cut’ than with Samic *peackē- ‘to cut (off)’ (< *pečk-ä-), and the latter with Samic *sëskë- ‘to rub against’ than with Fi. sysä-, Es. süska- ‘to push into’.

(This on the other hand creates new problems for *kĭčälä- ‘to search’, *liä- ‘to be’, *ńĭktä- ‘to pluck’, which now start pointing to earlier *ə-stems…)

I would also take *kü̆žɣə ‘thick’ as pointing to earlier *kizəgV < *küsəkV (akin to Proto-Samic *kësëkV > Northern Sami gassat etc.), rather than the bare root *küsä that most sources report. Perhaps even *kĭškə ‘snake’ should be taken as pointing to PU *küjəwä (> Erzya /kijov/, Hung. kígyó, Smy. *kiwä) > pre-Mari *kiwä(-skV) rather than the bare root *küjə (> PF *küü, Udm. /kɨj/ [3]).

Nominal derivation phenomena could lie behind some of the other exceptions as well, though due to the non-maintenance of the PU stem vowel contrasts in Mari nominals, this will have to be more speculative. For example, Finnic *kidek ‘snowflake’ has a number of parallel derivatives etc. in the descendant languages, and the original root may well have been *kičä rather than *kičə. It would be also possible to assume PU *kičäk, and date the development *-Ak > *-Ek (as seen in cases such as Fi. jauha- ‘to grind’ ~ jauhe ‘powder’; jättä- ‘to leave behind’ ~ jäte ‘trash’) as inner-Finnic.

Consonant environment conditioning does not need to be ruled out entirely either. E.g. *šü ‘neck’ could be taken back to pre-Mari *siw(ä), and *šĭjä ‘year ring’ to pre-Mari *sijə, with the natural developments *iw > *ü̆ and *ij > *ĭ bleeding the usual stem type conditioning. (This provides also another possible line of explanation for ‘snake’.) The latter rule could be even generalized slightly to also capture *wĭć ‘5’.


The phonetics of this hypothesis do not have to be left arbitrary either: a kind of palatal umlaut mechanism seems to work. The root structure *i-ä > *ĭ(-e) remains consistently front-vocalic and illabial; while the root structure *i-ə would probably have been first retracted to something like *[ɨ]-[ə]. After this, I would suppose central *ɨ was labialized to [ʉ], and then re-fronted > [y] > [ʏ]. This development appears internally unmotivated (it could possibly be attributed to areal influence from Turkic) — but it has a good precedent in the fact that Mari is the only Uralic language with a front rounded reflex of PU *ë, for which we must then reconstruct the exactly parallel development [ɤ~ɜ] > [ɵ] > [ø] > [y].

Later vowel harmony between /a ~ ä/, as attested in Hill Mari (but not Meadow Mari) was likely not yet in effect by this stage. This appears to be shown by the straggling cases of Proto-Mari *ĭ-ä: where *ĭ is further reduced and retracted to /ə/ in Hill Mari, the stem vowel surfaces as /a/, not as /ä/. Cf. e.g. /kəčala-/ ‘to search’, /ńəkta-/ ‘to skin’, /šəja/ ‘year ring’.

[1] This selection has been datamined from both older and newer literature. Individual referencing would go beyond the purposes of this blog post. Various dubious or difficult-to-reconstruct comparisons have been omitted, including e.g. most cases where some or most other reflexes point to original *ä rather than *e.
[2] To my knowledge, this comparison has not been previously presented, though it seems self-evident. The identity of the “suffix” is unclear to me however.
[3] Even this might derive from the longer form *küjəwä: contrast *süjə > /si/ ‘year ring’. Perhaps thus: *süjə > *süj > *si, but *küjəwä > *küjə > *kɨj?

Posted in Commentary, Reconstruction

More on umlaut chronology in Samic

I recently proposed that the fission of Proto-Uralic *ä and *e into more open and more close vowels in Samic, depending on the following second-syllable vowels (“stem type”), should be dated already to the dialectal West Uralic era, given that similar developments appear also in their closest relatives: the Finnic and Mordvinic languages. This diverges in a couple of ways from the views in the main handbooks on the historical development of Samic, i.e. Korhonen (1981): Johdatus lapin kielen historiaan and Sammallahti (1998): The Saami Languages: An Introduction.

One basic disagreement is over absolute chronology. While both Korhonen and Sammallahti (henceforth: K & S) agree that at least the merger of the stem types *e-ə and *i-ə [1] should indeed be dated to the earliest phase of the pre-Proto-Samic era, their treatises begin from the now obsolete “Proto-Finno-Samic”, dated as some half a millennium later, reconstructed with cross-reference mainly to Finnic, and usually also located some 1000 km more westerly (in the Gulf of Finland area) than my reference point in common West Uralic (around the upper reaches of the Volga). [2]

Another however concerns the overall relative chronology. K & S present the historical phonology of Samic in a highly tiered fashion that makes for some very attractive charts and graphics, with roughly four distinct periods of development:

  • an early phase (K’s “kantalapin I vaihe”, S’s “Pre-Saamic” and “Proto-Saamic 2”) with the loss of several inherited vowel contrasts, and the splitting of this smaller pre-Samic vowel system into several allophones, depending on stem type;
  • a complete revamp of the vowel length system (K’s “kantalapin II vaihe”, S’s “Proto-Saamic 3” in parts), depending on earlier vowel qualities;
  • a restructuring of the system of unstressed vowels (K’s “kantalapin III vaihe”, S’s “Proto-Saamic 3” in parts as well as “Proto-Saamic 4”);
  • late phonetic shifts in the sound values of several stressed vowels (K’s “kantalapin IV vaihe”, S’s “Proto-Saamic 5”).

As I have mentioned in an exchange in the comments section, I am however skeptical of the historical reality of this model. It strikes me as unnaturalistically neat altogether. Only a few of the changes can be explicitly shown to have been in the presented order in relative chronology, and probably most of the distinct “phases” here should be meshed together. Others might even be post-Proto-Samic entirely (though that will be another topic).

In particular I do not think that all Proto-Samic umlaut developments should be considered equally early. The Samic languages are some of the most “umlaut-rich” languages within Uralic, and the individual languages have continued to innovate new changes of this type pretty much as soon as new features arise among the unstressed vowel system. In this context it seems entirely implausible to me that at one point the pre-Proto-Samic speakers would have collectively decided “ok, that’s enough for now, let’s call a 500-year moratorium on umlauts”.

More specifically, while I think that developments *ä-ä > *ȧ-ȧ (> PS *ā-ē) versus *ä-ə > *e-ə (> PS *ē-ë) might be even earlier than has been previously suspected, by contrast I think that the a-umlaut of inherited *e and *o (e.g. PU *pesä >> PS *peasē ‘nest’; PU *kota >> PS *koatē ‘tent, teepee’) must be instead dated to a somewhat later Proto-Samic phase. This is due to some exception cases that appear to be explainable by them having been subject to both umlauts.

Umlaut stacking

It’s been observed already since the earliest reconstruction work on Uralic vocalism that PS *ea fairly often turns up in the Samic languages as a reflex of earlier *ä. Explanations for these cases have varied quite a bit, from considering this the regular reflex of the stem type *ä-ä (this was the opinion of Wolfgang Steinitz), to dismissing all instances as irregular or “sporadic” (thus K & S). Neither extreme is satisfying though, and it would be desirable to identify some conditions for the development. Dating the umlauts of *ä and *e into two different chronological stages seems to offer a lead on this.

If we assume that the pre-Samic dialect of late West Uralic — I will call it “pre-Samic” or “preS” for short — had already raised *ä-ə to *e-ə, as in words like the following:

  • PU *jäŋə [jɛŋə] > preS *jeŋə > PS *jēŋë ‘ice’
  • PU *kälə [kɛlə] > preS *kelə > PS *kēlë ‘tongue’
  • PU *mälkə [mɛlkə] > preS *melkə > PS *mēlkë ‘breast’

— then at this point, a derivational process turning one of these *ə-stem words into an *ä-stem word would allow it to be later subject to a-umlaut just as inherited *e is, yielding PS *ea. There appear to be some clear examples that involve the syncope of *-ə- upon the addition of a derivational suffix: PU *CäCə > preS *CeCə → *CeCə-Cä > *CeCCä > PS *CeaCCē. Some other examples involve a derivational process that leads to a pre-Samic *o-stem (which similarly trigger a-umlaut): preS *CeCə → *CeC-o > PS *CeaCō. [3]
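To make the proposed ordering concrete, here is a small Python sketch that tracks only the first-syllable and stem vowels of a PU *ä root through the stages just described. Everything about the encoding is my own assumption for illustration: the function name `samic_reflex`, the reduction of a word to a vowel pair, and the treatment of derivation as a simple replacement of the stem vowel. The stage outcomes themselves (*ä-ə > *e-ə > PS *ē-ë, *ä-ä > *ȧ-ȧ > PS *ā-ē, a-umlaut of *e before *ä or *o giving *ea) are the ones given in the text.

```python
# Final Proto-Samic values of the pre-Samic vowels, as given in the text.
PS_FIRST = {"e": "ē", "ȧ": "ā", "ea": "ea"}
PS_STEM = {"ə": "ë", "ä": "ē", "ȧ": "ē", "o": "ō"}


def samic_reflex(v1, v2, new_stem_vowel=None):
    """Return the PS (first-syllable vowel, stem vowel) for a PU *ä root.

    v1, v2: PU first-syllable and stem vowel; only *ä-ä and *ä-ə are covered.
    new_stem_vowel: stem vowel after a pre-Samic derivation ("ä" or "o"), if any.
    """
    if v1 != "ä" or v2 not in ("ä", "ə"):
        raise ValueError("this sketch only covers PU *ä-ä and *ä-ə roots")

    # Stage 1 (late West Uralic): early umlaut of *ä depending on stem type.
    if v2 == "ə":
        v1 = "e"             # *ä-ə > *e-ə
    else:
        v1, v2 = "ȧ", "ȧ"    # *ä-ä > *ȧ-ȧ

    # Stage 2 (pre-Samic): derivation may turn the word into an *ä- or *o-stem.
    if new_stem_vowel is not None:
        v2 = new_stem_vowel

    # Stage 3 (later Proto-Samic): a-umlaut of *e before *ä or *o.
    if v1 == "e" and v2 in ("ä", "o"):
        v1 = "ea"

    # Stage 4: remaining shifts to the Proto-Samic vowel values.
    return PS_FIRST[v1], PS_STEM[v2]


print(samic_reflex("ä", "ə"))        # type *jäŋə 'ice'
print(samic_reflex("ä", "ä"))        # type *ńälmä 'tongue'
print(samic_reflex("ä", "ə", "ä"))   # type *päjə-wä 'day'
print(samic_reflex("ä", "ə", "o"))   # type *keć-o 'whitefish'
```

The point of the sketch is only that *ea falls out automatically once derivation is placed between the two umlauts; swapping stages 2 and 3 would give plain *ē for the derived words instead.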

This mechanism appears to explain a reasonable number of the cases of PU *ä yielding PS *ea. Thus far, I have identified seven possible front-vocalic cases (including one somewhat speculative new etymological proposal):

  • *keaćō ‘medium-sized whitefish’ (only in Lule Sami: getjuk) < preS *keć-o
    ← *kećə < PU *käśə(ŋ)
    Cf. Mansi *kääsəŋ, Hungarian keszeg ‘bream’, which both indicate earlier *ä. (Finnish keso ‘white bream’ has also been considered cognate, but is better derived from kesä ‘summer’.)
  • *leapō- (Lule Sami lehpagis ‘nice’, Old Swedish Sami leppotet) < preS *lep-o-
    ← *le(p)pə < PU *lä(p)pə
    Cf. Moksha /ľäpä/ ‘weak’, Mari *lewə ‘warm, mild (of weather)’, Khanty *leepət ‘weak’, which indicate *ä. Finnic *leppedä ‘balmy’ again looks like the odd member out in the cognate set. The similar *leepedä ‘mild’ could be instead compared here just about as easily. [4]
  • *meanō- ‘to become evasive’ < preS *men-o-
    ← *menə- < PU *mänə-
    Cf. Mordvinic *mäńə- ‘to dodge, to get free’, Komi /mɨn-/ ‘to get free’, Hungarian mentes ‘free’, which indicate *ä. The verb *mänə- ‘to get free’ is probably ultimately somehow related to *menə- ‘to go’, but the cognates suggest the two having been distinct already at the PU level. (I additionally wonder if contamination from the former could perhaps explain the irregular vowel in Savonian/Karelian mäne- ‘to go’.)
  • *peajō- ‘to shine’ < preS *pej-o-
  • *peajvē ‘day’ < preS *pejwä < *pejə-wä
    both ← *pejə < PU *päjə ‘bright, shining, etc.’
    The bare root does not appear to unambiguously survive anywhere (perhaps in Komi /bi/ ‘fire’?), but numerous other derivatives generally indicate *ä, e.g. Finnic *päivä ‘day, sun’, Hungarian fehér ‘white’.
  • *pealkē ‘thumb’ < preS *pelkä < *peləkkä < PU *pälə-kkä
    Cf. Mordvinic *päĺka, where the unvoiced cluster *ĺk must be secondary (PU *lk would have yielded **ĺg). Komi /pel ~ pev/ also suggests *ä. The underived root could be identified with *pälə ‘side’, as has been proposed by Janhunen. The messy Finnic words for ‘thumb’, often included here, mostly point to PF *peikala or *peikoi; and they probably need to be kept separate (at best some kind of secondary contamination of the original Uralic word with some other source could be involved).
  • *veakkē ‘help’ < preS *wekkä < *wekə-(k)kä
    Formally, this might be a derivative of PU *wäkə ‘power’ > preS *wekə (> PS *vēkë ‘people’). A semantic intermediate ‘activity with several people, work bee’ could be involved.

It is however necessary to also assume similar but even earlier syncope in some other old derivatives, which do show regular a-umlaut in Samic.

  • *ńālmē ‘tongue’ < *ńälmä (~ Hungarian nyelv ‘tongue, language’ etc.) ← PU *ńälə- ‘to swallow’
  • *ńālkē ‘tasty’ < *ńälkä ← id.
  • *pāŋkē ‘reindeer’s headgear’ < *päŋkä ← PU *päŋə ‘head’

The first of these words has a very wide distribution, and the bisyllabic form *ńälmä could perhaps be assumed already for PU… though this would get in the way of a partial rule for *m-lenition in Hungarian that I have sketched some years ago.


I can also think of a slightly different mechanism to account for one of the remaining high-profile examples of *ä >> *ea. This is *pealē ‘side; half’. The polysemic meaning suggests that this may have come about as a blend of two originally distinct PU words: the above-mentioned *pälə ‘side’ (> Finnic *peeli, Mordvinic *päľ, Mari *pel), and the evidently closely related but distinct *pälä ‘half’ (> Finnic *pooli, Mordvinic *päľə, Mari *pelə).

The two words also seem to merge in Ugric: compare Hungarian fél : fele- ‘side; half’; Mansi *pääl ‘side; half’; Khanty *peeɭək ‘side; half’. But while this development can be simply due to the loss/reduction of 2nd-syllable vowels, the Samic development would require assuming contamination: the stem vowel seems to continue preS *pȧlȧ ‘half’, while the *e-type 1st syllable vowel seems to continue preS *pelə ‘side’. The two would have led to the creation of a preS “compromise” form *pelä, from which then regularly > PS *pealē.

Finnic parallels

Worth noting is that in the case of ‘day’, a similar exception development is also found in Finnic. PF *päivä ‘day, sun’ (and not **paivi) has likewise escaped the early lowering/backing of *ä-ä, perhaps for the same reasons too: contraction from *päjə-wä taking place only after the a-umlaut of primary *ä-ä.

This pattern seems to extend further: among the remaining cases with *ä > PS *ea, Finnic cognates usually have *ä-ä as well. At least five cases can be identified that have correspondences in the more eastern Uralic languages:

  • ‘lichen’: PS *jeakēlē ~ PF *jäkälä (~ Permic)
  • ‘paw’: PS *keapēlē ~ PF *käpälä (probably related to *käppä ‘paw’ > Finnic, Mordvinic)
  • ‘bog’: PS *jeaŋkē ~ Fi. jänkä (~ Permic, Mansi, Khanty)
  • ‘flap, cover’: PS *leappē ~ PF *läppä (~ Mari, Permic, Hung., Mansi)
  • ‘smoke hole’: PS *reappēnē ~ PF *räppänä (~ Permic)

While this same correspondence is also common enough in loanwords (PS *(h)earkē ← PF *härkä ‘bull’; PS *kearnē ← PF *kärnä ‘crust’; both originally from Baltic), and this approach has in the past been applied to ‘bog’ (S → Fi) and ‘paw’, ‘flap’ (F → S) as well, nothing seems to outright require considering these words later than the pre-Samic / pre-Finnic period. If *ä-ä [ɛ-a] had in both groups been lowered to *ȧ-ȧ [a-a] by then, new lexical innovations of the time could reintroduce also a new, secondary *ɛ-ä in pre-Finnic (*jɛkälä ‘lichen’, etc.); while in pre-Samic, only *e-ä would have been available.

Conveniently enough, there is also one word of this type for which early loaning in the West Uralic period is assured: PS *keavrē ~ PF *käkrä ‘bent’, which probably derives from Indo-Iranian *čakra- ‘wheel’ (or from a slightly earlier *ḱɛkra-). [5]

To be sure, I still generally hold that if two competing etymologies are available for a word, then all other things being equal, the more recent explanation should be preferred. But this is only a probabilistic rule-of-thumb. So while several of the words here (and also many of the more numerous similar cases yet that are restricted to Samic & Finnic) probably have indeed been loaned between Finnic and Samic at a later date, I would not rule out the possibility of some of them still going back to different parallel preS and preF sound substitutions in the late West Uralic era.


For now I’m still sketching out the situation with back vowels. In particular it’s not clear to me how the raising PU *ë-ə and *aj(C)ə > preS *a-ə > PS *ō-ë should be dated: this is attested from numerous Germanic loanwords, and thus could be newer than *ä-ə > *e-ə. It may well be the same change as preS *a-a > PS *ō-ē (likewise attested from Germanic loanwords); and thus not triggered by stem vowels at all.

[1] In their view the result of this merger would not have been quite [i], but a near-close vowel they mark as *ḙ. I would suggest the sound value [ɪ] for this (similarly [ʊ] for their *o̭), reflecting the common tendency of close short vowels to reduce and centralize. Initially this probably would not have had any phonological significance though, so I will continue to use *i and *u for the early pre-Samic and early pre-Mordvinic era.
[2] “Common” rather than “proto”: while West Uralic at least seems like a defensible subgrouping to me (unlike its traditionally assumed kin like “Proto-Finno-Volgaic”, “Proto-Finno-Permic”, etc.), the common innovations are not many, and it remains effectively only a dialect of Proto-Uralic itself. This being the case, an accurate picture of West Uralic can only be gained by starting from Proto-Uralic and “reconstructing upwards”, not by presuming the existence of the group and attempting to compare Samic/Finnic/Mordvinic in isolation (a method that has traditionally generated rather Finnocentric models, further muddled by conflicting evidence from areal later-diffused vocabulary). It would also be premature to rule out entirely the possibility of WU being an “areal-genetic” group of dialects after all, since non-exact parallels for a few of the characteristic innovations (e.g. *ë- > *a, *åĆ > *aĆ, *-d₂- > *-d₁-) can be found in Mari, Permic and even Hungarian as well.
[3] It does not seem clear to me if these cases should be assumed to involve the suffixation of a consonantal suffix such as *-w and a later development *-əw > *-o, or simply the addition of *-o as a suffixal element right away, but this does not really affect their validity. If the former though, then this has some implications for the history of the PS stem type *ā-ō; they could not descend from *ä-o at the West Uralic level, but would have to go back to preS *ȧ-ȧ < PU *ä-ä, with the labial suffix only as an incidental addition.
[4] The irregular vowel correspondences in the Finnic words could perhaps be accounted for by assuming contamination from the Germanic loanword *leevä ‘slight; temperate’. This is one of the very old Germanic loans in Finnic that shows *ē → *ee. While both sides appear to point to a mid vowel [eː], I believe this is illusory. PIE *e was probably closer to [ɛ], and the eventual lowering and backing of *ē to *ā in Northwest Germanic suggests that even an intermediate [æː] existed at one point; as is also shown by the existence of a couple of loanwords in Finnic that have *ē → *ä (e.g. PGmc *wēgaz ‘lever, scales’ → PF *väkä ‘hook’). Pre-PGmc *klēwas ‘lukewarm’ was thus probably loaned as pre-Finnic *lääwä, later raised to PF *leevä together with inherited words like PU *lämə > preF *läämi > PF *leemi ‘broth’. At this period it could be assumed that pre-F *läppətä ‘mild’ was adjusted to *lääpətä, on the model of *lääwä; with later raising then giving PF *leepedä. — Even slightly earlier *leeppedä could perhaps be assumed, with PF *leepedä and *leppedä representing two ways of naturalizing the overheavy syllable structure.
[5] Another highly similar word family exists as well: PF *käprä ‘rolled up’; PS *kēpr-ë- ~ PF *käpr-i-stä- ‘to roll up’. As has been proposed by Katz, in principle this might represent an earlier parallel loan with PIE *kʷ still retained on the loangiving side, substituted by pre-S / pre-F *p. Dating *l > *r in Indo-Iranian as already this early seems unlikely however, and I suppose a more probable explanation would be that this is Uralic-internal descriptive variation. Note also a number of obviously secondary formations in Finnish such as käkkyrä, käppyrä ‘curved thing’.

Posted in Etymology, Reconstruction

The phonetic vagueness of laryngeal theory

While I continue to be strictly speaking Not An Indo-Europeanist, I regularly keep reading about comparative Indo-European research just as well. Including not only matters with immediate relevance to Uralic studies, but also the usual controversy honeypots: interpretations of the stop system (glottalic? aspiration where? how many velar series? etc.); and interpretations of the vowel system in relation to ablaut and laryngeal theory. They seem to often form an important “frontier” of sorts in the development of fine-grained historical phonology reconstruction methodologies, if only due to the large amount of attention they receive.

This doesn’t imply I would be particularly impressed with the average state of the field.

In the case of the last-mentioned, one thing that I see come up a lot is that given a certain degree of uncertainty over the original realizations of the laryngeals, almost everyone seems to be still treating them at least to some extent as dei ex machina, outside of subjection to phonetically meaningful sound changes.

One particular repeat offender seems to be the interaction of laryngeals with syllabic resonants. Consider e.g. the following list of sound developments given by Peter Schrijver (2015), Pruners and trainers of the Celtic family tree:

  • *CRHjV > *CRījV (laryngeals vocalize to *ī between consonant+resonant and a palatal glide)
  • *R̥DC > *RaDC (word-initial syllabic resonants vocalize to resonant + *a before a voiced unaspirated stop + another consonant)
  • *HR̥C > *aRC (syllabic resonants vocalize to *a + resonant after a word-initial laryngeal — including voiced unaspirated stops)
  • *CR̥HV > *CaRV (syllabic resonants vocalize to *a + resonant before laryngeal + vowel)
  • *CR̥HT > *CRaT (syllabic resonants vocalize to resonant + *a before laryngeal + voiceless stop)
  • *CR̥HC > *CRāC (syllabic resonants vocalize to resonant + *ā before laryngeal + other consonant)
  • *N̥ > *aN (remaining syllabic nasals vocalize to *a + nasal)
  • *R̥ > *aR, *Ri (remaining syllabic liquids vocalize to *a + liquid or liquid + *i)

This is pretty much abstract symbol algebra. At best these can be called sound correspondences between Proto-Indo-European and Proto-Celtic. To suggest that a laryngeal or a syllabic resonant would directly change to or excrete *ī in the first case, but *ā in the sixth, is just about equivalent to claiming “a sound change” *dw > *erk- for Armenian. In reality, developments like these surely must have been composed of several stages.

Of course Schrijver is doing only an overview of Celtic historical phonology, and I would predict that some of the primary sources go into more detail. But it strikes me as an overall problem if there is little interest in IE studies in unpacking these kinds of sound correspondences. Nowhere have I seen even fairly in-depth introductions to laryngeal theory attempt to explain these kinds of developments using the normal tools and frameworks of historical sound change.


It’s not even very difficult to see how some elementary order could be imposed on this kind of a mess. We could note that there is e.g. tons of *a-insertion going on (and I could add the change *CHC > *CaC, which Schrijver skips over, probably on account of it being analyzable as even earlier than Italo-Celtic). It seems likely that there has been a single main epenthesis process, followed by diversification in different environments, rather than numerous near-identical epentheses. Additionally, the epenthesis seems likely to have been not quite to *a, given some reflexes as *i.

So for the sake of an example, suppose e.g. that early on, all syllabic resonants first break to *əRə. From such a starting point, most of the more complex developments here will be explainable with what are reasonably natural phonetic developments:

  • *R̥DC “>” *RaDC will be simply the loss of word-initial *ə: *əRəDC- > *RəDC- > *RaDC-.
  • *HR̥C “>” *aRC will be explainable as the blocking of the previous change due to an earlier laryngeal, followed by loss of the second schwa: *HəRəC- > *HəRC- (**HRəC) >> *aRC-.
  • *CR̥HV “>” *CaRV will be explainable as the loss of a schwa from an open syllable before a full vowel: *CəRəHV > *CəRHV-. It is not clear if the first schwa would be better assumed to have remained due to schwa lowering to *a intervening (> *CaRHV- > *CaRV-), or due to the laryngeal remaining long enough that the loss of schwa from open syllables was no longer operational (> *CəRV- > *CaRV-).
  • *CR̥HC “>” *CRāC appears to show that the second schwa will now remain in a closed syllable, leading to the loss of the first one instead: *CəRəHC- > *CRəHC-. The loss of laryngeals with compensatory lengthening may have then kicked in around this time: *CRəHC- > *CRə̄C- > *CRāC-.
  • *CR̥HT “>” *CRaT might diverge from the previous due to any number of reasons. One is that medial voiceless *-T- was likely pronounced longer than its voiced counterparts, and could have induced a shortening *ə̄ > *ə.
  • *CRHjV “>” *CRījV (where we probably expect a syllabic resonant in the input?) could be routed thru e.g. a metathesis *Hj > *iH: thus first *CəRəHjV- > *CəRəiHV-. Then assume a monophthongization *əi > *ī, and loss of the first schwa, now found before a full vowel: *CəRəiHV- > *CRīHV-. Finally, suppose loss of the stray laryngeal, and epenthesis of *j as a hiatus filler to acquire *CRījV-, as required.
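To check that a chain of stages like this hangs together, it can be run mechanically as ordered rewrite rules over the abstract templates. The following is only a rough sketch under several assumptions of my own: the exact rule order, the regular expressions, the `RULES` and `derive` names, and the use of a syllabic resonant in the input of the last template are all illustrative choices, not claims about the actual history. For *CR̥HV it follows the second of the two options mentioned above (laryngeal loss before schwa lowering).

```python
import re
import unicodedata

SCHWA, MACRON, RING, LONG_I = "\u0259", "\u0304", "\u0325", "\u012b"
CONS = "CDTHj"  # consonant symbols other than R (assumed notation)

# Ordered (label, rule) pairs; each rule applies once, in this order.
# The ordering is an illustrative assumption, not an established chronology.
RULES = [
    ("breaking of syllabic resonants",
     lambda s: s.replace("R" + RING, SCHWA + "R" + SCHWA)),
    ("metathesis Hj > iH", lambda s: s.replace("Hj", "iH")),
    ("schwa + i > long i", lambda s: s.replace(SCHWA + "i", LONG_I)),
    ("word-initial schwa lost", lambda s: re.sub("^" + SCHWA, "", s)),
    ("second schwa lost unless two consonants follow",
     lambda s: re.sub(f"{SCHWA}R{SCHWA}(?![{CONS}]{{2}})", SCHWA + "R", s)),
    ("first schwa lost before a surviving second nucleus",
     lambda s: re.sub(f"{SCHWA}R(?=[{SCHWA}{LONG_I}])", "R", s)),
    ("schwa + H > long schwa before a consonant",
     lambda s: re.sub(f"{SCHWA}H(?=[{CONS}])", SCHWA + MACRON, s)),
    ("long schwa shortened before voiceless T",
     lambda s: s.replace(SCHWA + MACRON + "T", SCHWA + "T")),
    ("remaining laryngeals lost", lambda s: s.replace("H", "")),
    ("hiatus-filling j after long i",
     lambda s: re.sub(f"{LONG_I}(?=V)", LONG_I + "j", s)),
    ("schwa lowered to a", lambda s: s.replace(SCHWA, "a")),
]


def derive(template):
    """Return the list of distinct stages from the input template to the output."""
    form = template
    stages = [form]
    for _label, rule in RULES:
        changed = rule(form)
        if changed != form:
            stages.append(changed)
            form = changed
    # NFC turns "a" + combining macron into the precomposed long a.
    return [unicodedata.normalize("NFC", stage) for stage in stages]


SYLLABIC_R = "R" + RING
TEMPLATES = [
    SYLLABIC_R + "DC",
    "H" + SYLLABIC_R + "C",
    "C" + SYLLABIC_R + "HV",
    "C" + SYLLABIC_R + "HC",
    "C" + SYLLABIC_R + "HT",
    "C" + SYLLABIC_R + "HjV",  # assumes a syllabic resonant in the input
]

for template in TEMPLATES:
    print(" > ".join("*" + stage for stage in derive(template)))
```

The point of writing it out this way is that every output has to fall out of one shared rule list, which is exactly the constraint that a bare table of correspondences never has to meet.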

This is but a quick drabble, and I don’t mean to claim that this would be an accurate view of the actual history. But I would like to see more IEists take a stab at developing an analysis of the finer details of laryngeal theory that at least works more like this second set of sound changes.

I’ve already seen some promising work on syllabification in PIE that posits schwa epenthesis already as an original phonological process, but it seems certain that such research could also be linked to the numerous branch-specific historical developments.


My hunch is moreover that this line of inquiry could end up going much further. To my knowledge, even counting barely attested ancient epigraphic languages, no IE language retains any direct evidence of syllabic nasals, or of the phonetically mysterious “syllabic laryngeals”. And if it were to turn out that phonetic vowels can be assumed to have been there all along: what exactly will be the benefits of an analysis that claims *[əH] or *[əN] to really have been phonologically plain */H/ or */N/?

As far as I can tell, a lot about this hangs on the urge to group Indo-European ablaut alternations into neater patterns. And I won’t oppose that investigation — but I get the feeling that its proponents fail to show proper respect for the distinction between internal and comparative reconstruction. Alternations along the lines of *sek- : *sk-, *semk- : *sm̥k- certainly have a greater algebraic consistency, but it’s less clear to me if they could be presumed for PIE itself.

(Similarly it’s interesting how numerous introductions to PIE or some individual IE branch will outline laryngeal coloring as an “early sound change”, but neither outline the slightest amount of evidence for dating it as post-PIE, nor clearly assert that the assumed sound changes are pre-PIE, derived by internal reconstruction rather than by comparative evidence.)

So I could ask…: why would we even assume that the stage *s[ə]mk- is the innovation here? Cross-linguistically, the loss of reduced vowels is far more common than their insertion. Yet IE studies instead outline an amazing cornucopia of early epenthesis processes. Another look at the field also reveals several theories about the rise of zero grades from pre-PIE vowel reduction. Still for some reason it seems to have remained overwhelmingly difficult for scholars to put 2 and 2 together and to conclude that many of these “epentheses” are probably archaisms rather than innovations.

Posted in Commentary, Methodology

Early a-umlaut in West Uralic?

In a footnote to my previous post I speculated in passing that Finnic *ä-backing: *ä-ä > *a-ə (> late Proto-Finnic *a-i : *a-ë-) should perhaps be split in two phases: stem vowel reduction leading to a split from *ä-ə as an earlier stage, completion of the 1st-syllable vowel backing as a later stage.

I have already gathered some other evidence for this particular chronology, from the analysis of some forthcoming examples. But if I were to suppose for early Finnic an intermediate vowel *ȧ in these words, how should the situation be analyzed phonetically (or for that matter, phonologically)?

My initial thought was to posit a central vowel *ȧ (IPA [ä]). But this would have contrasted with both front *ä (IPA [æ]) and back *a (IPA [ɑ]); e.g. *särkə ‘roach’ : *sȧrńə (< *särńä) ‘ash tree’ : *śarwə ‘horn’. Such a crowded low vowel inventory is very rare in the world’s languages.

But since I was also speculating that this *ȧ still induced front vowel harmony, perhaps a better alternative will be to reconstruct this as a fully open front vowel (IPA [a]). Contrasts between this and near-open [æ] are also rare, but this situation seems to be well salvageable by replacing the latter with an open-mid vowel *ɛ instead.

PU *ä in fact shows mid reflexes in most Uralic languages:

  • In Samic, *ä-ə yields *ē-ë (though *ä-ä still yields *ā-ē).
  • Erzya merges *ä with Proto-Mordvinic *e (from PU *i, *e) as /e/.
  • I’ve seen [ɛ] rather than [æ] reported for some dialects of Moksha, though I don’t have a clear picture on the exact distribution of this.
  • Mari reflects *ä as *e, which normally remains /e/ in all varieties.
  • Permic reflects *ä most often as a vowel that has been reconstructed as Proto-Permic *ɛ, which in turn yields Komi /ɤ/, Udmurt /o ~ e/. PP *e > Komi /e/, Udm. /o ~ e/ is also common. [1] Some cases show Proto-Permic *a > Udm., Komi /a/, but they’re rarer and tend to involve messier data. I suspect this last vowel was in origin a rare conditional allophone at best, later strongly reinforced by loanwords from various sources.
  • Hungarian reflects *ä as /ɛ/ ~ /eː/, the latter from Old Hungarian *ɛː.
  • Far Eastern, Southern and Northern Khanty reflect *ä as tense /e/ (conventional Proto-Khanty *ee).
  • All Samoyedic languages show a change *ä > *e. This looks like it would have to be dated as later than *e > *i (which does not apply to Nganasan), but the resulting “Late Samoyedic” *e is generally indeed realized closer to /ɛ/ than /e/.

Aside from Finnic, the only languages uniformly in favor of an open value are Mansi (*ä > *ää) and Surgut Khanty (*ä > reduced /ä̆/). The idea of an original open *ä thus starts to look rather like yet another Finno-centrism of Proto-Uralic reconstruction.

Suppose we consider Finnic and Ob-Ugric outvoted, and adjust the PU vowel system ever so slightly by reconstructing original *i *e *ɛ rather than *i *e *ä. This vowel-height inventory is well attestable from the world’s languages, and can also be encoded phonologically identically, with *ɛ as simply a [+open] vowel.

After the initial stage of *ä-backing in early Finnic, the inventory would be extended to four heights *i *e *ɛ *ȧ: a rarer setup, but again still quite well attestable (e.g. in English). To get from here to the attested Finnic setup, a counterclockwise mini-chain shift is required: *ȧ > *a [ɑ], *ɛ > *ä [æ]. The phonological makeup of this four-height system looks a bit more precarious, and may require assuming a feature like [+tense] making a fleeting appearance.


This all also has some unexpected synergy with the development of back open vowels in Western Uralic. I have already a good while ago outlined a defense of the following model:

  • Proto-Uralic had labial *å [ɒ] in the first syllable, illabial *a [ɑ] in the 2nd syllable.
  • In Proto-West Uralic, illabial *a in the first syllable arose thru three innovations:
    • *ë > *a in all positions (*sënə > *sanə ‘sinew’, *mëksa > *maksa ‘liver’)
    • *å-a > *a-a (*kåla > *kala ‘fish’)
    • in palatal environments, *å > *a (*wåjə > *wajə ‘butter’)
  • Remaining cases of *å later merged with *o in Samic and Mordvinic, with *a in Finnic (*śårwə > pre-S and pre-Mo *śorwa, pre-F *śarwə ‘horn’).
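The model above can be spelled out as a small rule cascade and run over the example words. This is a hedged sketch only: the function names (`west_uralic`, `finnic`, `samic_mordvinic`) are hypothetical, "palatal environment" is crudely approximated as "before *j", and the Samic/Mordvinic stem vowel change in *å-ə is simply hard-coded, since its motivation remains unclear.

```python
import re
import unicodedata

A_RING, E_DIAER, SCHWA = "\u00e5", "\u00eb", "\u0259"
VOWELS = "aeiou\u00e5\u00e4\u00eb\u0259"


def west_uralic(word):
    """Apply the three innovations that create first-syllable *a."""
    w = unicodedata.normalize("NFC", word)
    # 1. Illabial back vowel *e-diaeresis > *a in all positions.
    w = w.replace(E_DIAER, "a")
    # 2. Labial *a-ring loses rounding when the next syllable has *a.
    w = re.sub(f"{A_RING}([^{VOWELS}]+)a", r"a\1a", w)
    # 3. Labial *a-ring > *a in palatal environments (assumed here: before j).
    w = re.sub(f"{A_RING}(?=j)", "a", w)
    return w


def finnic(word):
    """Remaining labial *a-ring merges with *a."""
    return west_uralic(word).replace(A_RING, "a")


def samic_mordvinic(word):
    """Remaining labial *a-ring merges with *o; stem schwa surfaces as *a."""
    w = west_uralic(word)
    w = re.sub(f"{A_RING}([^{VOWELS}]+){SCHWA}", r"o\1a", w)
    return w.replace(A_RING, "o")


for pu in ["s\u00ebn\u0259", "m\u00ebksa", "k\u00e5la",
           "w\u00e5j\u0259", "\u015b\u00e5rw\u0259"]:
    print(f"*{pu} > pre-F *{finnic(pu)}, pre-S/Mo *{samic_mordvinic(pu)}")
```

The non-ASCII letters are written as escapes only to keep the patterns independent of how the source file happens to be normalized.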

Assume now that the first point holds mutatis mutandis also in the case of front vowels: the PU vowel structure I mark as *ä-ä was not phonetically a fully harmonic setup either, but instead phonetically *[ɛ]-[a]. [2] This provides a great motivation for height assimilation to *[a]-[a]. Such a change could perhaps be assumed to have been common to pre-Finnic and pre-Samic, and also substantially demystifies the phonetic motivation for Finnic *ä-backing. (Regardless, it will still have to remain unclear why, on the Finnic side, the stem vowel was concurrently reduced to *ə; much like it remains unclear why *å-ə in Samic and Mordvinic yields *o-a rather than *o-ə.)

Some further similarities:

  • *ɛ-ȧ > *ȧ-ȧ is exactly parallel to *å-a > *a-a: a kind of sub-phonemic a-umlaut.
  • The Finnic shift *ɛ(-ə) > *ä(-ə) is closely parallel to *å(-ə) > *a(-ə): both constitute a shift of non-cardinal vowels towards more cardinal values (though the former change is sub-phonemic, the latter an actual merger).
  • The Samic shift *ɛ(-ə) > *e(-ə) is also closely parallel to *å(-ə) > *o(-a): both constitute a reduction of openness contrasts through raising. The former will have to be later than the merger of *e-ə with *i-ə — but as this change is shared with Mordvinic, dating it as quite early does not seem problematic to me. It may have begun e.g. as a push chain in early Samic, with the second merger then spreading to Mordvinic. (Indeed, perhaps also to Mari, where *e and *i seem to have identical reflexes across the line.)

Finally, one further interesting corollary of this model is probably that the split of *ä-ə and *ä-ä in Samic will end up being earlier than the a-umlaut of *e and *o to eventual ea and oa. This chronology will go quite well together with some other hypotheses of mine under work as well.

[1] As the two have seemingly identical outcomes in Udmurt, I suspect that their split might even be post-Proto-Permic.
[2] It would also be possible to reconstruct non-vowel-harmonic *ä-a = *[ɛ]-[ä], *å-a = *[ɒ]-[ä]. Despite vowel harmony being clearly reconstructible for both Proto-Finnic and Proto-Samoyedic, and at least probable for many of the branches in-between, I do not currently have a firm opinion on whether vowel harmony existed in PU. There seem to be a number of indications that it could be late Turkic influence in at least (Hill?) Mari, Hungarian and Southern Mansi. On the other hand, all three have clearly been subject to reduction and loss of unstressed syllables, which could have already early on eliminated inherited vowel harmony (as has also happened e.g. in Livonian, standard Estonian, and dialects of Veps).

Posted in Reconstruction
