Two Lemmata: PU *ë, PMs *ee *ëë *oo

Not “lemma” in the usual linguistic “citation form” sense, but in the mathematical “intermediate result” sense. I’ve noticed having to clarify these topics at quite a few points, so here’s a single post for the purpose. I’ll keep it brief here, i.e. without going into detailed presentation of the underlying etymological material… though that could be arranged too, if someone so requests?

Proto-Uralic *ë

A back unrounded non-open vowel, contrasting with the more basic *a and *o, has been reconstructed for Proto-Uralic or Proto-Finno-Ugric at various times. Originally, this was motivated by the appearance of a back unrounded /ëë/ [ʌː ~ ɤː] in certain varieties of Mansi; and of corresponding /ïï/ [ɯ] in Eastern Khanty. A “new” such vowel was established in Janhunen ’81, [1] on the basis of a correspondence of Proto-Samoyedic *ë and *ï (in largely complementary distribution with each other) to West Uralic *a. PSmy *ë and *ï also correspond regularly to Ob-Ugric cases of /ëë/ or /ïï/; hence the reconstruction of a distinct PU *ë now rests on quite firm ground. Further traces of this vowel can actually be identified in most Uralic languages west of the Urals.

There has been some uncertainty here ever since Janhunen’s paper, though. For reasons not fully elucidated, he prefers to reconstruct a close vowel *ï instead of a mid vowel *ë, although his actual evidence does not explicitly support a close value for this vowel. What arguments he does give are based solely on an (IMO mistaken) analysis of the PU 2nd-syllable vocalism, without addressing the situation in the 1st syllable. This problem has been only halfway addressed by the treatment in the other current-day key work on PU reconstruction, Sammallahti ’88: according to him, 1st-syllable *ï would have lowered to *ë at the level of “Proto-Finno-Permic” (whose existence I reject).

As a survey of the later reflexes will show, Sammallahti’s conclusion that most western Uralic languages point to *ë rather than *ï is correct. It must, however, be extended to Ugric and Samoyedic as well, which leaves no option but to reconstruct original *ë.

  • The languages of the West Uralic group (Samic, Finnic, Mordvinic) show a development *ë > *a in all positions, suggesting a relatively open value. *a does get further shifted to *ā > *ō and later yet diphthongized to *uo in Samic; but on the basis of Proto-Germanic and even Proto-Scandinavian loanwords, this can be seen to be a fairly late development. [2] Under certain conditions, the same process happens in Finnic as well. (An older PU/PFU *ō used to be reconstructed for these words during the mid 20th century, but this can be recognized as no longer necessary and would, at any rate, run into several difficulties in explaining the reflexes in the other Uralic languages.)
  • Mari and Hungarian also show the merger *ë > *a, but only before 2nd syllable *a. Possibly this can be analyzed as an assimilation development.
    Before 2nd syllable *ə, both languages have a distinctive reflex: Mari *ü, Hungarian *ï (> modern H í). Although these are close vowels, they in fact point to an original mid value: the original PU close vowels *i *ü *u are reflected in both languages as reduced *ɪ *ʏ *ʊ (> modern H short mid e ö o). In both languages, the unreduced close vowels normally derive from mid or open PU vowels under various conditions. [3]

    • *ä > *i in Mari in e.g. *äjmä > *imə ‘needle’, *lämpə > *liwä- ‘to warm up’
    • *e > *i (> í) in Hungarian in e.g. *wetə > víz ‘water’
    • *o > *u in Mari in e.g. *kota > *kuðə ‘house’, *oksa > *ukš ‘branch’
    • *o > *u (> ú) in Hungarian in e.g. *molə- > múlik ‘to pass by’
  • In Permic, *ë is normally reflected as *u. While a close vowel, this is also the default reflex of PU *a and *o, again suggesting a relatively open original value. Additionally, PU *u is reflected as *ï — so even if PU *ï were reconstructed, an intervening development *ï > *ë would still have to be assumed here to route this vowel out of the way of *u. [4]
    Under certain conditions, there is also a development *ë > *ë (e.g. *sënə > *sën ‘sinew’), which looks like a retention. All these words have *ə in the 2nd syllable; so perhaps the initial step of this split, too, was a lowering *ë > *a / _(C)Ca (and also in some other environments), later followed by *a >> *u.
  • Mansi reflects *ë as a long vowel *ëë. Even if we were to accept the currently prevailing reconstruction as a long close *ïï (see below), this would regardless point to an original non-close value: the other PMs long vowels uniformly derive from PU open and mid vowels. The PU close vowels *i, *ü, *u are meanwhile uniformly reflected as PMs short vowels, even if they are also generally lowered. So again, even if PU *ï were reconstructed, we would have to posit a fairly early lowering to *ë, for this vowel to participate in the general lengthening of non-close vowels that seems to have occurred in Mansi.
  • The Samoyedic vowel split *ë > *ë, *ï cannot be a priori resolved in favor of either starting point: the stated conditions can be easily reversed. Janhunen ’81 suggests that *ë occurs in closed, *ï in open syllables, either of which would make a plausible environment for a vowel shift.
    However, there is circumstantial evidence against *ï as a starting point. The PU close *i and *u are split in Samoyedic as well: either retained as *i, *u, or reduced to *ə. Yet, they are not lowered to the corresponding mid vowels *e, *o. If PU *ï were reconstructed, the expected Samoyedic split would therefore be *ï ~ *ə, not *ï ~ *ë. [5]
    There are moreover some cases of *ï or *ë of irregular/unclear origin. These include no examples of close to mid development (*u > **ë or the like), but at least one mid to close development: *joŋsə > *jïntə ‘bow’, perhaps a case of glide-induced coloring. One parsimonious explanation would be to assume here first *o > *ë, then *ë > *ï along with the other cases. This depends on how we model the *ë/*ï split exactly, though, and since there are also examples of *u > *ï (at least *kuŋə > *kïj ‘moon’), it’s also entirely possible that the history here has been *o > *u > *ï instead.
  • The Khanty situation is complicated and does not seem to allow clear conclusions. The main reflexes seem to be *ïï and *aa (in a distribution largely similar to that in Samoyedic). The other PKh close tense vowels [6] *ii *üü *uu generally go back to PU open or mid vowels, so the first reflex could be seen as a point in favor of original mid *ë. PKh open tense *ää *aa in other positions likewise mostly go back to PU open or mid vowels.
    On the other hand: the PU close *i yields PKh mid tense *ee by default, and PU close *u can yield PKh mid tense *oo and *ɔɔ under certain conditions. PKh also conspicuously lacks a mid back unrounded *ëë. If PU *ë/*ï >> *aa went thru an intermediate *ëë stage (the lowering *ëë > *aa has direct parallels in the Mansi dialects in contact with Khanty), then this reflex could suggest an original close value.
    — Some newer reconstructions of Proto-Khanty posit *ä and *a in place of *ee and *oo, though. It would be possible to suggest that the last was actually labial *å and to argue that *aa < *ëë < *ë < *a, to maintain my previous idea in place. But alternately, deriving /oo/ from *a, as found in the generally fairly conservative Far Eastern and Far Northern Khanty, would seem to make better sense, if the development were *a > *aa > *oo; and if so, original “*aa”, unaffected by these changes, would have to have been *ëë at this time. And we’d then be back to a similar argument as seen with Mansi: PU close > PKh lax, vs. PU open/mid > PKh tense.
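
The Permic argument above is essentially one about rule ordering, and it can be checked with a toy model. The sketch below is only an illustration under stated assumptions: the two-vowel inventories, the rule lists and the helper names are invented for the purpose, changes are treated as unconditional and sequential, and nothing here is a full model of Permic vocalism. It tests whether any ordering of the changes keeps the two vowels distinct.

```python
from itertools import permutations

def apply_changes(vowel, changes):
    """Run one vowel through an ordered list of unconditional (source, target) changes."""
    for source, target in changes:
        if vowel == source:
            vowel = target
    return vowel

def keeps_contrast(proto_vowels, changes):
    """True if all proto-vowels still have distinct outcomes after the changes."""
    outcomes = [apply_changes(v, changes) for v in proto_vowels]
    return len(set(outcomes)) == len(outcomes)

def some_order_keeps_contrast(proto_vowels, changes):
    """True if at least one relative ordering of the changes avoids a merger."""
    return any(keeps_contrast(proto_vowels, order) for order in permutations(changes))

# Hypothesis A (assumed rule set): PU mid *ë > Permic *u, PU *u > Permic *ï.
MID_RULES = [("ë", "u"), ("u", "ï")]
# Hypothesis B (assumed rule set): PU close *ï must end up as *u, PU *u still > *ï.
CLOSE_RULES = [("ï", "u"), ("u", "ï")]
# Hypothesis B plus the detour *ï > *ë, applied in this fixed order.
CLOSE_RULES_WITH_DETOUR = [("ï", "ë"), ("u", "ï"), ("ë", "u")]

mid_ok = some_order_keeps_contrast(["ë", "u"], MID_RULES)
close_ok = some_order_keeps_contrast(["ï", "u"], CLOSE_RULES)
detour_ok = keeps_contrast(["ï", "u"], CLOSE_RULES_WITH_DETOUR)

print(mid_ok, close_ok, detour_ok)  # prints: True False True
```

With *ë, letting *u > *ï apply first keeps the contrast; with *ï, either order ends in merger unless the detour through *ë is added. The simultaneous swap considered in footnote 4 is outside what this sequential model can express.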

The evidence thus seems quite clear: reconstructing mid *ë is preferable to reconstructing close *ï. Some of the Khanty evidence may point to *ï, but given the complicated history of Khanty vowels, this should not count as decisive.

Typologically, the reconstruction of mid *ë without a close counterpart *ï is also unproblematic. A similar situation can be observed in e.g. Votic and Estonian, with only ‹õ› /ɤ/ [7]; many dialects of modern English, with an open-mid /ʌ/ for ‹u› in words like strut, and even a rhotic counterpart /ɚ/ in words like nurse; or Bulgarian, with ‹ъ› /ɤ/, descending from Proto-Slavic *ъ /ʊ/.

True, there are also a great many languages with a superficially unpaired non-open back unrounded vowel. Yet such languages tend to have simple vowel inventories along the lines of /i ɨ u e a o/, where /a/ can be analyzed as the open counterpart of /ɨ/! The same applies to the pan-Turkic vowel system /i ü ı u e ö a o/ (/ı/ might vary from [ɨ] to [ɯ] I think; any Turkologists passing by are welcome to set me straight). OTTOMH the only language that would have both a three-degree height contrast, and an ï-type vowel without an ë-type one, is precisely Eastern Khanty.

Proto-Mansi long mid vowels

The PMs vowel system is normally reconstructed as contrasting two degrees of both height and length. The long vowels comprise five units: the open vowels *ää, *aa, and the non-open vowels *ee, *ëë, *oo.

What I write here as *ee and *oo have been traditionally reconstructed as close *ii and *uu. *ëë has moreover been reconstructed as *ïï since Honti ’82, including many default reference works such as Sammallahti ’88.

While I agree with the idea that these three vowels should be treated as a single set, I believe Honti got this adjustment the wrong way around. This is because the majority treatment seems to be mid values:

  • *ee: Reflected as mid /ee/ in most varieties of Mansi. Close /ii/ is found in most positions in Southern Mansi, and in a couple of words also Western and Eastern Mansi.
  • *ëë: Reflected as mid /ëë/ or open /aa/ in all varieties of Mansi.
  • *oo: Reflected as mid /oo/ in Southern Mansi, but as close /uu/ in the Core Mansi varieties (West+East+North). I assume the latter value is due to a chainshift: PMs *aa shifts to /oo/ in these same varieties.
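
The chainshift assumed for the Core Mansi varieties can be illustrated with a small sketch. The point it makes is that the two steps have to apply either at once or with *oo > /uu/ first; if *aa > /oo/ came first and fed the raising, the two vowels would merge. The function names and the inventory, reduced to the two vowels involved, are assumptions made for illustration only.

```python
# Sketch of the assumed Core Mansi chainshift: PMs *aa > /oo/, *oo > /uu/.
CORE_MANSI_SHIFT = {"aa": "oo", "oo": "uu"}

def shift_simultaneously(vowel, mapping):
    """Apply all changes at once, so that no output can feed another change."""
    return mapping.get(vowel, vowel)

def shift_in_sequence(vowel, ordered_changes):
    """Apply changes one after another; an earlier output can feed a later change."""
    for source, target in ordered_changes:
        if vowel == source:
            vowel = target
    return vowel

proto = ["aa", "oo"]

# Both steps as one mapping: the contrast survives.
chain = [shift_simultaneously(v, CORE_MANSI_SHIFT) for v in proto]
# *oo vacates its slot first, then *aa moves in: same result.
vacate_first = [shift_in_sequence(v, [("oo", "uu"), ("aa", "oo")]) for v in proto]
# *aa moves first and is then raised along with original *oo: merger.
merged = [shift_in_sequence(v, [("aa", "oo"), ("oo", "uu")]) for v in proto]

print(chain, vacate_first, merged)
# prints: ['oo', 'uu'] ['oo', 'uu'] ['uu', 'uu']
```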

Etymology also supports mid values for these vowels. *ee is a reflex of PU *e under unclear conditions; *ëë is the main reflex of PU *ë (which I hope to have just now established as indeed a mid vowel); and *oo is a reflex of PU *a and, probably under some conditions, PU *o. It strikes me as terribly inefficient to assume that these vowels first became close, then proceeded to again become mid vowels widely across the Mansi varieties.

Then there are the known general principles of length/height interaction in vowel shifts:

  • Long vowels tend to be raised
  • Short vowels tend to be lowered
  • Open vowels tend to be lengthened
  • Close vowels tend to be shortened

…which come into action particularly well in vowel shifts involving general restructuring of the vowel system. [8] I can think of tons of examples (e.g. pretty much everything relevant that happens during West Uralic > Proto-Samic), while counterexamples are much rarer. [9] With these in mind, it is already a priori preferable to reconstruct any unconditional /ii/ ~ /ee/ or /uu/ ~ /oo/ correspondences from original *ee or *oo, not *ii or *uu.
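
As a rough illustration of how these tendencies bear on the Mansi correspondences, the sketch below scores a proposed unconditional height change as going with or against them. The three-step height scale, the `Vowel` type and the function `follows_tendency` are assumptions made up for this example, and only the two height-related principles (long vowels raise, short vowels lower) are modeled.

```python
from dataclasses import dataclass

HEIGHT = {"open": 0, "mid": 1, "close": 2}  # assumed three-step height scale

@dataclass(frozen=True)
class Vowel:
    height: str  # "open", "mid" or "close"
    long: bool

def follows_tendency(before: Vowel, after: Vowel) -> bool:
    """True if a pure height change goes in the direction favored for that length."""
    if before.long != after.long:
        raise ValueError("only pure height changes are modeled here")
    delta = HEIGHT[after.height] - HEIGHT[before.height]
    if delta == 0:
        return True  # no change, so nothing to explain
    # Long vowels are expected to rise, short vowels to fall.
    return delta > 0 if before.long else delta < 0

long_mid = Vowel("mid", True)
long_close = Vowel("close", True)

# Reconstructing *ee: only the /ii/ varieties need a change, *ee > /ii/.
ee_hypothesis = follows_tendency(long_mid, long_close)
# Reconstructing *ii: the /ee/ varieties need a change, *ii > /ee/.
ii_hypothesis = follows_tendency(long_close, long_mid)

print(ee_hypothesis, ii_hypothesis)  # prints: True False
```

The same check applies to *oo versus *uu, since only length and height enter into it.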

Summing up, everything seems to check out: *ee, *ëë, *oo is a superior reconstruction equally well from the viewpoint of the attested Mansi varieties; the viewpoint of Proto-Uralic; and the viewpoint of typology of sound change.

—Note however that I am only arguing about phonetical reconstruction here. Phonologically speaking, I have nothing against an analysis according to which these vowels would have been distinguished from *aa and *ää by being simply [+close]. Yet, seeing how the Latin letters ‹i u› very much suggest non-mid values, we’d be better off using the available mid vowel base symbols ‹e o› instead. In my opinion broad transcription generally ought to be user-friendly rather than maximally adherent to any particular theory.

[1] Please cf. the newly published Bibliography page!
[2] E.g. Proto-Germanic *wētjō- > Proto-Scandinavian *wātjō- → pre-Proto-Samic *waććo > Proto-Samic *vōććō > Northern Sami vuohčču ‘bog’.
[3] Hungarian also has some long close vowels representing older *VwV sequences: e.g. *ńomala > *ńowɜl/*ńuwɜl (?) > nyúl ‘hare’, *täktɜmɜ > *tätɜw > tetű ‘louse’.
[4] Technically, a labiality detour could also be arranged: *ɨ *u > *ɯ *ʉ > *u *ɨ? But this seems contrived, not least for requiring an intermediate stage during which there are two non-front close vowels around, neither of which is [u].
[5] There is some uncertainty in this argument though, since no lowered reflex of the 3rd PU close vowel *ü is found, neither as *ə nor *ö. For that matter, the other PU mid vowels *o and *e don’t quite match the behavior of *ë either: *o splits “downward”, to yield *å~*o; while *e stays around as is (becoming /i/ later on in most Samoyedic languages, but per the evidence of Nganasan, not yet in Proto-Samoyedic).
[6] As a reminder: although I use “single”/“double” transcription for Proto-Khanty vowels, just as also for e.g. Mansi and Finnic, this does not indicate a length distinction, but instead one of tenseness: the more numerous “double” vowels are the unmarked ones.
[7] A corresponding close y /ɯ/ has developed in South Estonian, but this is a later innovation. Livonian has similarly later expanded its set of vowels by ȯ, described as /ʊ/ or /ɯ/ in different sources.
[8] Conditional splits in the vowel system: umlauts, coloring effects, length changes due to prosodic factors… are a different issue.
[9] Though not nonexistent: two cases that come to mind are Northwest Germanic, where *ē > *ā but *e remains *e, and late Proto-Slavic, where *a > *o but *ā > *a.

Posted in Reconstruction
16 comments on “Two Lemmata: PU *ë, PMs *ee *ëë *oo”
  1. David Marjanović says:

    the pan-Turkic vowel system

    For Proto-Turkic, two more vowel phonemes are reconstructed: *e (as opposed to the more common *ɛ) and *ʌ.

    but on the basis of Proto-Germanic and even Proto-Scandinavian loanwords, this can be seen to be a fairly late development.

    Could the “Proto-Scandinavian” ones have been Proto-Northwest Germanic? For the example you give in footnote 2, it’s impossible to tell.

    • j. says:

      For Proto-Turkic, two more vowel phonemes are reconstructed: *e (as opposed to the more common *ɛ) and *ʌ.

      Interesting. Do you happen to know if these require Chuvash or Siberian Turkic evidence to recover, or if the “mainstream” Turkic languages suffice? I don’t recall hearing about any of them having a separate ë-type phoneme of any sort.

      Could the “Proto-Scandinavian” ones have been Proto-Northwest Germanic? For the example you give in footnote 2, it’s impossible to tell.

      Many could, I’m sure. I’d have to look this up but I think *wātjō only turns up in the meaning “bog” in dialectal Swedish vät, so perhaps we should not project this loan further back than necessary.

      • David Marjanović says:

        Do you happen to know if these require Chuvash or Siberian Turkic evidence to recover, or if the “mainstream” Turkic languages suffice?

        They don’t; it’s the comparison of them (merger with */ɑ/) to Chuvash and Yakut (other Siberian languages aren’t mentioned; merger with */ɯ/).

        perhaps we should not project this loan further back than necessary

        Oh. I agree.

  2. David Marjanović says:

    late Proto-Slavic, where *a > *o but *ā > a

    Also Standard Hungarian: a /ɒ/, á /aː/. But examples of a shift in the opposite direction are more common.

    • j. says:

      No, that’s not an exception: /ɒ/ is still a low vowel that remains in contrast to mid /o/. Unlike raising, the fronting/labialization of a-type vowels can really swing either way just as well.

      The other unconditional vowel shifts from Old Hungarian to Modern Hungarian are actually a very clean example of the Labovian principles:

      1. Short *o is lowered to /ɒ/
      2. Short *ɪ *ʏ *ʊ are lowered to /e ø o/
      3. In the central dialects (incl. standard Hungarian), short /e/ is further lowered to merge into /ɛ/
      4. Long close *iː *yː *uː are in several positions shortened to /i y u/
      5. In most dialects, long *ɛː is raised to /eː/
      6. Long *eː is in the northern dialects raised to /iː/
  3. crculver says:

    For reconstruction of both a lower e and higher ė in Turkic (or a lower ä and higher e, if you prefer that notation), see Róna-Tas’s chapter in Routledge’s The Turkic Languages, p. 70, and for much greater detail, the Фонетика volume of the Сравнительно-историческая грамматика тюркских языков series, pp. 18–23. The latter book can finally be had from the Helsinki university library, as I bought my own copy after keeping the library book borrowed for years and years.

  4. crculver says:

    David, what is your source for reconstruction of *ʌ in Proto-Turkic? I can find no proposal of the sort in my home library.

    • David Marjanović says:

      Anna Dybo’s chapter on Turkic (3.1; 136–149) in the 270-page preface of An Etymological Dictionary of the Altaic Languages (S. A. Starostin, A. V. Dybo, O. A. Mudrak, “with the assistance of I. Gruntov and V. Glumov”, 2003; Brill). I can send you the pdf if you let me know an e-mail address.

      I misremembered the transcription; it’s *ạ, not *ʌ. No exact phonetic value is explained, but everything from [ʌ] to [ɤ] makes sense.

      Quote from p. 136–140:


      The phonological section of this introduction would be incomplete without an account of phonological developments in each of the Altaic subgroups. Although for the most part we use traditional reconstructions and correspondences, there are also some innovations presented and some points to discuss. Therefore we give below a short outline of the comparative-historical phonology for each of the subgroups of Altaic, as currently perceived by the authors of the dictionary.

      3.1. Turkic (by A. Dybo)

      The system of Proto-Turkic accepted in this dictionary looks like this:
      p b -m
      t d s -n- -r-, -l
      č j -ń- -ŕ-, -ĺ
      k g -ŋ-
      Vowels of the first syllable:
      i ü ɨ u
      e ö ạ o
      e a
      All the vowels could be short or long.
      Vowels of other syllables:
      I U
      The row of any non-first vowel (front or back) depended on the row of the vowel of the first syllable, thus producing seven (eight?) vocalic allophones:
      i ü ɨ u
      e ö a (o)
      The back -o- is actually not attested, but it may be perhaps reconstructed in some auxiliary morphemes.
      Thus, the reconstruction is almost completely traditional, with only the following modifications:
      7. For a detailed account of the reflexes of Turkic vowels in Chuvash see Мудрак 1993, Мудрак Дисс.
      9. One of the most complicated problems in Turkic reconstruction is the distinction of open/close *e vs. *ẹ, *a vs. *ạ.
      Close *ạ was reconstructed by O. Mudrak (see Мудрак 1993, Мудрак Дисс.) for the correspondence Turk. a – Chuv. ɨ, Yak. ɨ. Let us mention that Yak. can also have a secondary -ɨ- < *a in front of -j-, cf. ɨj ‘moon’, kɨj̃at ‘wing’, ɨj- ‘show, describe’.
      As to the reconstruction of *e and *ẹ, no final agreement has been reached so far. […]
      11. Proto-Turkic and most modern Turkic languages possess the so-called vowel harmony: all words are subdivided into “front” (with vowels *i, *e, *ẹ, *ü, *ö) and “back” (with vowels *ɨ, *a, *ạ, *u, *o). The vowel of any non-initial syllable has to be “harmonized” with the vowel of the initial syllable.


      Pp. 148–149 are a table of vowel correspondences. *a is reflected as a everywhere except in Chuvash, where it’s “In the Upper dialect o, in the Lower dialect and in literary Chuvash – u; u in all dialects adjacent to the reflexes of *g and *b”, in Uzbek, where it’s sometimes ɔ (“Details see in Мудрак 2002.”), and in modern (but not Old) Uyghur, where it becomes e in the positions where “Uyghur Umlaut” operated (“before ä, i in the second syllable”). *ā is reflected in the exact same ways, except that length is retained in Yakut, Turkmen and Khalaj. *ạ merges into *a everywhere except in Yakut and Chuvash, where it becomes ɨ; *ạ̄ is identical except for, of course, retaining phonemic length in Yakut, Turkmen and Khalaj.

      I am not qualified to judge how convincing this is.

      References cited in the quotes (I haven’t read any of them):
      Мудрак О. А. Исторические соответствия чувашских и тюркских гласных: Опыт реконструкции и интерпретации. М., 1993.
      Мудрак О. А. Развитие тюркского а в узбекском языке. // Алтайские языки и восточная филология. К 80-летию Э.Р.Тенишева. М., 2002.
      Мудрак О. А. Обособленный язык и проблема реконструкции праязыка. Диссертация на соискание ученой степени докт. филол. наук. М., 1994.

      • crculver says:

        Dybo and her circle in Moscow are well out of the mainstream.

        • David Marjanović says:

          OK, that’s good to know.

          What is the mainstream explanation for the cases (other than the “secondary” ones in front of /j/) where Chuvash and Yakut /ɯ/ corresponds to the /ɑ/ of “mainstream Turkic” (including Khalaj, which is definitely interesting)?

          • crculver says:

            In the aforementioned Фонетика volume this is covered on pp. 57–65. In Chuvash, this is almost always a result of delabialization of /u/ (the regular reflex of Proto-Turkic /a/), and that this is a recent development can be seen from the fact that it is attested in some loanwords from Tatar; that Chuvash dialects occasionally differ in whether the vowel is /ï/ or /u/, and that some Volga Bulgarian attestations show /u/,/o/ for Chuvash /ï/.

  5. David Marjanović says:

    Sorry for the many line breaks I managed to overlook. PDF copypasta is hard to work with.

  6. David Marjanović says:

    [2] E.g. Proto-Germanic *wētjō- > Proto-Scandinavian *wātjō- → pre-Proto-Samic *waććo > Proto-Samic *vōććō > Northern Sami vuohčču ‘bog’.

    Another argument for a late formation of this word seems to be the absence of Sievers’s law (**wētijō-, which I suppose couldn’t have produced PS *ćć).

    • j. says:

      I’m not sure if Sievers’s Law provides a lot of evidence for the dating: it is really quite persistently not well-reflected in loanwords into Finnic and Samic (there is a 1986 paper by Koivulehto on loans from Germanic, though I have not read it). Even many apparent examples could be accounted for by Finnish’s “own” Sievers’s Law: e.g. Fi. kallio ‘rock, bedrock’ < Gmc. *xallijōn sure looks like a PF *kalli(j)o, but Estonian kalju and especially Votic kaľľo (in contrast to e.g. *lapidV > Vt. lapia ‘shovel’) instead suggest *kalljo or even simply *kaljo.
