Etymologically opaque Votic words

For later reference, here’s a collection of etymologically opaque (to me) Eastern Votic words harvested from my new dictionary. I will not attempt any detailed analysis yet. (Presumably some investigation into Russian, Ingrian, Estonian, maybe even Latvian & German could turn up known cognates for many of these.)

  • aimo ‘carbon monoxide’
  • alëtsë ‘mitten’
  • hilkeä ‘ugly’ — if this is not a hypercorrect cognate of Finnish ilkeä ‘evil’.
  • hulkkuag ‘to travel’
  • hülpeä ‘disobedient’
  • ikolookka ‘rainbow’ — a compound based on lookka ‘bow, curve’, but the 1st element is unclear.
  • jahsaag ‘to take off shoes’ — does not seem related to Finnish jaksaa ‘to have energy for’.
  • kaaliag ‘to lick’
  • kaputta ‘sock’
  • kineri ‘melted fat’
  • koltši ‘old-fashioned ladle’
  • kosma ‘hair’
  • lainatag ‘to swallow’ — does not seem related to Finnish lainata ‘to borrow’.
  • lautta ‘cowshed’ — does not seem related to Finnish lautta ‘raft’.
  • liblo ‘oat awn’
  • linnaasëd ‘malt’
  • lohko ‘soup’
  • lühtši : lühdže- ‘pail’
  • läntü ‘milk’
  • mauttši ‘intestine’
  • naka ‘cask spigot’
  • nakliska ‘some part in a sleigh’ (“the informant is unable to explain what exactly”)
  • nëikko ‘rockable cradle’
  • nättšelikko ‘burdock’
  • nätši ‘uncooked (of bread)’
  • nättü ‘rag’
  • ootava ‘cheap’
  • pallo ‘pigeon’
  • pelssimed ‘loom’
  • peltta ‘leftovers of threshing’
  • pihta ‘shoulder’
  • pilpa ‘dandruff’
  • pärähmä ‘fathom, armful’
  • raaka ‘twig’
  • ramitsaag ‘to limp’
  • ratiz ‘granary’
  • rehnüüz ‘entrance hall’
  • rehtilä ‘griddle’
  • ringuttaag ‘to stretch’
  • ripa ‘footwraps’
  • ripila ‘fireplace poker’
  • rooppa ‘porridge’
  • rootšiag ‘to dig, to rummage’
  • ruttaag ‘to hurry’
  • śalko ‘foal’
  • servä ‘edge’
  • sippelikko ‘ant’
  • sisava ‘nightingale’
  • sultsiag ‘to wash’
  • surmukaz ‘relative’ — probably not derived from surma ‘death’?
  • säblä ‘kitchen hook’
  • šitinka ‘bristle’
  • šlotta ‘slush’
  • taari ‘ale’
  • tahtši : tahdžë- (!) ‘chaff’
  • tauttaag ‘to take’
  • tiheh ‘mosquito’
  • turvaz : turpaa- ‘ladder’
  • tuutikko ‘washbundle’
  • türü ‘food comprising breadcrumbs mixed with milk or water’
  • tšiutarë ‘coldroom’
  • tšiutto ‘shirt’
  • tšäppeä ‘beautiful’
  • uhër ‘auger’
  • unka ‘wooden cup’
  • upa ‘bean’ = Est. uba.
  • ursi : urtë- ‘bed curtain’
  • vaattaag ‘to look’
  • valo ‘dung’
  • varo ‘hoop’
  • veelatag ‘to soak’ — compound with vete- : vee- ‘water’?
  • vokki ‘spindle’
  • väitšiäg ‘to call’
  • ördžähtäässäg ‘to wake’
Tagged with: , , ,
Posted in Uncategorized

A potential Turkic-Yukaghir loanword

A project I have been working on, on and off, is compiling the lexical parallels that have been put forward in connection with various proposed external relationships of Uralic. Occasionally this kind of work turns up nice new etymological insights.

One of the best-retained (and also one of the more specific) verbs of motion reconstructible for Proto-Uralic is *kälä- ‘to wade’: reflected in, e.g., Northern Sami gállit ‘to wade’, Finnish kahlata ‘to wade’ (an old loan from Samic), and Hungarian kel ‘to rise’. (The meaning ‘to rise’ is found also in Mansi and Khanty; the latter also has ‘to step up on land’.) This has been compared with the Yukaghir verb *kel- ‘to come’. The pairing is phonetically OK, but semantically it does not seem impressive. It might be acceptable if a relationship between Uralic and Yukaghir were already established, but it offers hardly any evidence for a relationship in the first place.

Interestingly enough, the same Uralic verb has also been compared with Turkic *gel- ‘to come’ — with the exact same semantics and an equally compatible phonetic shape! (E.g. already Björn Collinder in Fenno-Ugric Vocabulary, 1955/1977, reports both comparisons.) Probably the first step here should be to analyze the Yukaghir word as a loan from Siberian Turkic, and worry about any possible Uralic relationships later.

I would predict that pitting the Uralo-Yukaghir and Ural-Altaic hypotheses against each other may turn up further cases like this where a straightforward loan etymology is available. It’s already been noted by Rédei in his “Zu den uralisch-jukagirischen Sprachkontakten” (1999, in FUF 55) that many of the Uralic-Yukaghir lexical parallels extend to some of the “Altaic” languages as well…


Statistical etymology: A Votic example

Last Friday I picked up a dictionary of the Mahu dialect of Eastern Votic (Castrenianumin toimitteita 27, 1986), based on Lauri Kettunen’s collections from about a hundred years ago. [1]

This is not a particularly large book, with only about 150 pages of lexical data, set in a relatively large monotype font at that. It probably won’t be of much use if one wished to e.g. translate Firefox into Votic. Its usability as a tourist dictionary might be limited as well (even if we ignore the sad fact that Votic is severely moribund, with only a few dozen speakers left). But it seems like a good reference for a linguist wishing to get acquainted with the language. Or: a handy unit of data for a linguist wishing to understand the lexical structure of languages.

The lexicons of natural languages are not random in their makeup. Phonemes occur with differing frequencies in different positions of the word, and they show different tendencies to combine with each other. And although one can certainly find linguists who will attempt to explain this in terms of elaborate synchronic phonological constraints and preferences, I find this a fundamentally flawed approach. [2] Much more often, any patterns evident in the lexicon are best understood as the fossilized results of historical processes: sound changes, loanword strata and evolving sound-symbolic conventions. The study of a language’s lexicon, even at a single point in time, will likely turn up insights into its history.

For this type of analysis, this Votic dictionary actually seems like a rather good sample size. The lexicon of any major literary language would be both overwhelming in size (possibly thousands of pages) and swamped with recent cultural loanwords (if you happen to find a word shaped approx. like /banana/ or /platinum/ in a given language, this will not tell you much about its prehistory). Neither of these problems is apparent here, and it’s possible to focus on the big picture without getting stuck on data wrangling. At the other extreme, an even simpler list of, say, 100 words, whether artificially truncated or recorded in passing in 1820 from some now-extinct language, would not allow for many statistically significant conclusions at all.


A simple starter example: the Finnic languages originally did not contrast voicing in obstruents (as was already the case in Proto-Uralic). This situation still remains in place in Estonian, Northern Karelian, and dialects of Finnish. Votic, however, is among the Finnic languages that have fully embraced voicing, and it contrasts voiced and voiceless versions of all obstruent consonants: /p t tš k f s š/ ≠ /b d dž g v z ž/. Suppose we were to hand a copy of this dictionary to a linguist who has never worked with Finnic before. Would they be able to uncover this older constraint?

The answer seems likely to be “yes”. Only minor etymological analysis is required, and the dictionary itself even provides it. The lexemes in the dictionary are glossed in both Russian and Finnish, the two major contact languages of Votic. Additionally, several words identifiable as recent Russian loans are explicitly marked as such. This allows an initial separation of the lexicon into two mostly disjoint layers: words of Finnic vs. Russian background. (Though of course Finnish has some Russian loanwords as well, and a small number of words whose origin is not immediately obvious can also be found.)

A look at words beginning with voiced obstruents other than /v/, as well as words beginning with /f/, shows that they, as a rule, belong to the Russian layer. This is a small set to begin with, and after this cleanup, no more than seven counterexamples remain:

  • balalaittaag ‘to gossip’
  • bëëg ‘isn’t’
  • borissag ‘to bubble’
  • bulissag ‘to bubble’
  • börö ‘ironing board’
  • däädi ‘some relative’
  • filissaag ‘to whistle’

So we have four onomatopoetic verbs, one unstressed particle, one nursery word, and one fully legit content word. This is not sufficient evidence to posit that the voicing contrast is original in word-initial position, not when evidently inherited words beginning with /p t tš k s v/ number several hundred altogether. [3]
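The filtering step above is simple enough to spell out. What follows is a minimal Python sketch, not anything the dictionary itself provides: it assumes a hypothetical list of (headword, Russian-loan flag) pairs standing in for the dictionary's own loan markings, and the function names are made up for illustration. Only the word-initial letter is checked, which works because the digraph dž begins with d.

```python
from collections import Counter

# Word-initial letters marking a voiced obstruent other than /v/, or /f/.
# The digraph dž begins with "d", so a one-letter check suffices.
MARKED_INITIALS = {"b", "d", "g", "z", "ž", "f"}


def has_marked_initial(word: str) -> bool:
    """True if the word begins with a voiced obstruent other than /v/, or /f/."""
    return word[:1] in MARKED_INITIALS


def counterexamples(entries: list[tuple[str, bool]]) -> list[str]:
    """Marked-initial words that are NOT flagged as Russian loans."""
    return [w for w, is_russian in entries if has_marked_initial(w) and not is_russian]


# Hypothetical input: the seven residual words from the list above, plus two
# words from the opaque-word list beginning with /k/ and /v/ for contrast.
entries = [
    ("balalaittaag", False), ("bëëg", False), ("borissag", False),
    ("bulissag", False), ("börö", False), ("däädi", False),
    ("filissaag", False), ("kaputta", False), ("vokki", False),
]
residue = counterexamples(entries)

# The classification of the residue, as given in the text.
KIND = {
    "balalaittaag": "onomatopoetic verb", "borissag": "onomatopoetic verb",
    "bulissag": "onomatopoetic verb", "filissaag": "onomatopoetic verb",
    "bëëg": "unstressed particle", "däädi": "nursery word",
    "börö": "content word",
}
print(Counter(KIND[w] for w in residue))
```

The same tally run over the whole lexicon would give the “multiple hundreds” of voiceless-initial inherited words to compare against; that step needs the actual dictionary data and is not attempted here.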

A more detailed examination would find that medial voiced consonants other than /v/ can similarly be shown to be secondary: they occur as the consonant gradation alternants of the voiceless ones. Exceptions, as a rule, again occur only in Russian loans and probably some onomatopoeia. The full details would be more difficult to dig up though, so I am leaving this as an exercise for the interested reader. ;)

[1] In case anyone else is interested, some overflow stock of these from dunno where is still up for grabs at the University of Helsinki’s Dept. of Finno-Ugric Studies (Metsätalo/Unioninkatu 40, 4th floor).
[2] This may not be an entirely fair comparison, but… I have in mind the image of a “generative geologist” attempting to locate physical constraints present in gneiss or sediment that force its minerals to hold a macroscopically banded rather than homogeneous structure.
[3] I will not dwell on /š/, also mainly a loanword phoneme.


Interplay of minor soundlaws: Samic glide clusters

Shifting and widening my scope a little, here’s a look into the history of two consonant clusters across the Samic languages as a whole.

The two-glide cluster *-jv- is a simple place to start. Its development is straightforward: it is retained essentially intact across all Sami varieties. (If you want to have a look for yourself, I am including copious links to the Álgu database in this post.) The coda *j may have been vocalized into the 2nd component of a diphthong/triphthong, but this is basically trivial.

A small further complication still comes up in Southern Sami. Two words have here a seemingly irregular *-jj-: *oajvē > åejjie “head, end”; and *peajvē > biejjie “sun; day”.

Both of these happen to be inherited Uralic words, with cognates stretching all the way to Samoyedic. So my first reflex was to go “aha! does this mean that the words showing /jv/ are therefore newer loanwords?” The answer is “no”, though: at least *koajvō- > gåajvodh “to dig” is of equally ancient pedigree. But I think I can dial this hypothesis back a little. Perhaps the shift *jv > *jj occurred due to the following front vowel *-ē (which in Southern Sami is characteristically diphthongized to /ie/ even in the 2nd syllable). This seems phonetically plausible and drops the number of counterexamples from half a dozen to one: *vājvē > vaejvie “pain”. This last word is in turn a known loan from Finnish, which may indeed have diffused into Southern Sami at a late date.

This idea seems to find some further preliminary support in an interesting derivative of “head”: åajvadidh “to advise”. My chops in SS historical morphology are insufficient to present an explicit PS reconstruction, but we can clearly see here at least a retained stem vowel *ā, a regular feature before a 3rd-syllable *ë; in other positions this was raised further to *ē already in PS. And before this lower vowel, *-jv- survives after all.
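The proposed conditioning can be written as a context-sensitive rewrite. This Python sketch is only an illustration under simplifying assumptions: it operates on PS reconstructions as plain strings, ignores every other Southern Sami change, and the function name is hypothetical. It replaces *jv with *jj only when *ē follows.

```python
import re

# *jv followed by *ē; the lookahead checks the vowel without consuming it.
JV_BEFORE_E = re.compile("jv(?=ē)")


def south_sami_jv(ps_form: str) -> str:
    """Apply the proposed Southern Sami assimilation *jv > *jj / _ē."""
    return JV_BEFORE_E.sub("jj", ps_form)


# "head", "sun; day", "to dig", "pain"
for form in ["oajvē", "peajvē", "koajvō", "vājvē"]:
    print(f"*{form} -> *{south_sami_jv(form)}")
```

The rule leaves *koajvō- alone, as it should, but it also predicts *jj for *vājvē, whereas the attested form is vaejvie. That single mismatch is exactly the case that needs the late-loan explanation.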


Now let’s consider the opposite PS cluster *-vj-. This turns out to have had a much more complicated history.

Three Sami varieties show completely regular development. Lule Sami and Ter Sami have metathesized this cluster in all relevant words, merging it with *-jv-. Inari Sami has retained /vj/ throughout. Northern Sami might also belong here, depending on whom you ask: the Álgu database claims /jv/ in a single word, *sāvjë > sájva “isolated lake”, while my copy of Yhteissaamelainen sanasto presents sawˈjâ (= sávja in the current NS orthography). I would guess that dialectal differences are involved. FWIW, Sammallahti in The Saami Languages claims that at least the Torne Sami dialect group “originally” belonged with Lule Sami rather than Northern Sami. [1]

In a couple of other varieties, it is also possible to state a mostly applicable rule. Pite Sami aligns with its sibling Lule in having -jv- everywhere except in *jēvjë > jievja “white reindeer”, while Skolt and Kildin Sami align with Inari Sami in having -vj- everywhere except in *ćōvjë > Sk čuõivâk, K čuəivex “grey reindeer”. Probably these sorts of exceptions again represent borrowing from neighboring dialects. [2]

Southern Sami again shows a few more complications, as does neighboring Ume Sami. Covering SS first, metathesized /jv/ occurs in two words: *tāvjā > daajvaj “often”, *sāvjë > saajve “gnome”. Unmetathesized /vj/ is found in three: *jēvjë > joevje “light grey reindeer”, *jēvjë- > joevjeme “beard moss” (don’t ask me what the oe is doing in these), *vōvjē > vuevjie “wedge”. Lastly, an assimilated /jj/ is found in *ćoavjē > tjåejjie “stomach”. This appears to confirm the assimilation rule I proposed in the 1st section: v > j / j_ie. Provided, that is, that we assume the metathesis *vj > *jv to have occurred before this…

The Ume Sami reflexes seem to support this last assumption. Although few of the relevant words have been recorded here, /jv/ is found in those lexemes that have SS /jv/ ~ /jj/: dàivài “often” and tjåìvee “stomach”; while /vj/ is found in those that have SS /vj/: jauja “grey reindeer”, vyöyjee “wedge-shaped patch”. There is also one word with a somewhat baffling three-glide reflex: guyvjas “grey reindeer” (with unetymological /g-/ to boot). [3]

How should this distinction between S+U-metathesizing and S+U-unmetathesizing *-vj- be accounted for? Could this be etymological somehow? An interesting fact is that *vōvjē “wedge” is one of the Samic words showing lenition of original coda *k before sonorants (as shown by the Finnic cognates: e.g. modern Finnish vaaja, Karelian voakie, Livonian vaigā < PF *vakja). [4] So, perhaps this change occurred only after the metathesis of inherited *vj to *jv in Southern and Ume Sami? A late date for the change has already been suspected:

This sound change cannot be reliably dated, but it may well have taken place during a relatively late phase of Proto-Saami.

(Aikio 2006: 3.11 §) [5]

With this interpretation, a “maximally hereditary” chronology would be:

  1. Lenition *kj > *ɣj in Finnish.
  2. Samic *tāvjā “frequent” is loaned from Finnish *taɣja.
  3. Metathesis *vj > *jv in South & Ume.
  4. Lenition *kj > *vj all across Samic.
  5. Samic *jēvjë “white reindeer” is loaned from Germanic.
  6. Assimilation *v > j / j_ie in South. — Metathesis *vj > jv in Pite, Lule & Ter. — Raising *eu > *iu in Germanic.
  7. Samic *vājvē “pain” is loaned from Finnish vaiva.
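The crux of steps 3 and 4 is rule ordering. The minimal Python sketch below models only these two changes as string rewrites; everything else is ignored. The input *vōkjē is a hypothetical pre-lenition form of *vōvjē “wedge”, assumed on the basis of the Finnic cognate *vakja, and the labels and function name are made up for illustration. With metathesis ordered first (a counterfeeding order), the *vj created by lenition survives, as in SS vuevjie; in the reverse order it would be metathesized as well.

```python
import re

# Each rule: (label, compiled pattern, replacement). Only these two changes
# from the chronology are modelled; everything else is ignored.
METATHESIS = ("metathesis *vj > *jv", re.compile("vj"), "jv")
LENITION = ("lenition *kj > *vj", re.compile("kj"), "vj")


def apply_in_order(form, rules):
    """Apply rewrite rules one after another, each to the output of the last."""
    for _label, pattern, replacement in rules:
        form = pattern.sub(replacement, form)
    return form


south_ume = [METATHESIS, LENITION]      # steps 3 and 4, in that order
reversed_order = [LENITION, METATHESIS]

# Hypothetical pre-lenition form of *vōvjē "wedge" (cf. Finnic *vakja).
print(apply_in_order("vōkjē", south_ume))       # vōvjē: lenition comes too late
print(apply_in_order("vōkjē", reversed_order))  # vōjvē: lenition feeds metathesis
# *tāvjā "often" already has *vj at step 3, so it does undergo metathesis.
print(apply_in_order("tāvjā", south_ume))       # tājvā
```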

…But is it a good idea to attempt maximizing the degree to which various Samic words would have been inherited from a common ancestor? I think it is important to keep in mind that fresh loanwords readily diffuse across dialect continua.

As for the particular downsides of the above scenario, at minimum I am uncomfortable assuming that the specifically Finnish change *kj > *ɣj occurred earlier than the supposedly Proto-Germanic change *eu > *iu / _j. [6] OK, it’d be possible to go on making some cleanup assumptions; e.g. that in the numerous newer Germanic loans in Finnish where *ɣj can be reconstructed, this was substituted for original *kj; or perhaps, that the /k/ ~ /g/ found in the other Finnic languages would be a reversal from *ɣ; but this would all be for no other reason than ensuring a Proto-Samic ancestry for SS daajvaj, US dàivài. We could instead assume that S+U acquired these words from the direction of P+L, and show /jv/ for this reason.

This should also call into question whether my step 3 above existed at all. *sāvjë “gnome” (elsewhere in Samic also with meanings like “underground water”, “lake with an underworld entrance”, “isolated lake”) seems like a potential cultural loan from the P+L direction at least. It is of Germanic ultimate origin, but seems to have acquired its mythical flavor only on the Sami side: the PGmc root is simply *saiwiz “lake”.

Note moreover that this loan etymology actually predicts PS *-jv-, not *-vj-! And yet there is no evidence for the inverse metathesis *-jv- > *-vj- having regularly occurred in any Samic variety. So are we forced to conclude further that this word was originally adopted specifically in the Pite/Lule area, and hypercorrectly metathesized to *-vj- when loaned eastward from these varieties? The Southern /jv/ could likewise turn out to be original after all.

This leaves just the question of *ćoavjē “stomach”. Relationship to Samoyedic *t¹äjwə “stomach” has been proposed. The initial consonant, vowel frontness, and glide cluster order all fail to match, though, so I suspect this is only an accidental resemblance. I could just as well propose that the Samic word is a metathesis from something like earlier *voaćjē, and therefore related to Finnic *vacca “stomach”? (Ha ha.) With the case for inheritance in this state, I don’t think it would be too much of a problem to assume that here, too, the S+U forms have been loaned from the direction of P+L, though still early enough to have participated in cluster smoothing in SS, apparently.


An additional topic to ponder at this point would be the motivation of the metathesis *-vj- > *-jv-, which altogether appears to be attested in at least two widely separated parts of the Samic dialect continuum. Pite and Lule Sami are spoken in northern Sweden and adjacent areas of Norway (also Finland if we count Torne Sami), Ter Sami at the eastern end of the Kola peninsula. It seems unlikely that these groups have been in any direct contact with each other since Proto-Samic times. It also seems unlikely that this incredibly specific metathesis was purely coincidentally innovated in both. One possibility might be some kind of a phonological precondition for this change having existed already in Proto-Samic, which in only two areas led to the change running to completion?

A better solution, though, might be a common external source. This exact same metathesis happens to be known from the Finnic languages as well! Late Proto-Finnic allows no *-vj- (or *-Vuj-/*-Vüj-: we are better off reconstructing diphthongs rather than coda glides at this date), and although no words with PU *-wj- have been retained in Finnic, a number of loanwords allow reconstructing a metathesis here, e.g. PGmc *flauja- → Finnish laiva “ship”. [7] Metatheses of some other similar clusters, including older *-wr- (PS *jāvrē ~ LPF *järvi “lake”), are also found, which suggests that this type of change originated in Finnic, and in the case of *-vj- > *-jv- might have been passed on to Samic.

Still, why just these specific varieties? The Lule Sami probably had numerous connections with Finnic traders and settlers in the Torne Valley and adjacent areas since a much older period than the Finnmark/Inari/Skolt/Kildin Sami living further inland, that much is clear. Yet should we therefore expect this shift to have also been present in the extinct “southeastern” Sami varieties such as the marginally attested Kemi Sami?

Particularly difficult to understand is Ter Sami. I do not think we even know at present whether the Kola Sami languages developed entirely in situ, or if they may have spread to Kola from e.g. the southern reaches of the White Sea, some of their characteristic features already in tow? The presence of this sound change might require, at minimum, that Ter descend from a dialect that was originally spoken further south than the corresponding ancestral dialect of Kildin…

[1] One wonders how and why we may claim that it no longer does; or whether we are to conclude that “Northern Sami” is an areal entity rather than a genealogical one.
[2] I wonder if these last two words have some relation to each other. The semantic closeness is obvious, and the consonant skeletons are quite similar as well. The proposed etymology for *jēvjë is a loan from (pre-?)Proto-Germanic *heuja- “hue”, and the Germanic *h- moreover comes from PIE *ḱ-. Wiktionary mentions here e.g. Lithuanian šývas “white”. Could any of the Satem cognates have plausibly been loaned to yield pre-Samic *ćawjə or *ćowjə “grey”?
[3] Or could this indicate a substitution *ḱ- > *k-, from some non-Satem variety? Perhaps not, since this would be chronologically problematic, and there are other known examples of irregular *ć- > *k- in some varieties of Sami.
[4] Also by the word’s etymology as stemming from Baltic: cf. Lithuanian vagis, Latvian vadzis. For more details cf. Itkonen, Terho (1982): Laaja, lavea, lakea ja laakea. In: Virittäjä 86.
[5] Aikio, Ante (2006): On Germanic-Saami contacts and Saami prehistory. In: SUSA 91.
[6] I actually suspect this was “only” Northwest Germanic, given that Gothic shifts *e to *i unconditionally anyway. More details to come on this point later, though. At any rate, this would still not be a huge chronological relief.
[7] For further details cf. Koivulehto, Jorma (1970): Suomen laiva-sanasta. In: Virittäjä 74.


Notes on Eastern Sami vowel history, part 2

(← Part 1)

To start with, a few complications involving *i and *ë.

In the Kola Sami branch (Kildin & Ter Sami), the default reflex of PS *i seems to be /ï/. (I dunno if this is [ɨ] or [ɯ], though I’m relatively sure that this isn’t really relevant.) E.g.:

  • *ćimpē > K čï´mmb, T čïḿḿb́e “shin”
  • *ijë > K ïjj, T jïjj “night”
  • *kikë- > K kïggeð, T kïkkɐd “to rut”
  • *kirtē- > K kï´rrdeð, T kïŕŕďed “to fly”
  • *nisōn > K T nïzan “woman”
  • *piksë > K pïxxs, T pïkks “bird’s sternum”
  • *pirë > K T pïrr “around”
  • *rissē > K rï´ss, T rïśśe “twig”
  • *silpë > K T sïllb “silver”
  • *tikkē > K tï´kk, T tïx́x́ḱe “tick”
  • *vitë > K vïdd, T vïtt “five”

There seems to be one regular exception to this development: /i/ remains after *ń-. The cases are:

  • *ńiŋēlës > K ńiŋŋlȧs “female”. Lehtiranta reconstructs this with initial *n-, but variation might go back to Proto-Samic; Pite, Northern and Skolt Sami also have /ń-/, while Ume, Lule and Inari Sami have /n-/.
  • *ńińćē > K ńi´ńńdž, T ńińńdže “teat”
  • *ńipćōs > K ńipčas, T ńipčs “roasting spit”

Not a whole lot, but this makes good phonetic sense, and there seem to be no counterexamples.
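The rule and its exception fit into a single regular expression with a negative lookbehind. A minimal Python sketch, assuming PS reconstructions as input and modelling only this one vowel change (none of the consonant or final-vowel developments of Kildin and Ter); the function name is hypothetical.

```python
import re

# *i not immediately preceded by *ń; the lookbehind inspects one character.
I_NOT_AFTER_N = re.compile("(?<!ń)i")


def kola_i(ps_form: str) -> str:
    """Apply PS *i > /ï/ except after *ń (the only change modelled here)."""
    return I_NOT_AFTER_N.sub("ï", ps_form)


for form in ["vitë", "ijë", "pirë", "ńińćē", "ńipćōs"]:
    print(f"*{form} -> {kola_i(form)}")
```

Loanwords such as K ni´jjb “knife” fall outside the sketch by design: they are assumed to postdate the change, so they would simply not be fed into it.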

Another environment where /i/ comes up frequently is before coda *j. However, this is not fully regular, and given that PS *-ijC- only occurs in loanwords from Finnic & Scandinavian, analyzing these as post-Proto-Samic loanwords adopted after the change *i > *ï seems preferable:

  • K ki´jjteð, T kijjtad “to thank” (← Finnic *kiittä-)
  • K li´jjg “excess” (← Finnic *liika)
  • K ni´jjb, T nijjb́e “knife” (← Scand.)
  • K T sijjd “village” (← Scand.)

The expected /ïj/ is still found in three words (and also cf. *ijë “night” above):

  • *lijnē > K lï´jjn, T lïjjńe “linen”
  • *rijtō > K rïjjd “quarrel”
  • *tijmā > K T tïjjma “last year”

So far, so good. But let’s kick it up a notch. PS *ë, as I mentioned before, has a variety of differing reflexes across the Eastern Samic varieties. All of the main ones are some flavor of open-to-mid, back-to-central. For Kola Sami, a representative selection would be:

  • *mënë- > K mëënneð, T mɐnnɐd “to go”
  • *nënōs > K nȧnas, T nɐnas “strong”
  • *tënē > K tȧ´nn, T tɐńńe “tin”

After PS *ń- and *j-, though, several words point to *i across Eastern Samic. Lehtiranta lists 7 roots beginning with the sequence *jë-, and 10 with *ńë-, that are found in ES. 8 of these have *i-like reflexes in at least one ES variety. This does not seem like a coincidence: similar cases in other consonant environments, including after *ć-, are very rare.

The regular cases are:

  • *jëlkëtē > Inari jolgad, Skolt jõlggâd “flat”
  • *jëllë > I jolla, Sk jõll “crazy”
  • *jëlŋēs > I jalŋes, Sk jââ´lnjes, K jȧ´lŋes, T jɐĺĺŋ́eś “tree stump”
  • *jërŋë > I jorŋa, Sk jõrŋŋ, K jëërn “open water”
  • *jëskë(tē) > I joska, Sk jõskk, K jëëskeð, T jɐsskɐd “quiet”
  • *ńëðē- > I njađđeeđ, Sk njââ´đđed, K ńȧ´ddeð, T ńɐťťed “to affix together”
  • *ńëlë- > I njoollađ, Sk njõõllâd, K ńëlleð “to debark a tree”
  • *ńël-tē- > I njaldeđ, Sk njâ´ldded, K ńȧlldeð “to peel” (a derivative of the previous)
  • *ńëvē > I njauve, Sk njââ´vv, T ńɐv́v́e “rapids”

The seemingly irregular cases are:

  • *jëkē > I ihe, Sk ee´ǩǩ, K ï´gg, T jïḱḱe “year”
  • *jëŋë- > I iiŋŋađ, Sk iiŋŋâd, K ïŋŋeð “to dry”
  • *ńëckē- > I njiskođ; — but Sk njõõcksed, K ńȧ´ckseð “to scrape (off)”
  • *ńëkē- > I njihe-, Sk njee´ǩǩ-, K ńï´gg-, T ńïkke- “slanted”
  • *ńëkkē(ńë)- > I njihanjas, Sk njikknâsted, K ńiggnȧ´steð, T ńïx́x́ḱed “to hiccup”
  • *ńëmë- > Sk njiimmâd, K ńïmmeð, T ńïmmɐd; — but I njommađ “to suck”
  • *ńëncē- > I njiʒʒed, Sk nje´ʒʒed; — but K ńȧ´nndzeð “to rip off”
  • *ńëvlē > I njivle, Sk njeu´ll, K ńi´vvl “slime”

There are actually some hints as to the conditioning of the split here. After *ń, *ë mostly remains low in originally open syllables, but is reflected as a closer or fronter vowel in originally closed syllables. Vowel length in Skolt seems like an even better indicator, one that also accounts for “to suck”. Hence it seems that this change is related to the secondary vowel lengthening that I mentioned last time: only short *ë is palatalized to *i, while lengthened *ë remains. The lack of raising in *ńël-tē-, then, might be due to its derivational relationship to *ńëlë-.

Bizarrely, the situation seems to be the inverse for *jë-: going by Skolt again, the lengthened cases are raised/fronted, while the short cases remain.

Furthermore, the interaction of this phenomenon with the previous one does something weird: the change *ńi > /ńi/ fails to occur in several Kola Sami words in this 2nd group (“slanted”, “to suck”, partially “hiccup”)! This is quite mysterious. To route these words in as regular developments, we’d have to assume that *ńë > *ńi only happened after *ńi > /ńi/, but also before the change *i > /ï/. That is, *ńi > /ńi/ would not represent a simple absence of sound change, but instead some sort of a shunt to a different vowel altogether, later booted back to regular /i/?!

Perhaps these are actually cases of etymological hypercorrection. Suppose that the words with /ńï/ are not actually inherited in Kola Sami, but were loaned from Skolt (or some other eastern but non-Kola variety)? If so, the speakers could have latched on to the usual pattern /i/ : /ï/ and overgeneralized here. Several examples are known of this kind of process even between Samic and Finnic; why not also between the individual Sami varieties?


The phonetic nature of the change *ë > *i / {j, ń}_ is interesting too. PS *ë is generally taken to derive from earlier *i (< PU *i, *ü), while at this time *i would have been a long vowel *ī. In this light, I wonder if the palatal assimilation was actually early enough that *ë was still hanging somewhere around the front vowel region, e.g. as [ɪ], and the change amounted to simple raising? If the assimilation operated on an already retracted vowel like [ə] or [ɤ] or [ʌ], I’d rather expect the result to have been a mid front vowel like [e]. OTOH, the Samic languages do seem to be somewhat “allergic” to this sound, so raising all the way back to /i/ does not seem entirely out of the question.


Notes on Eastern Sami vowel history, part 1

Recently I sat down with my copy of J. Lehtiranta’s Proto-Samic dictionary, Yhteissaamelainen sanasto (1989; SUST 200) to work out the development of the vowel systems in the Eastern Samic languages. I do not know if this has been done before; it might have been, though I am not exactly worried about rederiving results. [1] At minimum this topic is absent from Sammallahti’s handbook The Saami Languages (1998). His historical phonology appendix covers at length only the evolution from Proto-Uralic to Proto-Samic, and from there to Northern Sami. In the main chapters, too, he only mentions a handful of innovations for Eastern Samic (that he deems diagnostic for defining its taxonomy). Yet it’s obvious that there’s been much divergence going on here: cf. e.g. *kōlē “fish” > Inari Sami kyeli, Skolt Sami kue´ll, Kildin Sami kū´ll, Ter Sami kïĺĺe. [2]

The following fairly general features stand out:

  • The umlaut tendencies that must already have begun in the Proto-Samic era have continued to run wild. Most vowels have distinct reflexes before each of the three common PS stem vowels: *-ë, *-ē, *-ō. (Since Lehtiranta only lists citation forms of words, I don’t have much idea what effect PS *-ā, *-i, and *-u, which are rare outside of inflected forms, have had.) As usual, this must have been allophonic at first, but was later widely phonemicized through the loss of unstressed vowels.
  • The mid vowels *ē, *ea, *oa, *ō, *o, *ë have the most varied reflexes. The close *i, *u are mostly unaffected (only Skolt has any umlauts going on with these), and *ā has not been majorly affected either.
  • Although not all languages distinguish all the different umlaut “grades” of the various PS vowels, I suspect that the umlauts for the most part already occurred in Proto-East Samic, and that the individual languages have simply lost certain distinctions secondarily, since they seem to have done so in different ways. E.g. in Inari, *ë-ë and *ë-ō both > /o/, versus *ë-ē > /a/; but in Ter, instead *ë-ë and *ë-ē both > /ɐ/, versus *ë-ō > /o/. Skolt and Kildin distinguish all three types.
  • In addition to the umlaut splits, there also seems to be a vowel length split. There is of course no sign of this in Ter, where vowel length contrasts have been lost altogether; but it’s found relatively robustly in the other three languages. Regardless, this seems somewhat more like an areal phenomenon: lengthening in Inari almost always implies lengthening in Skolt, but Kildin corresponds less closely to these, and there are also cases where lengthening is found only in Skolt. As for conditioning, long vowels seem to be the rule of thumb before singleton medials, short vowels more common before two-stop clusters. This includes geminates, so the change must have been earlier than the strengthening of the strong grade of single stop consonants to geminates in Skolt. I’ve not worked out the conditions for other consonant clusters yet.
  • Skolt Sami seems to be altogether the Sami variety with the most complicated vocalism (though Southern Sami could give it a good run for the title). At its best, *ea has no less than seven different reflexes: eä, iä, iâ, iõ, ie, e, ee!
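The argument in the third point can be made explicit: two umlaut contexts must have been distinct in the common ancestor if any daughter language keeps them apart. Below is a minimal Python sketch of that heuristic, using only the Inari and Ter reflexes of *ë cited above (Skolt and Kildin are left out since their vowels are not listed here); the function name is made up for illustration. It is a simplification that ignores the possibility of secondary splits in individual languages.

```python
from itertools import combinations

# Reflexes of PS *ë in three umlaut contexts, as cited in the text.
REFLEXES = {
    "Inari": {"ë-ë": "o", "ë-ō": "o", "ë-ē": "a"},
    "Ter": {"ë-ë": "ɐ", "ë-ē": "ɐ", "ë-ō": "o"},
}


def distinguished_pairs(reflexes: dict[str, str]) -> set[frozenset[str]]:
    """Pairs of contexts that one language keeps apart."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(reflexes), 2)
        if reflexes[a] != reflexes[b]
    }


contexts = ["ë-ë", "ë-ē", "ë-ō"]
all_pairs = {frozenset(p) for p in combinations(contexts, 2)}

# A contrast is projected back to the ancestor if ANY daughter preserves it.
union = set().union(*(distinguished_pairs(r) for r in REFLEXES.values()))

for lang, refl in REFLEXES.items():
    print(lang, "keeps", len(distinguished_pairs(refl)), "of", len(all_pairs), "contrasts")
print("Ancestor needs all three grades:", union == all_pairs)
```

Each language alone keeps only two of the three pairwise contrasts, but since they lose different ones, the two together already require a three-way distinction in the ancestor.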

I do have a full correspondence table charted out, but further details shall come later once I’m done double-checking things.

All this has clearly had one important effect, though: loanwords seem to frequently “fail to keep up” with all the hair-thin split rules going on. Generally such cases seem to remain phonetically closer to the loaning language. It follows that such loans have to be dated as newer than Proto-Samic; indeed, possibly as newer than the splitting of all dialects in question. Even then, many such loanwords show a distribution across nearly all of the Samic languages. This seems to be another good demonstration of a point I think Uralic etymology needs to pay a lot more attention to: the “distributional principle” (“a word dates to the common ancestor of the languages it is found in”) cannot be trusted in the case of loanwords.

There’s also one interesting feature that suggests some reinterpretation of the Proto-Samic vowel system. The *-ē-grade reflexes generally seem to be somewhat fronted, when distinct from the “unmarked” *-ë-grade reflexes (cf. e.g. “fish” above). On the other hand, *-ō has had a fairly general lowering effect, not so much a labializing one. This is only natural insofar as *-ō merges with *-ā in Skolt through Ter. But it does remain a labial vowel in Inari. So what’s up with changes such as *pēŋkë > piegga “wind” vs. *pērkō > piärgu “food”; *mōrë > muora “tree” vs. *mōlōs > muálus “thawed water at shore”? And for that matter, Sammallahti notes that *ō also caused an earlier lowering of PU *e, *o to PS *ea, *oa; [3] he posits a relatively open value [ɔː] for the vowel for this reason.

I have now formulated a different hypothesis. The etymological origin of *-ō is unclear, but most proposals have involved a coloring of PU *-a in some fashion. However! If there was indeed a change *-aw > *-o, perhaps this should be postdated to the dialectal Sami era. The following chronology seems to have potential:

  1. Late Proto-Samic: 2nd syllable *-a > *-ā generally changes to PS *-ē, but remains in PS stems of the shape *-āw.
  2. After the W/E split: Secondary *ā-umlaut in Eastern Sami.
  3. After further dialectification: *āw coalesces to *ō in Western + Inari Sami; but merges with *ā in Skolt + Kola Sami.
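The point of the chronology is that the rules only give the right result in this order. As a minimal sketch, the steps can be modelled as ordered rewrite rules over second-syllable stem endings. The string encoding, the rule tables and the helpers `apply_rules` and `derive` are hypothetical illustrations, not an established tool; step 2 (the Eastern Sami *ā-umlaut) affects the first syllable and is left out.

```python
import re

# Toy encoding (an assumption for illustration): only the second-syllable
# stem ending is modelled, e.g. "ā" for *-ā stems and "āw" for *-āw stems.
# Each rule is a (regex, replacement) pair applied in order with re.sub.

# Step 1, late Proto-Samic: *-ā > *-ē, but not in stems ending in *-āw.
COMMON = [(r"ā$", "ē")]

# Step 2 (secondary *ā-umlaut in Eastern Sami) affects the first syllable
# and is not modelled here.

# Step 3, after further dialectification:
WESTERN_INARI = [(r"āw$", "ō")]  # *āw coalesces to *ō
SKOLT_KOLA = [(r"āw$", "ā")]     # *āw merges with *ā


def apply_rules(form, rules):
    """Apply rewrite rules to a form strictly in the given order."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


def derive(ending, branch):
    """Run the common late Proto-Samic step, then the branch-specific step."""
    branches = {"western_inari": WESTERN_INARI, "skolt_kola": SKOLT_KOLA}
    return apply_rules(ending, COMMON + branches[branch])


for ending in ["ā", "āw"]:
    print(ending, derive(ending, "western_inari"), derive(ending, "skolt_kola"))
```

Run in this order, *-ā yields *-ē everywhere, while *-āw yields *-ō in Western + Inari and *-ā in Skolt + Kola. Swapping the order for Skolt + Kola, i.e. `apply_rules("āw", SKOLT_KOLA + COMMON)`, gives "ē" instead: the merger would feed the *-ā > *-ē change, which is why under this hypothesis it has to postdate step 1.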

Of course, this would require looking into the consequences. One issue is that Proto-Samic had not only the traditional *-ō-stems, but also the class of *-ōj-stems. How should these be reconstructed in this system? I don’t think anything with front rounded glides (*-āẅ?!) would work, since PS had eliminated front rounded vowels from its phonology. Maybe *-āwjV?


 Followups: Part 2

[1] If anything, I consider this a much better way to get a hang of known results than just reading about them from a reference book. Also, I did this kind of a survey on Livonian once before and that ended up with me making a couple of discoveries that have by now grown to a draft paper.
[2] Or, supposedly, with “light” palatalization (UPA subscript half ring), not “heavy” (UPA superscript acute). I’ve seen the similar contrast of “palatal prosody” vs “segmental palatalization” in Skolt Sami transcribed as one of secondary palatalization [tsʲ sʲ nʲ lʲ] vs. full palatality [tɕ ɕ ɲ ʎ] though — and given how the UPA is surprisingly terrible at representing primary palatals, I’m guessing this is the case for Ter Sami as well. Especially since both languages lack a “heavily palatalized ŕ” where expected, which squares well with how palatal trills are physiologically impossible.
[3] Actually [ɛː], [ɔː] according to him. I have a couple reasons to think these may have diphthongized already early on, though; but that’s ever so slightly off-topic for this post…

Posted in Uncategorized

Proto-Yukaghir voiced stops (and their implications)

One of the more popular proposals for external relationships of the Uralic family is the Uralo-Yukaghir hypothesis. By certain measures it might even count as the most popular one. The idea has been around for a long while, but in an infuriatingly entrenched state, with views divided between mainstream specialists dismissing everything as speculation, vs. macro-comparativists and several outsiders taking the relationship as more or less granted. [1] E.g. from the humbler and more “professionally credible” end of the latter group, consider Michael Fortescue’s 1998 monograph Language Relations Across Bering Strait: the book makes no attempt to explore the possibility of any Uralic/Yukaghir similarities resulting from anything but genetic inheritance. This is a particularly jarring omission since he does still cover other contact influences relevant to his idea of relating Uralic, Yukaghir, Chukotko-Kamchatkan and Eskimo-Aleut: those between Y + CK, CK + EA, and even between the individual branches of CK and EA.

Research into the hypothesis seems to be finally picking up these days, though. Much of this must have been enabled by Elena Nikolayeva’s ongoing work on the Yukaghir side, culminating in her 2006 monograph, A Historical Dictionary of Yukaghir. After an apparent latency period of diffusion and digestion, a bunch of new views on U/Y relations have emerged here in Finland within the last few years in particular:

  • Häkkinen, Jaakko (2012): Early contacts between Uralic and Yukaghir. [Appendix.] In: SUST 264.
    — An attempt to model lexical correspondences as several strata of loanwords, and to determine what this would imply for Uralic and Yukaghir prehistory in geographical and archeological terms.
  • Piispanen, Peter S. (2013): The Uralic-Yukaghiric connection revisited: Sound Correspondences of Geminate Clusters. In: SUSA 94.
    — A more optimistic take, presuming a relationship and suggesting some new lexical comparisons requiring rather wild new soundlaws.
  • Luobbal Sámmol Sámmol Ante (Ante Aikio): The Uralic-Yukaghir lexical correspondences: genetic inheritance, language contact or chance resemblance? [Preprint.] To appear in: FUF 62.
    — A detailed, conservative review, suggesting that the currently known material is too scarce to establish regular sound correspondences, and that therefore many lexical comparisons may turn out to be simply accidental similarities.

According to the word on the grapevine, there is also at least one further paper in the works on the topic.

I have yet to subscribe to any particular hypothesis on the topic (though of course the burden of proof should lie on those claiming a particularly close U/Y relationship). But it seems to me any assessment of the situation is going to strongly depend on our general understanding of Uralic and Yukaghir prehistory. One of the aims of my various ongoing projects on Proto-Uralic is indeed to allow better assessing the various external relationships that have been proposed. I present here one proposal for amending Proto-Yukaghir as well.


The presence of voiced spirant consonants (at minimum *ð, *ɣ) has been listed by Fortescue as one of the better phonological markers of his “Uralo-Siberian” group of language families. The phonetic character of at least the Proto-Uralic “spirants” is however anything but clear… And on closer examination, I believe that for Proto-Yukaghir they’re probably a mistaken assumption.

The modern Yukaghir languages — Kolyma Yukaghir and Tundra Yukaghir — do not have any systematic series of voiced spirants. These only show up in Proto-Yukaghir as reconstructed by Nikolayeva. She posits PY word-medial *w, *ð, *ɣ [2] behind the following three sound correspondences:

  • Kolyma /b/ ~ Tundra /w/
  • Kolyma /d/ ~ Tundra /r/
  • Kolyma /g, ʁ/ ~ Tundra /g, ʁ/ (depending on the PY vowel backness)

This is not an immediately obvious reconstruction. Several changes are required here to derive the modern sound values: across-the-board spirant fortition in Kolyma, rhotacism of *ð + sporadic fortition of *ɣ in Tundra. It seems to me it would be more parsimonious to reconstruct here PY voiced stops *b, *d, *g (~ [ʁ]), and to assume only the lenition of *b and *d in Tundra. Note also that the change *d > *r can easily occur directly, without any intermediate *ð stage.
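The parsimony argument can be made concrete with a crude count: for each correspondence, tally how many of the two modern reflexes differ from the reconstructed proto-segment. This is only a sketch; the scoring function `count_changes` is a hypothetical illustration, counts every mismatch as one change, and ignores both the naturalness of each change and the g ~ ʁ alternation.

```python
# Medial correspondences as listed above: (Kolyma, Tundra).
# The g ~ ʁ alternation (conditioned by vowel backness) is ignored here.
ATTESTED = [("b", "w"), ("d", "r"), ("g", "g")]

# Proto-Yukaghir value assigned to each correspondence.
SPIRANTS = {("b", "w"): "w", ("d", "r"): "ð", ("g", "g"): "ɣ"}      # Nikolayeva
VOICED_STOPS = {("b", "w"): "b", ("d", "r"): "d", ("g", "g"): "g"}  # proposed here


def count_changes(reconstruction):
    """Count proto-segments that differ from their reflex in each language.

    A crude parsimony measure: every mismatch counts as one change,
    with no weighting for how natural or common the change is.
    """
    changes = 0
    for kolyma, tundra in ATTESTED:
        proto = reconstruction[(kolyma, tundra)]
        changes += int(proto != kolyma) + int(proto != tundra)
    return changes


print("spirants:", count_changes(SPIRANTS))
print("voiced stops:", count_changes(VOICED_STOPS))
```

Under this naive count the spirant reconstruction needs five changes (three fortitions in Kolyma, rhotacism and fortition in Tundra), the voiced-stop reconstruction two (lenition of *b and *d in Tundra).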

*w is reconstructed also word-initially for Proto-Yukaghir: again reflected as Tundra /w/, but instead lost in Kolyma. This is an odd asymmetry. Normally, glide or spirant fortition is more likely to occur word-initially — for example cf. Spanish and Selkup. [3] On the other hand, *b is not a consonant that is commonly lost word-initially, so reconstructing that here, too, would not help either. I suggest accepting the asymmetry instead of trying to explain it away: reconstructing initial *w- but medial *-b-. This state of affairs still technically allows identifying these two as the same proto-phoneme — which would provide a motivation for my newly assumed shift *b > /w/ in Tundra (and yet not *g > ˣ/ɣ/, which is a more common 1st step in voiced stop lenition chainshifts).

Perhaps there was also an earlier original word-internal *-w-, which was vocalized/lost in all attested Yukaghir varieties; either already in Proto-Yukaghir, or even slightly later on, in which case it might explain some of the numerous irregular vowel correspondences between Tundra and Kolyma.

The history of PY consonant clusters can furthermore be streamlined here. Nikolayeva sets up a set of nasal + voiceless stop clusters such as *mt, *ŋć, *ŋk, and has to assume later voicing to yield the actually attested /md/, /ŋď/, /ŋg/, etc. However, if voiced stops and not spirants are posited for PY, they can easily be reconstructed here as well. Nikolayeva also reconstructs liquid + stop clusters, and notes that the stops “mostly” remain unvoiced in these; yet with some exceptions. It seems these “exceptions”, which correlate neatly between Tundra and Kolyma, could have been in place already in Proto-Yukaghir.

The overall phonotactic pattern here — voiced stops that are restricted to word-medial positions and only contrast with voiceless stops between vowels (and, perhaps, after liquids?) — still suggests that some pre-Yukaghir stage only had voiceless stops; which were then voiced in some medial positions; followed by the introduction of new medial voiceless stops from some secondary source (e.g. geminate voiceless stops, loanwords). Some variation of this history has occurred widely among the Uralic languages, for one. But this is no reason to assume that the change is recent! Dialects of Mokša and Mari have resisted initial voiced stops in loanwords until fairly modern times (18th-20th century), despite medial voiced stops having existed already in Proto-Mordvinic and Proto-Mari times (somewhere around the 1st millennium CE).

Lexical correspondences with the Uralic languages also appear to support this model. I will refer here to Proto-Yukaghir roots by their index numbers in the Historical Dictionary, following Aikio’s paper linked above (it includes a useful appendix of Nikolayeva’s U/Y comparisons).

Considering the labial consonants other than *m, three recurring patterns involving these seem to be attested:

  • PU *w ~ PY ∅ (#620, “tree” ~ “birch”; #1112, “vapor” ~ “smoke”; ? #2050, “to hear” ~ “sound”)
  • PU *(m)p ~ PY *w (#139, “older sister”; #1048, “warm”)
  • PU *pp ~ PY *p (#362, “sharp”; #1038, “to tear”; #2150, “to hit”)

Medial *-w-, *-p-, *-pp- are actually fairly rare in PU, so even though some of the Uralic roots involved here are uncertain and there are some semantic differences, I find this tally not entirely trivial.

The correspondence *w ~ *w also seems to be absent (#806 “to leave” is a clearly rejectable comparison since the supposed “Uralic” root is a Germanic loan). While the material is scarce and so this could be an accidental gap, it seems nevertheless preferable to interpret the material as reflecting the following developments:

  • (pre-)PU *w → pre-Y *w > PY ∅
  • (pre-)PU *(m)p → PY *b (voiced either in pre-Yukaghir or in some loaning Uralic branch)
  • (pre-)PU *pp → PY *p (shortened either in pre-Y or in some loaning Uralic branch)

…which also implies that we should indeed not expect any examples of the correspondence *-w- ~ *-b- to turn up. [4]

This pattern does not seem to generalize to the other places of articulation, though. There indeed do not seem to be any recurring correspondences involving intervocalic dental obstruents (or even more suspiciously, any comparisons involving *-t- on either side [5]); and the only recurring intervocalic velar correspondence is PU *x ~ PY *g (#1480, “guard” ~ “hunt”; #2599, “lead, take”). There is also one example each of *k ~ *g (#1302, “hill(s)”) and of *w ~ *g (#1019, “to eat”). These bring to mind the East Uralic development of *-k-, *-w- to *-ɣ-, which suggests that if these comparisons are correct, they probably represent loans rather than inheritance.


Additionally, I wonder if the current issue has partly also been an issue of terminology. Nikolayeva’s model of the history of Yukaghir includes not only the Proto-Yukaghir stage, but also an “Old Yukaghir” stage, which would already have e.g. featured voiced stops in clusters. This is mainly used as a cover term for early historical records prior to the mid-19th century, but perhaps her underlying mental model in full detail actually looks like this:

Proto-Yukaghir > Old Yukaghir > dialectified Old Yukaghir > modern Kolyma Yukaghir & Tundra Yukaghir

Under this scenario, the 1st “Old Y.” stage would be the actual last common ancestor of the recorded Yukaghir varieties, while “Proto-Y.” would be an internally reconstructed entity. It would not be the first time a historical linguist had abused terminology in this way.

This is not a random guess. There are a couple of other hints for this interpretation, e.g. the treatment of long vowels. Nikolayeva does not reconstruct these in certain positions where they do not contrast with short vowels, even though they appear in all records. She assumes that they must hence ultimately be somehow secondary in other positions as well. This does not necessarily follow: consider e.g. Modern English, where “vowel length” (well, tenseness) fails to be contrastive in open monosyllables, and in most dialects also before /r/. Regardless of this, and even regardless of numerous reconstructible processes of compensatory lengthening (e.g. light /laɪt/ ~ German Licht /lɪçt/), the vowel length contrast in English is absolutely ancient: it can be traced back all the way to Proto-Indo-European!

(English incidentally and probably coincidentally works as a typological parallel also for my idea that medial *-w- could have been lost earlier on while initial *w- still remained.)

Finally, I can’t help noticing that the long vowel issue and the reconstruction of spirants rather than voiced stops both swerve “Proto-Y.” typologically closer to standard-issue Proto-Uralic. Is this perhaps not an accident, but rather a general bias that has resulted from Nikolayeva’s working hypothesis of a Uralo-Yukaghir relationship?

[1] Incidentally I find it an interesting question why this particular hypothetical relationship is so pervasively accepted by Nostraticists and the like. There is no shortage of competing proposals, such as Indo-Uralic or Uralo-Dravidian; and neither does Uralo-Yukaghir have a history of recognition by the general public, unlike e.g. the Ural-Altaic or Uralo-Sumerian hypotheses. Is it perhaps that the relative obscurity of Yukaghir has made it more difficult to notice weaknesses of the idea?
[2] Yes, I am aware that /w/ is a semivowel, not a spirant, though frequently it may pattern as one (or, perhaps better: “isolated” voiced spirants may pattern as dental/velar glides).
[3] Even more so for geminate glides actually, with some precedents being North Germanic + Gothic (*ww > *ggw, *jj > *ddj ~ *ggj); Northern Sami (*jj > /dj/); Votic (*jj > /ďď/); various Prakrits including Pāli (e.g. *vv > /bb/); and several Berber varieties (e.g. *ww > /ggʷ/). This doesn’t seem to come into question here, though.
[4] There is a development *w > *b in most Samoyedic languages that could allow this, but being post-Proto-Samoyedic (absent from Nenets and Selkup), this might have been too late to be relevant.
[5] This is particularly curious since PU *-t- has, by contrast, Indo-European correspondences in abundance. Any macrocomparativist model that proposed common ancestry for all three, or even just for Y+U, would be hard-pressed to explain why Yukaghir has lost such words so consistently.

Posted in Uncategorized

Email works again

Whoops. I noticed that the email alias I had been using on my About page no longer works (and might not have worked for a while). I hope this has not led to too many lost messages. :/

Posted in Uncategorized

Career adjustment in progress

Recently I have presented my first “official” conference talk: Palatal unpacking in Finnic, based on an old blog post series. A humble step forward on my ongoing project of swapping the career/hobby statuses of the two fields of research I am currently most involved with: math and linguistics. Or perhaps a step sideways, rather?

The title of this blog will still remain “Freelance Reconstruction” for the time being, but if all goes well I’ll have to think up a new name within a few years.

I never did finish the planned part 5 of that blog post series, but what I had planned for it checks out: there are no cases with traditionally reconstructed PU *-ć- or *-ś- that would get in the way of my new proposal for the later Finnic development of these. I’m assembling a full article on the matter as well. Time will tell if it will be fit for release on its own or if I’ll integrate it into other work. Say, a wider analysis of the historical vowel coloring effects of the Proto-Uralic palatal consonants?

— A side observation: STEM background seems to be not too rare at all on the linguistics side of the Internet. Out of the small handful of blogs I check with some frequency, I am under the impression this holds also for at least Lameen S. of Jabal Al-Lughat and Steve D. of languagehat. Many further cases can also be found in the para-academic linguistics scene centered on online mailing lists such as Cybalist. Nerds of prey flock together, of course, but there might be a deeper selection bias of some sort involved here too…

Posted in Uncategorized

Two Lemmata: PU *ë, PMs *ee *ëë *oo

Not “lemma” in the usual linguistic “citation form” sense, but in the mathematical “intermediate result” sense. I’ve noticed having to clarify these topics at quite a few points, so here’s a single post for the purpose. I’ll keep it brief here, i.e. without going into detailed presentation of the underlying etymological material… though that could be arranged too, if someone so requests?

Proto-Uralic *ë

A back unrounded non-open vowel, contrasting with the more basic *a and *o, has been reconstructed for Proto-Uralic or Proto-Finno-Ugric at various times. Originally, this was motivated by the appearance of a back unrounded /ëë/ [ʌː ~ ɤː] in certain varieties of Mansi; and of corresponding /ïï/ [ɯ] in Eastern Khanty. A “new” such vowel was established in Janhunen ’81, [1] on the basis of a correspondence of Proto-Samoyedic *ë and *ï (in largely complementary distribution with each other) to West Uralic *a. PSmy *ë and *ï also correspond regularly to Ob-Ugric cases of /ëë/ or /ïï/, hence the reconstruction of a distinct PU *ë now rests on quite firm ground. Further traces of this vowel can actually be identified in most Uralic languages west of the Urals.

There has been some uncertainty here ever since Janhunen’s paper, though. For reasons not fully elucidated, he prefers to reconstruct a close vowel *ï instead of a mid vowel *ë, although his actual evidence does not explicitly support a close value for this vowel. What arguments he does give are based solely on an (IMO mistaken) analysis of the PU 2nd-syllable vocalism, without addressing the situation in the 1st syllable. This problem has been only halfway addressed by the treatment in the other current-day key work on PU reconstruction, Sammallahti ’88: according to him, 1st-syllable *ï would have lowered to *ë at the level of “Proto-Finno-Ugric” (edit: “Proto-Finno-Permic”, whose existence I reject).

As a survey of the later reflexes will show, Sammallahti’s conclusion that most western Uralic languages point to *ë rather than *ï is correct. It must, however, be extended for Ugric and Samoyedic as well, which leaves no option but to reconstruct original *ë.

  • The languages of the West Uralic group (Samic, Finnic, Mordvinic) show a development *ë > *a in all positions, suggesting a relatively open value. *a does get further shifted to *ā > *ō and later yet diphthongized to *uo in Samic; but on the basis of Proto-Germanic and even Proto-Scandinavian loanwords, this can be seen to be a fairly late development. [2] Under certain conditions, the same process happens in Finnic as well. (An older PU/PFU *ō used to be reconstructed for these words during the mid 20th century, but this can be recognized as no longer necessary and would, at any rate, run into several difficulties in explaining the reflexes in the other Uralic languages.)
  • Mari and Hungarian also show the merger *ë > *a, but only before 2nd syllable *a. Possibly this can be analyzed as an assimilation development.
    Before 2nd syllable *ə, both languages have a distinctive reflex: Mari *ü, Hungarian *ï (> modern H í). Although these are close vowels, they in fact point to an original mid value: the original PU close vowels *i *ü *u are reflected in both languages as reduced *ɪ *ʏ *ʊ (> modern H short mid e ö o). In both languages, the unreduced close vowels normally derive from mid or open PU vowels under various conditions. [3]

    • *ä > *i in Mari in e.g. *äjmä > *imə “needle”, *lämpə > *liwä- “to warm up”
    • *e > *i (> í) in Hungarian in e.g. *wetə > víz “water”
    • *o > *u in Mari in e.g. *kota > *kuðə “house”, *oksa > *ukš “branch”
    • *o > *u (> ú) in Hungarian in e.g. *molə- > múlik “to pass by”
  • In Permic, *ë is normally reflected as *u. While a close vowel, this is also the default reflex of PU *a and *o, again suggesting a relatively open original value. Additionally, PU *u is reflected as *ï — so even if PU *ï were reconstructed, an intervening development *ï > *ë would still have to be assumed here to route this vowel out of the way of *u. [4]
    Under certain conditions, there is also a development *ë > *ë (e.g. *sënə > *sën “sinew”), which looks like a retention. All these words have *ə in the 2nd syllable; so perhaps the initial step of this split, too, was a lowering *ë > *a / _(C)Ca (and also in some other environments), later followed by *a >> *u.
  • Mansi reflects *ë as a long vowel *ëë. Even if we were to accept the currently common reconstruction as a long close *ïï (see below), this would still point to an original non-close value: the other PMs long vowels uniformly derive from PU open and mid vowels. The PU close vowels *i, *ü, *u are meanwhile uniformly reflected as PMs short vowels, even if they are also generally lowered. So again, even if PU *ï were reconstructed, we would have to posit a fairly early lowering to *ë, for this vowel to participate in the general lengthening of non-close vowels that seems to have occurred in Mansi.
  • The Samoyedic vowel split *ë > *ë, *ï cannot be a priori resolved in favor of either starting point: the stated conditions can be easily reversed. Janhunen ’81 suggests that *ë occurs in closed, *ï in open syllables, either of which would make a plausible environment for a vowel shift.
    However, there is circumstantial evidence against *ï as a starting point. The PU close *i and *u are split in Samoyedic as well: either retained as *i, *u, or reduced to *ə. Yet, they are not lowered to the corresponding mid vowels *e, *o. If PU *ï were reconstructed, the expected Samoyedic split would therefore be *ï ~ *ə, not *ï ~ *ë. [5]
    There are moreover some cases of *ï or *ë of irregular/unclear origin. These include no examples of close to mid development (*u > **ë or the like), but at least one mid to close development: *joŋsə > *jïntə “bow” — perhaps a case of glide-induced coloring. One parsimonious explanation would be to assume here first *o > *ë, then *ë > *ï along the other cases. This depends on how we model the *ë/*ï split exactly, though, and since there are also examples of *u > *ï (at least *kuŋə > *kïj “moon”), it’s also entirely possible that the history here has been *o > *u > *ï instead.
  • The Khanty situation is complicated and does not seem to allow clear conclusions. The main reflexes seem to be *ïï and *aa (in a largely similar distribution as in Samoyedic). The other PKh close tense vowels [6] *ii *üü *uu generally go back to PU open or mid vowels, so the first reflex could be seen as a point in favor of original mid *ë. PKh open tense *ää *aa in other positions likewise also mostly go back to PU open or mid vowels.
    On the other hand: the PU close *i yields PKh mid tense *ee by default, and PU close *u can yield PKh mid tense *oo and *ɔɔ under certain conditions. PKh also conspicuously lacks a mid back unrounded *ëë. If PU *ë/*ï >> *aa went thru an intermediate *ëë stage (the lowering *ëë > *aa has direct parallels in the Mansi dialects in contact with Khanty), then this reflex could suggest an original close value.
    — Some newer reconstructions of Proto-Khanty posit *ä and *a in place of *ee and *oo, though. It would be possible to suggest that the last was actually labial *å and to argue that *aa < *ëë < *ë < *a, to maintain my previous idea in place. But alternately, deriving /oo/ from *a, as found in the generally fairly conservative Far Eastern and Far Northern Khanty, would seem to make better sense, if the development were *a > *aa > *oo; and if so, original “*aa”, unaffected by these changes, would have to have been *ëë at this time. And we’d then be back to a similar argument as seen with Mansi: PU close > PKh lax, vs. PU open/mid > PKh tense.

The evidence thus seems quite clear: reconstructing mid *ë is preferable to reconstructing close *ï. Some of the Khanty evidence may point to *ï, but given the complicated history of Khanty vowels, this should not count as decisive.

Typologically, the reconstruction of mid *ë without a close counterpart *ï is also unproblematic. A similar situation can be observed in e.g. Votic and Estonian, with only ‹õ› /ɤ/ [7]; many dialects of modern English, with an open-mid /ʌ/ for ‹u› in words like strut, and even a rhotic counterpart /ɚ/ in words like nurse; or Bulgarian, with ‹ъ› /ɤ/, descending from Proto-Slavic *ъ /ʊ/.

True, there are also a great many languages with a superficially unpaired non-open back unrounded vowel. Yet such languages tend to have simple vowel inventories along the lines of /i ɨ u e a o/, where /a/ can be analyzed as the open counterpart of /ɨ/! The same applies to the pan-Turkic vowel system /i ü ı u e ö a o/ (/ı/ might vary from [ɨ] to [ɯ] I think; any Turkologists passing by are welcome to set me straight). OTTOMH the only language that would have both a three-degree height contrast, and an ï-type vowel without an ë-type one, is precisely Eastern Khanty.

Proto-Mansi long mid vowels

The PMs vowel system is normally reconstructed as contrasting two degrees of both height and length. The long vowels comprise five units: the open vowels *ää, *aa, and the non-open vowels *ee, *ëë, *oo.

What I write here as *ee and *oo have been traditionally reconstructed as close *ii and *uu. *ëë has moreover been reconstructed as *ïï since Honti ’82, including many default reference works such as Sammallahti ’88.

While I agree with the idea that these three vowels should be treated as a single set, I believe Honti got this adjustment the wrong way around. This is because the majority of the reflexes seem to point to mid values:

  • *ee: Reflected as mid /ee/ in most varieties of Mansi. Close /ii/ is found in most positions in Southern Mansi, and in a couple of words also in Western and Eastern Mansi.
  • *ëë: Reflected as mid /ëë/ or open /aa/ in all varieties of Mansi.
  • *oo: Reflected as mid /oo/ in Southern Mansi, but as close /uu/ in the Core Mansi varieties (West+East+North). I assume the latter value is due to a chainshift: PMs *aa shifts to /oo/ in these same varieties.

Etymology also supports mid values for these vowels. *ee is a reflex of PU *e under unclear conditions; *ëë is the main reflex of PU *ë (which I hope to have just now established as indeed a mid vowel); and *oo is a reflex of PU *a and, probably under some conditions, PU *o. It strikes me as terribly inefficient to assume that these vowels first became close, then proceeded to again become mid vowels widely across the Mansi varieties.

Then there are the known general principles of length/height interaction in vowel shifts:

  • Long vowels tend to be raised
  • Short vowels tend to be lowered
  • Open vowels tend to be lengthened
  • Close vowels tend to be shortened

…which come into action particularly well in vowel shifts involving general restructuring of the vowel system. [8] I can think of tons of examples (e.g. pretty much everything relevant that happens during West Uralic > Proto-Samic), while counterexamples are much rarer. [9] With these in mind, it is already a priori preferable to reconstruct any unconditional /ii/ ~ /ee/ or /uu/ ~ /oo/ correspondences from original *ee or *oo, not *ii or *uu.
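The a priori argument can be spelled out as a small check of a proposed height change against the first two tendencies. This is a sketch under stated assumptions: the three-step height scale and the function `follows_tendency` are hypothetical simplifications, and the two length tendencies (open vowels lengthen, close vowels shorten) are not modelled.

```python
# Rough height scale (an assumption for illustration): 3 = close, 2 = mid, 1 = open.
HEIGHT = {"i": 3, "ü": 3, "u": 3, "e": 2, "ë": 2, "o": 2, "ä": 1, "a": 1}


def follows_tendency(source, target, long):
    """Check a height change against the raising/lowering tendencies.

    Long vowels tend to rise and short vowels tend to fall; a change
    with no height difference counts as consistent either way.
    """
    delta = HEIGHT[target] - HEIGHT[source]
    return delta >= 0 if long else delta <= 0


# Mansi /ii/ ~ /ee/: which starting point fits the tendency for long vowels?
print("*ee > ii:", follows_tendency("e", "i", long=True))  # raising
print("*ii > ee:", follows_tendency("i", "e", long=True))  # lowering
```

Deriving Southern Mansi /ii/ from *ee is a raising of a long vowel and fits the tendency, while deriving the majority /ee/ from *ii would be a lowering of a long vowel and goes against it.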

Summing up, everything seems to check out: *ee, *ëë, *oo is a superior reconstruction equally well from the viewpoint of the attested Mansi varieties; the viewpoint of Proto-Uralic; and the viewpoint of typology of sound change.

—Note however that I am only arguing about phonetic reconstruction here. Phonologically speaking, I have nothing against an analysis according to which these vowels would have been distinguished from *aa and *ää by being simply [+close]. Yet, seeing how the Latin letters ‹i u› very much suggest non-mid values, we’d be better off using the available mid vowel base symbols ‹e o› instead. In my opinion broad transcription generally ought to be user-friendly rather than maximally adherent to any particular theory.

[1] Please cf. the newly published Bibliography page!
[2] E.g. Proto-Germanic *wētjō- > Proto-Scandinavian *wātjō- → pre-Proto-Samic *waććo > Proto-Samic *vōććō > Northern Sami vuohčču “bog”.
[3] Hungarian also has some long close vowels representing older *VwV sequences: e.g. *ńomala > *ńowɜl/*ńuwɜl (?) > nyúl “hare”, *täktɜmɜ > *tätɜw > tetű “louse”.
[4] Technically, a labiality detour could also be arranged: *ɨ *u > *ɯ *ʉ > *u *ɨ? But this seems contrived — not the least for requiring an intermediate stage during which there are two non-front close vowels around, neither of which is [u].
[5] There is some uncertainty in this argument though, since no lowered reflex of the 3rd PU close vowel *ü is found — neither as *ə nor *ö. For that matter, the other PU mid vowels *o and *e don’t quite match the behavior of *ë either: *o splits “downward”, to yield *å~*o; while *e stays around as is (becoming /i/ later on in most Samoyedic languages, but per the evidence of Nganasan, not yet in Proto-Samoyedic).
[6] I will remind that although I use “single”/”double” transcription for Proto-Khanty vowels, just as also for e.g. Mansi and Finnic, this does not indicate a length distinction, but instead one of tenseness: the more numerous “double” vowels are the unmarked ones.
[7] A corresponding close y /ɯ/ has developed in South Estonian, but this is a later innovation. Livonian has similarly later expanded its set of vowels by ȯ, described as /ʊ/ or /ɯ/ in different sources.
[8] Conditional splits in the vowel system: umlauts, coloring effects, length changes due to prosodic factors… are a different issue.
[9] Though not nonexistent: two cases that come to mind are Northwest Germanic, where *ē > *ā but *e ≡ , and late Proto-Slavic, where *a > *o but *ā > *a.

Posted in Uncategorized