Interplay of minor soundlaws: Samic glide clusters

Shifting and widening my scope a little, here’s a look into the history of two consonant clusters across the Samic languages as a whole.

The two-glide cluster *-jv- is a simple place to start. The development of this is straightforward: this is retained essentially intact everywhere across all Sami varieties. (If you want to have a look for yourself, I am including copious links to the Álgu database in this post.) Possibly the coda *j may have been vocalized into the 2nd component of a diphthong/triphthong, but this is basically trivial.

A small further complication still comes up in Southern Sami. Two words have here a seemingly irregular *-jj-: *oajvē > åejjie “head, end”; and *peajvē > biejjie “sun; day”.

Both of these happen to be inherited Uralic words, with cognates stretching all the way to Samoyedic. So my first reflex was to go “a ha! does this mean that the words showing /jv/ are therefore newer loanwords?” The answer is “no”, though: at least *koajvō- > gåajvodh “to dig” is of equally ancient pedigree. But I think I can dial this hypothesis back a little. Perhaps the shift *jv > *jj occurred due to the following front vowel *-ē (in Southern Sami characteristically diphthongized to /ie/ even in the 2nd syllable). This seems phonetically plausible & drops the number of counterexamples from half a dozen to one: *vājvē > vaejvie “pain”. This last word is in turn a known Finnish loanword, which may have indeed diffused into Southern Sami at a late date.

This idea seems to be preliminarily further supported by an interesting derivative of “head”: åajvadidh “to advise”. My chops in SS historical morphology are insufficient to present an implicit PS reconstruction, but we can clearly see here at least a retained stem vowel *ā, a regular feature before 3rd syllable *ë; in other positions this was further raised to *ē already in PS. And before this lower vowel, *-jv- survives after all.


Now let’s consider the opposite PS cluster *-vj-. This turns out to have had a much more complicated history.

Three Sami varieties have completely regular development. Lule Sami and Ter Sami have in all involved words metathesized this cluster, merging it with *-jv-. Inari Sami has always retained /vj/. Northern Sami also might belong here, depending on who you ask: the Álgu database claims /jv/ in a single word *sāvjë > sájva “isolated lake”, while my copy of Yhteissaamelainen sanasto presents sawˈjâ (= equivalent to sávja in the current NS orthography). I would guess that there are dialectal differences involved? FWIW, Sammallahti in The Saami Languages claims that at least the Torne Sami dialect group “originally” belonged with Lule Sami rather than Northern Sami. [1]

In a couple of other varieties, it is also possible to state a mostly applicable rule. Pite Sami aligns with its sibling Lule in having -jv- everywhere except in *jēvjë > jievja “white reindeer”; while Skolt and Kildin Sami align with Inari Sami in having -vj- everywhere except in *ćōvjë > Sk čuõivâk, K čuəivex “grey reindeer”. Probably these sorts of exceptions again represent loaning from neighboring dialects. [2]

Southern Sami again shows a few more complications; as does the neighboring Ume Sami. Covering SS first, metathesized /jv/ occurs in two words: *tāvjā > daajvaj “often”, *sāvjë > saajve “gnome”. Unmetathesized /vj/ is found in three: *jēvjë > joevje “light grey reindeer”, *jēvjë- > joevjeme “beard moss” (don’t ask me what’s the oe doing in these), *vōvjē > vuevjie “wedge”. Lastly, an assimilated /jj/ is found in *ćoavjē > tjåejjie “stomach”. This appears to confirm the assimilation rule I proposed in the 1st section: v > j / j_ie. Provided that we assume the metathesis *vj > *jv to have occurred before this…
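The ordering assumption can be made concrete with a small sketch. This is only an illustration under my own simplifications: the function names are invented here, the rules are applied as plain string rewrites on the PS reconstructions, and the metathesis is applied only to a word that is assumed to undergo it.

```python
import re
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so that precomposed letters like ē compare reliably."""
    return unicodedata.normalize("NFC", text)


def metathesis(form: str) -> str:
    """*vj > *jv (only meant for words assumed to undergo the change)."""
    return nfc(form).replace("vj", "jv")


def assimilation(form: str) -> str:
    """*v > *j after *j when the next vowel is *ē (Southern Sami /ie/)."""
    return re.sub(nfc("jv(?=ē)"), "jj", nfc(form))


if __name__ == "__main__":
    # Inherited *-jv- before *ē assimilates; before *ō it does not.
    print(assimilation("oajvē"), assimilation("koajvō"))
    # Metathesis first, then assimilation: the attested /jj/ type.
    print(assimilation(metathesis("ćoavjē")))
    # The reverse order would strand the word at *-jv-.
    print(metathesis(assimilation("ćoavjē")))
```

Only the order metathesis-then-assimilation gives the /jj/ of tjåejjie. A word like *vōvjē, which keeps /vj/, never meets the conditions of the assimilation rule at all.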

The Ume Sami reflexes seem to support this last assumption. Although not many of the involved words have been recorded from here, /jv/ is found in those lexemes that have SS /jv/ ~ /jj/: dàivài “often” and tjåìvee “stomach” — while /vj/ is found in those that have SS /vj/: jauja “grey reindeer”, vyöyjee “wedge-shaped patch”. There is also one word with a somewhat baffling three-glide reflex: guyvjas “grey reindeer” (with unetymological /g-/ to boot). [3]

How should this distinction between S+U-metathesizing and S+U-unmetathesizing *-vj- be accounted for? Could this be etymological somehow? An interesting fact is that *vōvjē “wedge” is one of the Samic words showing lenition of original coda *k before sonorants (as shown by the Finnic cognates: e.g. modern Finnish vaaja, Karelian voakie, Livonian vaigā < PF *vakja). [4] So, perhaps this change occurred only after the metathesis of inherited *vj to *jv in Southern and Ume Sami? A late date for the change has already been suspected:

This sound change cannot be reliably dated, but it may well have taken place during a relatively late phase of Proto-Saami.

(Aikio 2006: 3.11 §) [5]

With this interpretation, a “maximally hereditary” chronology would be:

  1. Lenition *kj > *ɣj in Finnish.
  2. Samic *tāvjā “frequent” is loaned from Finnish *taɣja.
  3. Metathesis *vj > *jv in South & Ume.
  4. Lenition *kj > *vj all across Samic.
  5. Samic *jēvjë “white reindeer” is loaned from Germanic.
  6. Assimilation *v > j / j_ie in South. — Metathesis *vj > jv in Pite, Lule & Ter. — Raising *eu > *iu in Germanic.
  7. Samic *vājvē “pain” is loaned from Finnish vaiva.
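To check that this ordering really produces the attested Southern Sami clusters, here is a rough sketch of the chronology as an ordered pipeline. Several things in it are my own assumptions for illustration: only the Southern Sami changes (steps 3, 4 and 6) are modelled, a word entering at step n is taken to undergo only changes numbered above n, inherited words get entry step 0, and the pre-lenition shape *vōkjē for “wedge” is a hypothetical placeholder.

```python
import re
import unicodedata
from typing import Callable, List, NamedTuple


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


class Change(NamedTuple):
    step: int
    label: str
    apply: Callable[[str], str]


# Southern Sami changes only, numbered as in the list above.
SOUTHERN_CHANGES: List[Change] = [
    Change(3, "metathesis *vj > *jv", lambda f: f.replace("vj", "jv")),
    Change(4, "lenition *kj > *vj", lambda f: f.replace("kj", "vj")),
    Change(6, "assimilation *jv > *jj before *ē",
           lambda f: re.sub(nfc("jv(?=ē)"), "jj", f)),
]

# gloss: (entry step, form at entry). Entry steps are assumptions.
LEXICON = {
    "head": (0, "oajvē"),
    "stomach": (0, "ćoavjē"),
    "gnome": (0, "sāvjë"),           # must predate step 3 in this scenario
    "wedge": (0, "vōkjē"),           # hypothetical pre-lenition shape
    "often": (2, "tāvjā"),           # loan, step 2
    "white reindeer": (5, "jēvjë"),  # loan, step 5
    "pain": (7, "vājvē"),            # loan, step 7
}


def derive(form: str, entry_step: int) -> str:
    """Apply every change that is later than the word's entry step."""
    form = nfc(form)
    for change in SOUTHERN_CHANGES:
        if change.step > entry_step:
            form = change.apply(form)
    return form


if __name__ == "__main__":
    for gloss, (entry, form) in LEXICON.items():
        print(f"{gloss}: *{form} > *{derive(form, entry)}")
```

Under these assumptions the outputs line up with the clusters of åejjie, tjåejjie, saajve, vuevjie, daajvaj, joevje and vaejvie. That shows only that the scenario is internally consistent, not that it is the right one.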

…But is it a good idea to attempt maximizing the degree to which various Samic words would have been inherited from a common ancestor? I think it is important to keep in mind that fresh loanwords readily diffuse across dialect continua.

As for the particular downsides of the above scenario, at minimum I am uncomfortable assuming that the specifically Finnish change *kj > *ɣj occurred earlier than the supposedly Proto-Germanic change *eu > *iu / _j. [6] OK, it’d be possible to go on making some cleanup assumptions; e.g. that in the numerous newer Germanic loans in Finnish where *ɣj can be reconstructed, this was substituted for original *kj; or perhaps, that the /k/ ~ /g/ found in the other Finnic languages would be a reversal from *ɣ; but this would all be for no other reason than ensuring a Proto-Samic ancestry for SS daajvaj, US dàivài. We could instead assume that S+U acquired these words from the direction of P+L, and show /jv/ for this reason.

This should also call into question whether my step 3 above existed at all. *sāvjë “gnome” (elsewhere in Samic also with meanings like “underground water”, “lake with an underworld entrance”, “isolated lake”) seems like a potential cultural loan from the P+L direction at least. It is of Germanic ultimate origin, but seems to have acquired its mythical flavor only on the Sami side: the PGmc root is simply *saiwiz “lake”.

Note moreover that this loan etymology actually predicts PS *-jv-, not *-vj-! And yet there is no evidence for the inverse metathesis *-jv- > *-vj- to have regularly occurred in any Samic variety. So are we therefore forced to furthermore conclude that this word was originally adopted specifically in the Pite/Lule area, and hypercorrectly metathesized to *-vj- when loaned eastward from these varieties? The Southern /jv/ could similarly also turn out to be original after all.

This leaves just the question of *ćoavjē “stomach”. Relationship to Samoyedic *t¹äjwə “stomach” has been proposed. The initial consonant, vowel frontness, and glide cluster order all fail to match, though, so I suspect this is only an accidental resemblance. I could just as well propose that the Samic word is a metathesis from something like earlier *voaćjē, and therefore related to Finnic *vacca “stomach”? (Ha ha.) With the case for inheritance being in this shape, I don’t think it would be too much of a problem to assume that here, too, the S+U forms have been loaned from the direction of P+L. — But still early enough to have participated in cluster smoothing in SS, apparently.


An additional topic to ponder at this point would be the motivation of the metathesis *-vj- > *-jv-, which altogether appears to be attested in at least two widely separated parts of the Samic dialect continuum. Pite and Lule Sami are spoken in northern Sweden and adjacent areas of Norway (also Finland if we count Torne Sami), Ter Sami at the eastern end of the Kola peninsula. It seems unlikely that these groups have been in any direct contact with each other since Proto-Samic times. It also seems unlikely that this incredibly specific metathesis was purely coincidentally innovated in both. One possibility might be some kind of a phonological precondition for this change having existed already in Proto-Samic, which in only two areas led to the change running to completion?

A better solution though might be a common external source. This exact same metathesis happens to be known furthermore from the Finnic languages! Late Proto-Finnic allows no *-vj- (or *-Vuj-/*-Vüj-: we are better off reconstructing diphthongs rather than coda glides at this date), and although no words with PU *-wj- have been retained in Finnic, a number of loanwords allow reconstructing a metathesis here. E.g. PGmc *flauja- → Finnish laiva “ship”. [7] Metatheses of some other similar clusters including older *-wr- (PS *jāvrē ~ LPF *järvi “lake”) are also found, which suggests that this type of change originated in Finnic, and might have been in the case of *-vj- > *-jv- passed on to Samic.

Still, why just these specific varieties? The Lule Sami probably had numerous connections with Finnic traders and settlers in the Torne Valley and adjacent areas since a much older period than the Finnmark/Inari/Skolt/Kildin Sami living further inland, that much is clear. Yet should we expect this shift to have therefore also been present in the extinct “southeastern” Sami varieties such as the marginally attested Kemi Sami?

Particularly difficult to understand is Ter Sami. I do not think we even know at present whether the Kola Sami languages developed entirely in situ, or if they may have spread to Kola from e.g. the southern reaches of the White Sea, some of their characteristic features already in tow? The presence of this sound change might demand, at minimum, for Ter to descend from a dialect that was originally spoken further south than the corresponding ancestral dialect of Kildin…

[1] One wonders how and why may we claim that it no longer does; or whether we are to conclude that “Northern Sami” is an areal entity rather than a genealogical one.
[2] I wonder if these last two words have some relation to each other. The semantic closeness is obvious, and the consonant skeletons are quite similar as well. The proposed etymology for *jēvjë is loaning from (pre-?)Proto-Germanic *heuja- “hue”, and the Germanic *h- moreover comes from PIE *ḱ-. Wiktionary mentions here e.g. Lithuanian šývas “white”. Could any of the Satemic cognates have plausibly been loaned to yield pre-Samic *ćawjə or *ćowjə “grey”?
[3] Or could this indicate a substitution *ḱ- >*k-, from some non-Satem variety? Perhaps not, since this would be chronologically problematic and there are other known examples of irregular *ć- > *k- in some varieties of Sami.
[4] Also by the word’s etymology as stemming from Baltic: cf. Lithuanian vagis, Latvian vadzis. For more details cf. Itkonen, Terho (1982): Laaja, lavea, lakea ja laakea. In: Virittäjä 86.
[5] Aikio, Ante (2006): On Germanic-Saami contacts and Saami prehistory. In: SUSA 91.
[6] I actually suspect this was “only” Northwest Germanic, given how Gothic shifts *e to *i always anyway. More details to come on this point later though. At any rate this would still not be a huge chronological relief.
[7] For further details cf. Koivulehto, Jorma (1970): Suomen laiva-sanasta. In: Virittäjä 74.

Notes on Eastern Sami vowel history, part 2

(← Part 1)

For initial details: a few complications involving *i and *ë.

In the Kola Sami branch (Kildin & Ter Sami), the default reflex of PS *i seems to be /ï/. (I dunno if this is [ɨ] or [ɯ], though I’m relatively sure that this isn’t really relevant.) E.g.:

  • *ćimpē > K čï´mmb, T čïḿḿb́e “shin”
  • *ijë > K ïjj, T jïjj “night”
  • *kikë- > K kïggeð, T kïkkɐd “to rut”
  • *kirtē- > K kï´rrdeð, T kïŕŕďed “to fly”
  • *nisōn > K T nïzan “woman”
  • *piksë > K pïxxs, T pïkks “bird’s sternum”
  • *pirë > K T pïrr “around”
  • *rissē > K rï´ss, T rïśśe “twig”
  • *silpë > K T sïllb “silver”
  • *tikkē > K tï´kk, T tïx́x́ḱe “tick”
  • *vitë > K vïdd, T vïtt “five”

There seems to be one regular exception to this development: /i/ remains after *ń-. The cases are:

  • *ńiŋēlës > K ńiŋŋlȧs “female”. Lehtiranta reconstructs this with initial *n-, but variation might go back to Proto-Samic; Pite, Northern and Skolt Sami also have /ń-/, while Ume, Lule and Inari Sami have /n-/.
  • *ńińćē > K ńi´ńńdž, T ńińńdže “teat”
  • *ńipćōs > K ńipčas, T ńipčs “roasting spit”

Not a whole lot, but this makes good phonetic sense, and there seem to be no counterexamples.
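Stated as a rule, the pattern so far is simple enough to write down. The sketch below is only a restatement of the two observations above, with a function name of my own choosing; it looks at nothing but the consonant immediately before the first *i.

```python
import unicodedata


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def expected_kola_vowel(ps_form: str) -> str:
    """Expected Kola Sami reflex of the first *i in a Proto-Samic form.

    *i stays /i/ directly after *ń, and otherwise becomes /ï/.
    """
    form = nfc(ps_form).lstrip("*")
    index = form.find("i")
    if index == -1:
        raise ValueError(f"no *i in {ps_form!r}")
    after_palatal_nasal = index > 0 and form[index - 1] == nfc("ń")
    return "i" if after_palatal_nasal else nfc("ï")


if __name__ == "__main__":
    for ps in ("ćimpē", "ijë", "vitë", "ńińćē", "ńipćōs"):
        print(f"*{ps}: /{expected_kola_vowel(ps)}/")
```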

Another environment where /i/ comes up frequently is before coda *j. However, this is not fully regular, and given that PS *-ijC- only occurs in loanwords from Finnic & Scandinavian, analyzing these as post-Proto-Samic loanwords adopted after the change *i > *ï seems preferable:

  • K ki´jjteð, T kijjtad “to thank” (← Finnic *kiittä-)
  • K li´jjg “excess” (← Finnic *liika)
  • K ni´jjb, T nijjb́e “knife” (← Scand.)
  • K T sijjd “village” (← Scand.)

The expected /ïj/ is still found in three words (and also cf. *ijë “night” above):

  • *lijnē > K lï´jjn, T lïjjńe “linen”
  • *rijtō > K rïjjd “quarrel”
  • *tijmā > K T tïjjma “last year”

So far, so good. But let’s kick it up a notch. PS *ë, as I mentioned before, has a variety of differing reflexes across the Eastern Samic varieties. All of the main ones are some flavor of open-to-mid, back-to-central. For Kola Sami, a representative selection would be:

  • *mënë- > K mëënneð, T mɐnnɐd “to go”
  • *nënōs > K nȧnas, T nɐnas “strong”
  • *tënē > K tȧ´nn, T tɐńńe “tin”

Before PS *ń- and *j-, though, several words point to *i across Eastern Samic. Lehtiranta lists 7 roots beginning with the sequence *jë-, and 10 with *ńë-, that are found in ES. 8 of these have *i-like reflexes in at least one ES variety. This does not seem like a coincidence — similar cases in other consonant environments, including before *ć-, are very rare.

The regular cases are:

  • *jëlkëtē > Inari jolgad, Skolt jõlggâd “flat”
  • *jëllë > I jolla, Sk jõll “crazy”
  • *jëlŋēs > I jalŋes, Sk jââ´lnjes, K jȧ´lŋes, T jɐĺĺŋ́eś “tree stump”
  • *jërŋë > I jorŋa, Sk jõrŋŋ, K jëërn “open water”
  • *jëskë(tē) > I joska, Sk jõskk, K jëëskeð, T jɐsskɐd “quiet”
  • *ńëðē- > I njađđeeđ, Sk njââ´đđed, K ńȧ´ddeð, T ńɐťťed “to affix together”
  • *ńëlë- > I njoollađ, Sk njõõllâd, K ńëlleð “to debark a tree”
  • *ńël-tē- > I njaldeđ, Sk njâ´ldded, K ńȧlldeð “to peel” (a derivative of the previous)
  • *ńëvē > I njauve, Sk njââ´vv, T ńɐv́v́e “rapids”

The seemingly irregular cases are:

  • *jëkē > I ihe, Sk ee´ǩǩ, K ï´gg, T jïḱḱe “year”
  • *jëŋë- > I iiŋŋađ, Sk iiŋŋâd, K ïŋŋeð “to dry”
  • *ńëckē- > I njiskođ; — but Sk njõõcksed, K ńȧ´ckseð “to scrape (off)”
  • *ńëkē- > I njihe-, Sk njee´ǩǩ-, K ńï´gg-, T ńïkke- “slanted”
  • *ńëkkē(ńë)- > I njihanjas, Sk njikknâsted, K ńiggnȧ´steð, T ńïx́x́ḱed “to hiccup”
  • *ńëmë- > Sk njiimmâd, K ńïmmeð, T ńïmmɐd; — but I njommađ “to suck”
  • *ńëncē- > I njiʒʒed, Sk nje´ʒʒed; — but K ńȧ´nndzeð “to rip off”
  • *ńëvlē > I njivle, Sk njeu´ll, K ńi´vvl “slime”

There are actually some hints for the conditioning of the split here. After *ń, *ë mostly remains low in original open syllables, vs. is reflected as more close/front in original closed syllables. Vowel length in Skolt seems like an even better indicator that allows also understanding “to suck”. Hence it seems that this change is related to the secondary vowel lengthening that I mentioned last time: only short *ë is palatalized to *i, while lengthened *ë remains. The lack of raising in *ńëltë-, then, might be due to the derivational relationship to *ńëlë-.

Bizarrely, the situation seems to be the inverse for *jë-: going again per Skolt, the lengthened cases are raised/fronted, while the short cases remain.

Furthermore, the interaction of this phenomenon with the previous one does something weird: the change *ńi > /ńi/ fails to occur in several Kola Sami words in this 2nd group (“slanted”, “to suck”, partially “hiccup”)! This is quite mysterious. To route these words in as regular developments, we’d have to assume that *ńë > *ńi only happened after *ńi > /ńi/, but also before the change *i > /ï/. That is, *ńi > /ńi/ would not represent a simple absence of sound change, but instead some sort of a shunt to a different vowel altogether, later booted back to regular /i/?!

Perhaps these are actually cases of etymological hypercorrection. Suppose that the words with /ńï/ are not actually inherited in Kola Sami, but were loaned from Skolt (or some other eastern but non-Kola variety)? If so, the speakers could have latched on to the usual pattern /i/ : /ï/ and overgeneralized here. Several examples are known of this kind of process even between Samic and Finnic; why not also between the individual Sami varieties?


 

The phonetic nature of the change *ë > *i / {j, ń}_ is interesting too. PS *ë is generally taken to derive from earlier *i (< PU *i, *ü), while at this timeframe *i would have been a long vowel *ī. In this light, I wonder if the palatal assimilation was actually sufficiently early that *ë was still hanging somewhere around the front vowel region, e.g. as [ɪ], and the change amounted to simple raising? If the assimilation operated on an already retracted vowel like [ə] or [ɤ] or [ʌ], I’d rather expect the result to have been a mid front vowel like [e]. — OTOH the Samic languages do seem to be somewhat “allergic” to this sound, so raising all the way back to /i/ does not seem entirely out of the question.

Notes on Eastern Sami vowel history, part 1

Recently I sat down with my copy of J. Lehtiranta’s Proto-Samic dictionary, Yhteissaamelainen sanasto (1989; SUST 200) to work out the development of the vowel systems in the Eastern Samic languages. I do not know if this has been done before; it might have, though I am not exactly worried about rederiving results. [1] At minimum this topic is absent from Sammallahti’s handbook The Saami Languages (1998). His historical phonology appendix covers at length only the evolution from Proto-Uralic to Proto-Samic, and from there to Northern Sami. In the main chapters, too, he only mentions a handful of innovations for Eastern Samic (that he deems diagnostic for defining its taxonomy). Yet it’s obvious that there’s been much divergence going on here: cf. e.g. *kōlē “fish” > Inari Sami kyeli, Skolt Sami kue´ll, Kildin Sami kū´ll, Ter Sami kïĺĺe. [2]

The following fairly general features stand out:

  • The umlaut tendencies that already must have begun in the Proto-Samic era have continued wildly. Most vowels have distinct reflexes before each of the three common PS stem vowels: *-ë, *-ē, *-ō. (Since Lehtiranta only lists citation forms of words, I don’t have much idea what the effect of PS *-ā, *-i, and *-u, which are rare outside of inflected forms, has been.) As usual, this must’ve been allophonic at first, but was later widely phonemicized by loss of unstressed vowels.
  • The mid vowels *ē, *ea, *oa, *ō, *o, *ë have the most varied reflexes. The close *i, *u are mostly unaffected (only Skolt has any umlauts going on with these), and *ā has not been majorly affected either.
  • Although not all languages distinguish all different umlaut “grades” of various PS vowels, I suspect umlauts for the most part regardless occurred in Proto-East Samic already, and that various languages have simply secondarily lost certain distinctions — since they seem to have done so in different ways. E.g. in Inari, *ë-ë and *ë-ō both > /o/, versus *ë-ē > /a/; but in Ter, instead *ë-ë and *ë-ē both > /ɐ/, versus *ë-ō > /o/. Skolt and Kildin distinguish all three types.
  • In addition to the umlaut splits, there also seems to be a vowel length split. There is of course no sign of this in Ter, where vowel length contrasts have been lost altogether; but it’s found relatively robustly in the other three languages. This seems regardless a little bit more like an areal phenomenon: lengthening in Inari almost always implies lengthening in Skolt, but Kildin corresponds more poorly to these, and there are also cases where lengthening is found only in Skolt. As for conditioning, long vowels seem to be the rule of thumb before singleton medials, short vowels more general before two-stop clusters. This includes geminates, so the change must have been earlier than the strengthening of the strong grade of single stop consonants to geminates in Skolt. I’ve not worked out the conditions for other consonant clusters yet.
  • Skolt Sami seems to be altogether the Sami variety with the most complicated vocalism (though Southern Sami could give it a good run for the title). At its best, *ea has no less than seven different reflexes: eä, iä, iâ, iõ, ie, e, ee!
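The argument about umlaut grades in the third point above can be spelled out mechanically. The sketch below uses only the Inari and Ter reflexes of *ë quoted there; the dictionary layout and function names are my own. It shows that each language merges a different pair of grades, so the two together still require three distinct inputs.

```python
from collections import defaultdict

# Reflexes of first-syllable *ë by PS stem vowel, as quoted in the list above.
E_REFLEXES = {
    "Inari": {"ë": "o", "ē": "a", "ō": "o"},
    "Ter": {"ë": "ɐ", "ē": "ɐ", "ō": "o"},
}


def merged_classes(language: str) -> set:
    """Group stem vowels that share one reflex in the given language."""
    groups = defaultdict(set)
    for stem_vowel, reflex in E_REFLEXES[language].items():
        groups[reflex].add(stem_vowel)
    return {frozenset(group) for group in groups.values()}


def jointly_distinct_grades() -> int:
    """Count the grades kept apart once both languages are compared."""
    stem_vowels = E_REFLEXES["Inari"].keys()
    signatures = {
        tuple(E_REFLEXES[language][vowel] for language in E_REFLEXES)
        for vowel in stem_vowels
    }
    return len(signatures)


if __name__ == "__main__":
    for language in E_REFLEXES:
        print(language, sorted(sorted(group) for group in merged_classes(language)))
    print("grades distinguished jointly:", jointly_distinct_grades())
```

Each language on its own shows two classes, but the pair of reflexes per stem vowel is unique. This is the kind of pattern I would expect if the three-way split predates both.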

I do have a full correspondence table charted out, but further details shall come later once I’m done double-checking things.

All this has clearly had one important effect, though: loanwords seem to frequently “fail to keep up” with all the hair-thin split rules going on. Generally such cases seem to remain phonetically closer to the loaning language. It follows that such loans have to be dated as newer than Proto-Samic; indeed, possibly as newer than the splitting of all dialects in question. Even then, many such loanwords show a distribution across nearly all of the Samic languages. This seems to be another good demonstration of a point I think Uralic etymology needs to pay a lot more attention to: the “distributional principle” (“a word dates to the common ancestor of the languages it is found in”) cannot be trusted in the case of loanwords.

— There’s also one interesting feature that suggests some reinterpretation of the Proto-Samic vowel system. The *-ē-grade reflexes generally seem to be somewhat fronted, when distinct from the “unmarked” *-ë-grade reflexes (cf. e.g. “fish” above). On the other hand, *-ō has had a fairly general lowering effect, not so much a labializing one. This is only natural insofar as *-ō merges with *-ā in Skolt thru Ter. But it does remain a labial vowel in Inari. So what’s up with changes such as *pēŋkë > piegga “wind” vs. *pērkō > piärgu “food”; *mōrë > muora “tree” vs. *mōlōs > muálus “thawed water at shore”? And for that matter, Sammallahti notes that *ō caused also earlier lowering of PU *e, *o to PS *ea, *oa; [3] he posits a relatively open value [ɔː] for the vowel for this reason.

I now have formulated a different hypothesis. The etymological origin of *-ō is unclear — but most proposals have involved a coloring of PU *-a in some fashion. However! If there was indeed a change *-aw > *-o, perhaps this should be postdated to the dialectal Sami era. The following chronology seems to have potential:

  1. Late Proto-Samic: 2nd syllable *-a > *-ā generally changes to PS *-ē, but remains in PS stems of the shape *-āw.
  2. After the W/E split: Secondary *ā-umlaut in Eastern Sami.
  3. After further dialectification: *āw coalesces to *ō in Western + Inari Sami; but merges with *ā in Skolt + Kola Sami.
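As a sanity check on what this chronology predicts, here is a toy encoding of the three steps. Everything in it is a simplification of mine: the variety labels are coarse groupings, the input is just the pre-PS stem ending ("a" or "aw"), and the second return value only records whether the stem would still end in *ā(w) when the Eastern ā-umlaut of step 2 applies.

```python
from typing import Tuple

WEST_AND_INARI = {"Western", "Inari"}
EASTERN = {"Inari", "Skolt", "Kola"}


def stem_outcome(pre_ps_ending: str, variety: str) -> Tuple[str, bool]:
    """Return (final stem vowel, whether Eastern ā-umlaut is triggered)."""
    if pre_ps_ending == "a":
        # Step 1: plain *-a > *-ā > *-ē, so nothing is left to trigger step 2.
        return "ē", False
    if pre_ps_ending == "aw":
        # Step 1 leaves *-āw; step 2 applies only in Eastern Sami.
        triggers_umlaut = variety in EASTERN
        # Step 3: coalescence to *ō, or merger with *ā.
        final = "ō" if variety in WEST_AND_INARI else "ā"
        return final, triggers_umlaut
    raise ValueError(f"unexpected ending: {pre_ps_ending!r}")


if __name__ == "__main__":
    for variety in ("Western", "Inari", "Skolt", "Kola"):
        print(variety, stem_outcome("aw", variety))
```

The interesting row is Inari, which comes out with both a labial final vowel and the lowering umlaut. That is the combination seen in forms like piärgu and muálus.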

Of course, this would require looking into the consequences. One issue is that Proto-Samic had not only the traditional *-ō-stems, but also the class of *-ōj-stems. How should these be reconstructed in this system? I don’t think anything with front rounded glides (*-āẅ?!) would work, since PS had eliminated front rounded vowels from its phonology. Maybe *-āwjV?


 

Followups: Part 2

[1] If anything, I consider this a much better way to get a hang of known results than just reading about them from a reference book. Also, I did this kind of a survey on Livonian once before and that ended up with me making a couple of discoveries that have by now grown to a draft paper.
[2] Or, supposedly, with “light” palatalization (UPA subscript half ring), not “heavy” (UPA superscript acute). I’ve seen the similar contrast of “palatal prosody” vs “segmental palatalization” in Skolt Sami transcribed as one of secondary palatalization [tsʲ sʲ nʲ lʲ] vs. full palatality [tɕ ɕ ɲ ʎ] though; and given how the UPA is surprisingly terrible at representing primary palatals, I’m guessing this is the case for Ter Sami as well. Especially since both languages lack a “heavily palatalized ŕ” where expected, which squares well with how palatal trills are physiologically impossible.
[3] Actually [ɛː], [ɔː] according to him. I have a couple reasons to think these may have diphthongized already early on, though; but that’s ever so slightly off-topic for this post…

Proto-Yukaghir voiced stops (and their implications)

One of the more popular proposals for external relationships of the Uralic family is the Uralo-Yukaghir hypothesis. By certain measures it might even count as the most popular one. The idea has been around for a long while, but in an infuriatingly entrenched state, with views divided between mainstream specialists dismissing everything as speculation, vs. macro-comparativists and several outsiders taking the relationship as more or less granted. [1] E.g. from the humbler and more “professionally credible” end of the latter group, consider Michael Fortescue’s 1998 monograph Language Relations Across Bering Strait: the book makes no attempt to explore the possibility of any Uralic/Yukaghir similarities resulting from anything but genetic inheritance. This is a particularly jarring omission since he does still cover other contact influences relevant to his idea of relating Uralic, Yukaghir, Chukotko-Kamchatkan and Eskimo-Aleut: those between Y + CK, CK + EA, and even between the individual branches of CK and EA.

Research into the hypothesis seems to be finally picking up these days, though. Much of this must have been enabled by Irina Nikolayeva’s ongoing work on the Yukaghir side, culminating in her 2006 monograph, A Historical Dictionary of Yukaghir. After an apparent latency period of diffusion and digestion, a bunch of new views on U/Y relations have emerged here in Finland within the last few years in particular:

  • Häkkinen, Jaakko (2012): Early contacts between Uralic and Yukaghir. [Appendix.] In: SUST 264.
    — An attempt to model lexical correspondences as several strata of loanwords, and to determine what this would imply for Uralic and Yukaghir prehistory in geographical and archeological terms.
  • Piispanen, Peter S. (2013): The Uralic-Yukaghiric connection revisited: Sound Correspondences of Geminate Clusters. In: SUSA 94.
    — A more optimistic take, presuming a relationship and suggesting some new lexical comparisons requiring rather wild new soundlaws.
  • Luobbal Sámmol Sámmol Ante (Ante Aikio): The Uralic-Yukaghir lexical correspondences: genetic inheritance, language contact or chance resemblance? [Preprint.] To appear in: FUF 62.
    — A detailed, conservative review, suggesting that the currently known material is too scarce to establish regular sound correspondences, and that therefore many lexical comparisons may turn out to be simply accidental similarities.

According to the word on the grapevine, there is also at least one further paper in the works on the topic.

I have yet to subscribe to any particular hypothesis on the topic (though of course a burden of proof should lie on those claiming a particularly close U/Y relationship). But it seems to me any assessment of the situation is going to strongly depend on our general understanding of Uralic and Yukaghir prehistory. One of the aims of my various ongoing work on Proto-Uralic is indeed to allow better assessing the various external relationships that have been proposed. I present here one proposal for amending Proto-Yukaghir as well.


The presence of voiced spirant consonants (at minimum *ð, *ɣ) has been listed by Fortescue as one of the better phonological markers of his “Uralo-Siberian” group of language families. The phonetic character of at least the Proto-Uralic “spirants” is however anything but clear… And on closer examination, I believe that for Proto-Yukaghir they’re probably a mistaken assumption.

The modern Yukaghir languages — Kolyma Yukaghir and Tundra Yukaghir — do not have any systematic series of voiced spirants. These only show up in Proto-Yukaghir as reconstructed by Nikolayeva. She posits PY word-medial *w, *ð, *ɣ [2] behind the following three sound correspondences:

  • Kolyma /b/ ~ Tundra /w/
  • Kolyma /d/ ~ Tundra /r/
  • Kolyma /g, ʁ/ ~ Tundra /g, ʁ/ (depending on the PY vowel backness)

This is not an immediately obvious reconstruction. Several changes are required here to derive the modern sound values: across-the-line spirant fortition in Kolyma, rhotacism of *ð + sporadic fortition of *ɣ in Tundra. It seems to me it would be more parsimonious to reconstruct here PY voiced stops *b, *d, *g (~ [ʁ]), and to assume only the lenition of *b and *d in Tundra. Note also that the change *d > *r can easily occur directly, without any intermediate *ð stage.
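The parsimony comparison can be tallied mechanically. In this sketch I simplify on my own account: each correspondence is reduced to one representative modern value per language (so the /g/ ~ /ʁ/ allophony is ignored), and a “change” is just any mismatch between the proposed proto-segment and the modern one.

```python
# One representative modern reflex per correspondence set (labial, dental, velar).
MODERN = {
    "Kolyma": ("b", "d", "g"),
    "Tundra": ("w", "r", "g"),
}

# Two candidate reconstructions for the same three proto-segments.
PROPOSALS = {
    "spirants": ("w", "ð", "ɣ"),
    "stops": ("b", "d", "g"),
}


def changes_required(proposal: str) -> list:
    """List every (language, proto, modern) triple where a change is needed."""
    proto_segments = PROPOSALS[proposal]
    return [
        (language, proto, modern)
        for language, reflexes in MODERN.items()
        for proto, modern in zip(proto_segments, reflexes)
        if proto != modern
    ]


if __name__ == "__main__":
    for name in PROPOSALS:
        needed = changes_required(name)
        print(name, len(needed), needed)
```

A raw count like this gives five changes for the spirant reconstruction and two for the stop one. It says nothing about how natural each individual change is, so it is at most a first-pass argument.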

*w is reconstructed also word-initially for Proto-Yukaghir: again reflected as Tundra /w/, but instead lost in Kolyma. This is an odd asymmetry. Normally, glide or spirant fortition is more likely to occur word-initially — for example cf. Spanish and Selkup. [3] On the other hand, *b is not a consonant that is commonly lost word-initially, so reconstructing that here, too, would not help either. I suggest accepting the asymmetry instead of trying to explain it away: reconstructing initial *w- but medial *-b-. This state of affairs still technically allows identifying these two as the same proto-phoneme — which would provide a motivation for my newly assumed shift *b > /w/ in Tundra (and yet not *g > ˣ/ɣ/, which is a more common 1st step in voiced stop lenition chainshifts).

Perhaps there was also an earlier original word-internal *-w-, which was vocalized/lost in all attested Yukaghir varieties; either already in Proto-Yukaghir, or even slightly later on, in which case it might explain some of the numerous irregular vowel correspondences between Tundra and Kolyma.

The history of PY consonant clusters can furthermore be streamlined here. Nikolayeva sets up a set of nasal + voiceless stop clusters such as *mt, *ŋć, *ŋk, and has to assume later voicing to yield the actually attested /md/, /ŋď/, /ŋg/, etc. However, if voiced stops and not spirants are posited for PY, they can easily be reconstructed here as well. Nikolayeva also reconstructs liquid + stop clusters, and notes that the stops “mostly” remain unvoiced in these; yet with some exceptions. It seems these “exceptions”, that correlate neatly between Tundra and Kolyma, could have been in place already in Proto-Yukaghir.

The overall phonotactic pattern here — voiced stops that are restricted to word-medial positions and only contrast with voiceless stops between vowels (and, perhaps, after liquids?) — still suggests that some pre-Yukaghir stage only had voiceless stops; which were then voiced in some medial positions; followed by the introduction of new medial voiceless stops from some secondary source (e.g. geminate voiceless stops, loanwords). Some variation of this history has occurred widely among the Uralic languages, for one. But this is no reason to assume that the change is recent! Dialects of Mokša and Mari have resisted initial voiced stops in loanwords until fairly modern times (18th-20th century), despite medial voiced stops having existed already in Proto-Mordvinic and Proto-Mari times (somewhere around the 1st millennium CE).
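The hypothetical pre-Yukaghir sequence described here (medial voicing, then new medial voiceless stops from geminates) can be illustrated with a toy model. The words below are invented placeholders and not Yukaghir or Uralic data, and the five-vowel inventory is likewise just an assumption for the sketch.

```python
import re

VOWELS = "aeiou"
VOICED = {"p": "b", "t": "d", "k": "g"}


def medial_voicing(word: str) -> str:
    """Voice a single voiceless stop standing between two vowels."""
    pattern = rf"(?<=[{VOWELS}])([ptk])(?=[{VOWELS}])"
    return re.sub(pattern, lambda match: VOICED[match.group(1)], word)


def degemination(word: str) -> str:
    """Shorten geminate voiceless stops, creating new medial singletons."""
    return re.sub(r"([ptk])\1", r"\1", word)


def develop(word: str) -> str:
    """Apply the two changes in the order suggested in the text."""
    return degemination(medial_voicing(word))


if __name__ == "__main__":
    for toy in ("pata", "patta", "kupa", "tappi"):
        print(toy, ">", develop(toy))
```

The result is a system where /d/ and /t/ contrast between vowels while word-initial stops all stay voiceless, which is the distribution described above.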

Lexical correspondences with the Uralic languages also appear to support this model. I will refer here to Proto-Yukaghir roots by their index numbers in the Historical Dictionary, following Aikio’s paper linked above (it includes a useful appendix of Nikolayeva’s U/Y comparisons).

Considering the labial consonants other than *m, three recurring patterns involving these seem to be attested:

  • PU *w ~ PY ∅ (#620, “tree” ~ “birch”; #1112, “vapor” ~ “smoke”; ? #2050, “to hear” ~ “sound”)
  • PU *(m)p ~ PY *w (#139, “older sister”; #1048, “warm”)
  • PU *pp ~ PY *p (#362, “sharp”; #1038, “to tear”; #2150, “to hit”)

Medial *-w-, *-p-, *-pp- are actually fairly rare in PU, so even though some of the Uralic roots involved here are uncertain and there are some semantic differences, I find this a not quite trivial tally.

The correspondence *w ~ *w also seems to be absent (#806 “to leave” is a clearly rejectable comparison since the supposed “Uralic” root is a Germanic loan). While the material is scarce and so this could be an accidental gap, it seems regardless preferable to interpret the material as reflecting the following developments:

  • (pre-)PU *w → pre-Y *w > PY ∅
  • (pre-)PU *(m)p → PY *b (voiced either in pre-Yukaghir or in some loaning Uralic branch)
  • (pre-)PU *pp → PY *p (shortened either in pre-Y or in some loaning Uralic branch)

…which also implies that we should indeed not expect any examples of the correspondence *-w-  ~ *-b- to turn up. [4]
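As a small bookkeeping sketch, the tally and the predictions can be written down side by side. The data are only the index numbers quoted above; the PY side is written in the voiced-stop notation argued for here (so Nikolayeva's PY *w appears as "b"), and the names ATTESTED, PREDICTED, tally and unexpected are my own ad hoc labels.

```python
# Correspondences quoted above, keyed by (PU medial, PY reflex).
# PY reflexes use the voiced-stop notation, so Nikolayeva's PY *w is "b".
# Values are index numbers in the Historical Dictionary.
ATTESTED = {
    ("w", "∅"): [620, 1112, 2050],  # 2050 is marked as uncertain in the text
    ("(m)p", "b"): [139, 1048],
    ("pp", "p"): [362, 1038, 2150],
}

# Outcomes predicted by the three developments listed above.
PREDICTED = {"w": "∅", "(m)p": "b", "pp": "p"}


def tally(attested):
    """Count supporting comparisons per correspondence."""
    return {pair: len(ids) for pair, ids in attested.items()}


def unexpected(attested, predicted):
    """Return correspondence pairs that the predicted developments do not cover."""
    return [pair for pair in attested if predicted.get(pair[0]) != pair[1]]


print(tally(ATTESTED))
print(unexpected(ATTESTED, PREDICTED))
```

Adding the rejected comparison #806 back in as ("w", "b") would make it show up as the only unexpected pair, which is the point of the argument.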

This does not seem to generalize to the other POAs, though. There indeed do not seem to be any recurring correspondences involving intervocalic dental obstruents (or even more suspiciously, any comparisons involving *-t- on either side [5]); and the only recurring intervocalic velar correspondence is PU *x ~ PY *g (#1480, “guard” ~ “hunt”; #2599, “lead, take”). There is also one example each of *k ~ *g (#1302, “hill(s)”) and of *w ~ *g (#1019, “to eat”). These bring to mind the East Uralic development of *-k-, *-w- to *-ɣ-, which seems to suggest that if these comparisons are correct, they probably represent loans rather than inheritance.


Additionally, I wonder if the current issue has partly also been an issue of terminology. Nikolayeva’s model of the history of Yukaghir includes not only the Proto-Yukaghir stage, but also an “Old Yukaghir” stage, which would already have e.g. featured voiced stops in clusters. This is mainly used as a cover term for early historical records prior to the mid-19th century, but perhaps her underlying mental model in full detail actually looks like this:

Proto-Yukaghir > Old Yukaghir > dialectified Old Yukaghir > modern Kolyma Yukaghir & Tundra Yukaghir

Under this scenario, the 1st “Old Y.” stage would be the actual last common ancestor of the recorded Yukaghir varieties, while “Proto-Y.” would be an internally reconstructed entity. It would not be the first time a historical linguist were to abuse terminology in this way.

This is not a random guess. There are a couple other hints for this interpretation, e.g. the treatment of long vowels. Nikolayeva does not reconstruct these in certain positions where they do not contrast with short vowels, even though they appear in all records. She assumes that they must hence be ultimately somehow secondary even in other positions. This does not necessarily follow: consider e.g. Modern English, where “vowel length” (well, tenseness) fails to be contrastive in open monosyllables, in most dialects also before /r/. Regardless of this, and even regardless of numerous reconstructible processes of compensatory lengthening (e.g. light /laɪt/ ~ German Licht /lɪçt/), the vowel length contrast in English is absolutely ancient: it can be traced back all the way to Proto-Indo-European!

(English incidentally and probably coincidentally works as a typological parallel also for my idea that medial *-w- could have been lost earlier on while initial *w- still remained.)

Finally, I can’t help noticing that the long vowel issue and the reconstruction of spirants rather than voiced stops both swerve “Proto-Y.” typologically closer to standard-issue Proto-Uralic. Is this perhaps not an accident, but rather a general bias that has resulted from Nikolayeva’s working hypothesis of a Uralo-Yukaghir relationship?

[1] Incidentally I find it an interesting question why this particular hypothetical relationship is so pervasively accepted by Nostraticists and the like. There is no shortage of competing proposals, such as Indo-Uralic or Uralo-Dravidian; and neither does Uralo-Yukaghir have a history of recognition by the general public, unlike e.g. the Ural-Altaic or Uralo-Sumerian hypotheses. Is it perhaps that the relative obscurity of Yukaghir has made it more difficult to notice weaknesses of the idea?
[2] Yes, I am aware that /w/ is a semivowel, not a spirant, though frequently it may pattern as one (or, perhaps better: “isolated” voiced spirants may pattern as dental/velar glides).
[3] Even more so for geminate glides actually, with some precedents being North Germanic + Gothic (*ww > *ggw, *jj > *ddj ~ *ggj); Northern Sami (*jj > /dj/); Votic (*jj > /ďď/); various Prakrits including Pāli (e.g. *vv > /bb/); and several Berber varieties (e.g. *ww > /ggʷ/). This doesn’t seem to come into question here, though.
[4] There is a development *w > *b in most Samoyedic languages that could allow this, but being post-Proto-Samoyedic (absent from Nenets and Selkup), this might have been too late to be relevant.
[5] This is particularly curious since PU *-t- has, by contrast, Indo-European correspondences in abundance. Any macrocomparativist model that proposed common ancestry for all three, or even just for Y+U, would be hard-pressed to explain why Yukaghir has lost such words so consistently.

Email works again

Whoops. I noticed that the email alias I had been using on my About page no longer works (and might not have worked for a while). I hope this has not led to too many lost messages. :/

Career adjustment in progress

Recently I have presented my first “official” conference talk: Palatal unpacking in Finnic, based on an old blog post series. A humble step forward on my ongoing project of swapping the career/hobby statuses of the two fields of research I am currently most involved with: math and linguistics. Or perhaps a step sideways, rather?

The title of this blog will still remain “Freelance Reconstruction” for the time being, but if all goes well I’ll have to think up a new name within a few years.

I never did finish the supposed blog post part 5, but what I had planned for it checks out: there are no cases with traditionally reconstructed PU *-ć- or *-ś- that would get in the way of my new proposal for the later Finnic development of these. I’m assembling a full article on the matter as well. Time will tell if it will be fit for release on its own or if I’ll integrate it into other work. Say, a wider analysis of the historical vowel coloring effects of the Proto-Uralic palatal consonants?

— A side observation: STEM background seems to be not too rare at all on the linguistics side of the Internet. Out of the small handful of blogs I check with some frequency, I am under the impression this holds also for at least Lameen S. of Jabal Al-Lughat and Steve D. of languagehat. Many further cases can also be found in the para-academic linguistics scene centered on online mailing lists such as Cybalist. Nerds of prey flock together, of course, but there might be a deeper selection bias of some sort involved here too…

Two Lemmata: PU *ë, PMs *ee *ëë *oo

Not “lemma” in the usual linguistic “citation form” sense, but in the mathematical “intermediate result” sense. I’ve noticed having to clarify these topics at quite a few points, so here’s a single post for the purpose. I’ll keep it brief here, i.e. without going into detailed presentation of the underlying etymological material… though that could be arranged too, if someone so requests?

Proto-Uralic *ë

A back unrounded non-open vowel, contrasting with the more basic *a and *o, has been reconstructed for Proto-Uralic or Proto-Finno-Ugric at various times. Originally, this was motivated by the appearance of a back unrounded /ëë/ [ʌː ~ ɤː] in certain varieties of Mansi; and of corresponding /ïï/ [ɯ] in Eastern Khanty. A “new” such vowel was established in Janhunen ’81, [1] on the basis of a correspondence of Proto-Samoyedic *ë and *ï (in largely complementary distribution with each other) to West Uralic *a. PSmy *ë and *ï also correspond regularly to Ob-Ugric cases of /ëë/ or /ïï/; hence the reconstruction of a distinct PU *ë rests now on quite firm ground. Further traces of this vowel can actually be identified in most Uralic languages west of the Urals.

There has been some uncertainty here ever since Janhunen’s paper, though. For reasons not fully elucidated, he prefers to reconstruct a close vowel *ï instead of a mid vowel *ë, although his actual evidence does not explicitly support a close value for this vowel. What arguments he does give are based solely on an (IMO mistaken) analysis of the PU 2nd-syllable vocalism, without addressing the situation in the 1st syllable. This problem has been only halfway addressed by the treatment in the other current-day key work on PU reconstruction, Sammallahti ’88: according to him, 1st-syllable *ï would have lowered to *ë at the level of “Proto-Finno-Ugric” (edit: “Proto-Finno-Permic”), whose existence I reject.

As a survey of the later reflexes will show, Sammallahti’s conclusion that most western Uralic languages point to *ë rather than *ï is correct. It must, however, be extended for Ugric and Samoyedic as well, which leaves no option but to reconstruct original *ë.

  • The languages of the West Uralic group (Samic, Finnic, Mordvinic) show a development *ë > *a in all positions, suggesting a relatively open value. *a does get further shifted to *ā > *ō and later yet diphthongized to *uo in Samic; but on the basis of Proto-Germanic and even Proto-Scandinavian loanwords, this can be seen to be a fairly late development. [2] Under certain conditions, the same process happens in Finnic as well. (An older PU/PFU *ō used to be reconstructed for these words during the mid 20th century, but this can be recognized as no longer necessary and would, at any rate, run into several difficulties in explaining the reflexes in the other Uralic languages.)
  • Mari and Hungarian also show the merger *ë > *a, but only before 2nd syllable *a. Possibly this can be analyzed as an assimilation development.
    Before 2nd syllable *ə, both languages have a distinctive reflex: Mari *ü, Hungarian *ï (> modern H í). Although these are close vowels, they in fact point to an original mid value: the original PU close vowels *i *ü *u are reflected in both languages as reduced *ɪ *ʏ *ʊ (> modern H short mid e ö o). In both languages, the unreduced close vowels normally derive from mid or open PU vowels under various conditions. [3]

    • *ä > *i in Mari in e.g. *äjmä > *imə “needle”, *lämpə > *liwä- “to warm up”
    • *e > *i (> í) in Hungarian in e.g. *wetə > víz “water”
    • *o > *u in Mari in e.g. *kota > *kuðə “house”, *oksa > *ukš “branch”
    • *o > *u (> ú) in Hungarian in e.g. *molə- > múlik “to pass by”
  • In Permic, *ë is normally reflected as *u. While a close vowel, this is also the default reflex of PU *a and *o, again suggesting a relatively open original value. Additionally, PU *u is reflected as *ï — so even if PU *ï were reconstructed, an intervening development *ï > *ë would still have to be assumed here to route this vowel out of the way of *u. [4]
    Under certain conditions, there is also a development *ë > *ë (e.g. *sënə > *sën “sinew”), which looks like a retention. All these words have *ə in the 2nd syllable; so perhaps the initial step of this split, too, was a lowering *ë > *a / _(C)Ca (and also in some other environments), later followed by *a >> *u.
  • Mansi reflects *ë as a long vowel *ëë. Even if we were to accept the currently commonly accepted reconstruction as a long close *ïï (see below), this would regardless point to an original non-close value: the other PMs long vowels uniformly derive from PU open and mid vowels. The PU close vowels *i, *ü, *u are meanwhile uniformly reflected as PMs short vowels, even if they are also generally lowered. So again, even if PU *ï were reconstructed, we would have to posit a fairly early lowering to *ë, for this vowel to participate in the general lengthening of non-close vowels that seems to have occurred in Mansi.
  • The Samoyedic vowel split *ë > *ë, *ï cannot be a priori resolved in favor of either starting point: the stated conditions can be easily reversed. Janhunen ’81 suggests that *ë occurs in closed, *ï in open syllables, either of which would make a plausible environment for a vowel shift.
    However, there is circumstantial evidence against *ï as a starting point. The PU close *i and *u are split in Samoyedic as well: either retained as *i, *u, or reduced to *ə. Yet, they are not lowered to the corresponding mid vowels *e, *o. If PU *ï were reconstructed, the expected Samoyedic split would therefore be *ï ~ *ə, not *ï ~ *ë. [5]
    There are moreover some cases of *ï or *ë of irregular/unclear origin. These include no examples of close to mid development (*u > **ë or the like), but at least one mid to close development: *joŋsə > *jïntə “bow” — perhaps a case of glide-induced coloring. One parsimonious explanation would be to assume here first *o > *ë, then *ë > *ï along the other cases. This depends on how we model the *ë/*ï split exactly, though, and since there are also examples of *u > *ï (at least *kuŋə > *kïj “moon”), it’s also entirely possible that the history here has been *o > *u > *ï instead.
  • The Khanty situation is complicated and does not seem to allow clear conclusions. The main reflexes seem to be *ïï and *aa (in a largely similar distribution as in Samoyedic). The other PKh close tense vowels [6] *ii *üü *uu generally go back to PU open or mid vowels, so the first reflex could be seen as a point in favor of original mid *ë. PKh open tense *ää *aa in other positions likewise also mostly go back to PU open or mid vowels.
    On the other hand: the PU close *i yields PKh mid tense *ee by default, and PU close *u can yield PKh mid tense *oo and *ɔɔ under certain conditions. PKh also conspicuously lacks a mid back unrounded *ëë. If PU *ë/*ï >> *aa went thru an intermediate *ëë stage (the lowering *ëë > *aa has direct parallels in the Mansi dialects in contact with Khanty), then this reflex could suggest an original close value.
    — Some newer reconstructions of Proto-Khanty posit *ä and *a in place of *ee and *oo, though. It would be possible to suggest that the last was actually labial *å and to argue that *aa < *ëë < *ë < *a, to maintain my previous idea in place. But alternately, deriving /oo/ from *a, as found in the generally fairly conservative Far Eastern and Far Northern Khanty, would seem to make better sense, if the development were *a > *aa > *oo; and if so, original “*aa”, unaffected by these changes, would have to have been *ëë at this time. And we’d then be back to a similar argument as seen with Mansi: PU close > PKh lax, vs. PU open/mid > PKh tense.

The evidence thus seems quite clear: reconstructing mid *ë is preferable to reconstructing close *ï. Some of the Khanty evidence may point to *ï, but given the complicated history of Khanty vowels, this should not count as decisive.
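The Permic point in the survey above can be checked mechanically. The following is only a toy model: it tracks two PU vowels through unconditional changes, using the Permic outcomes stated above (*ë > *u, *u > *ï); the function names and the reduction to two vowels are my own simplifications.

```python
def apply_shift(state, shift):
    """Apply one unconditional change to a {PU vowel: current value} map."""
    return {pu: shift.get(cur, cur) for pu, cur in state.items()}


def run(state, shifts):
    """Apply a sequence of changes in order."""
    for shift in shifts:
        state = apply_shift(state, shift)
    return state


def merged(state):
    """True if two PU vowels have ended up with the same value."""
    return len(set(state.values())) < len(state)


# Scenario A: PU *ë. First *u > *ï, then *ë > *u.
a = run({"ë": "ë", "u": "u"}, [{"u": "ï"}, {"ë": "u"}])

# Scenario B: PU *ï with no intermediate lowering, in both possible orders.
b1 = run({"ï": "ï", "u": "u"}, [{"u": "ï"}, {"ï": "u"}])
b2 = run({"ï": "ï", "u": "u"}, [{"ï": "u"}, {"u": "ï"}])

# Scenario C: PU *ï, but lowered to *ë before the Permic changes.
c = run({"ï": "ï", "u": "u"}, [{"ï": "ë"}, {"u": "ï"}, {"ë": "u"}])

print(a, b1, b2, c)
```

Only scenarios A and C keep the two vowels apart, so a PU *ï would need the extra lowering step (or a simultaneous swap of the kind considered in footnote 4).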

Typologically, the reconstruction of mid *ë without a close counterpart *ï is also unproblematic. A similar situation can be observed in e.g. Votic and Estonian, with only ‹õ› /ɤ/ [7]; many dialects of modern English, with an open-mid /ʌ/ for ‹u› in words like strut, and even a rhotic counterpart /ɚ/ in words like nurse; or Bulgarian, with ‹ъ› /ɤ/, descending from Proto-Slavic *ъ /ʊ/.

True, there are also a great many languages with a superficially unpaired non-open back unrounded vowel. Yet such languages tend to have simple vowel inventories along the lines of /i ɨ u e a o/, where /a/ can be analyzed as the open counterpart of /ɨ/! The same applies to the pan-Turkic vowel system /i ü ı u e ö a o/ (/ı/ might vary from [ɨ] to [ɯ] I think; any Turkologists passing by are welcome to set me straight). OTTOMH the only language that would have both a three-degree height contrast, and an ï-type vowel without an ë-type one, is precisely Eastern Khanty.

Proto-Mansi long mid vowels

The PMs vowel system is normally reconstructed as contrasting two degrees of both height and length. The long vowels comprise five units: the open vowels *ää, *aa, and the non-open vowels *ee, *ëë, *oo.

What I write here as *ee and *oo have been traditionally reconstructed as close *ii and *uu. *ëë has moreover been reconstructed as *ïï since Honti ’82, including many default reference works such as Sammallahti ’88.

While I agree with the idea that these three vowels should be treated as a single set, I believe Honti got this adjustment the wrong way around. This is because the majority treatment seems to be mid values:

  • *ee: Reflected as mid /ee/ in most varieties of Mansi. Close /ii/ is found in most positions in Southern Mansi, and in a couple of words also Western and Eastern Mansi.
  • *ëë: Reflected as mid /ëë/ or open /aa/ in all varieties of Mansi.
  • *oo: Reflected as mid /oo/ in Southern Mansi, but as close /uu/ in the Core Mansi varieties (West+East+North). I assume the latter value is due to a chainshift: PMs *aa shifts to /oo/ in these same varieties.

Etymology also supports mid values for these vowels. *ee is a reflex of PU *e under unclear conditions; *ëë is the main reflex of PU *ë (which I hope to have just now established as indeed a mid vowel); and *oo is a reflex of PU *a and, probably under some conditions, PU *o. It strikes me as terribly inefficient to assume that these vowels first became close, then proceeded to again become mid vowels widely across the Mansi varieties.

Then there are the known general principles of length/height interaction in vowel shifts:

  • Long vowels tend to be raised
  • Short vowels tend to be lowered
  • Open vowels tend to be lengthened
  • Close vowels tend to be shortened

…which come into action particularly well in vowel shifts involving general restructuring of the vowel system. [8] I can think of tons of examples (e.g. pretty much everything relevant that happens during West Uralic > Proto-Samic), while counterexamples are much rarer. [9] With these in mind, it is already a priori preferable to reconstruct any unconditional /ii/ ~ /ee/ or /uu/ ~ /oo/ correspondences from original *ee or *oo, not *ii or *uu.
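For what it's worth, the four tendencies are simple enough to state as a checklist. A rough sketch follows; the three-way height scale and the function name assess are my own simplifications, and the output says nothing about how common a given shift actually is.

```python
# Three-way height scale; an assumption made for the sake of the sketch.
HEIGHT = {"open": 1, "mid": 2, "close": 3}


def assess(long_before, height_before, long_after, height_after):
    """Label a vowel shift against the four tendencies listed above."""
    labels = []
    rise = HEIGHT[height_after] - HEIGHT[height_before]
    if rise:
        # Long vowels are expected to rise, short vowels to lower.
        expected = rise > 0 if long_before else rise < 0
        labels.append("height: expected" if expected else "height: counterexample")
    if long_before != long_after:
        # Open vowels are expected to lengthen, close vowels to shorten.
        if height_before == "open":
            labels.append("length: expected" if long_after else "length: counterexample")
        elif height_before == "close":
            labels.append("length: expected" if not long_after else "length: counterexample")
    return labels


# Core Mansi *oo > /uu/: long mid vowel raised.
print(assess(True, "mid", True, "close"))
# Northwest Germanic *ē > *ā: long mid vowel lowered.
print(assess(True, "mid", True, "open"))
```

The two Proto-Slavic changes from footnote 9 come out as counterexamples in the same way, one on the height side and one on the length side.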

Summing up, everything seems to check out: *ee, *ëë, *oo is a superior reconstruction equally well from the viewpoint of the attested Mansi varieties; the viewpoint of Proto-Uralic; and the viewpoint of typology of sound change.

—Note however that I am only arguing about phonetical reconstruction here. Phonologically speaking, I have nothing against an analysis according to which these vowels would have been distinguished from *aa and *ää by being simply [+close]. Yet, seeing how the Latin letters ‹i u› very much suggest non-mid values, we’d be better off using the available mid vowel base symbols ‹e o› instead. In my opinion broad transcription generally ought to be user-friendly rather than maximally adherent to any particular theory.

[1] Please cf. the newly published Bibliography page!
[2] E.g. Proto-Germanic *wētjō- > Proto-Scandinavian *wātjō- → pre-Proto-Samic *waććo > Proto-Samic *vōććō > Northern Sami vuohčču “bog”.
[3] Hungarian also has some long close vowels representing older *VwV sequences: e.g. *ńomala > *ńowɜl/*ńuwɜl (?) > nyúl “hare”, *täktɜmɜ > *tätɜw > tetű “louse”.
[4] Technically, a labiality detour could also be arranged: *ɨ *u > *ɯ *ʉ > *u *ɨ? But this seems contrived — not the least for requiring an intermediate stage during which there are two non-front close vowels around, neither of which is [u].
[5] There is some uncertainty in this argument though, since no lowered reflex of the 3rd PU close vowel *ü is found, neither as *ə nor *ö. For that matter, the other PU mid vowels *o and *e don’t quite match the behavior of *ë either: *o splits “downward”, to yield *å~*o; while *e stays around as is (becoming /i/ later on in most Samoyedic languages, but per the evidence of Nganasan, not yet in Proto-Samoyedic).
[6] I will remind that although I use “single”/“double” transcription for Proto-Khanty vowels, just as also for e.g. Mansi and Finnic, this does not indicate a length distinction, but instead one of tenseness: the more numerous “double” vowels are the unmarked ones.
[7] A corresponding close y /ɯ/ has developed in South Estonian, but this is a later innovation. Livonian has similarly later expanded its set of vowels by ȯ, described as /ʊ/ or /ɯ/ in different sources.
[8] Conditional splits in the vowel system: umlauts, coloring effects, length changes due to prosodic factors… are a different issue.
[9] Though not nonexistent: two cases that come to mind are Northwest Germanic, where *ē > *ā but *e ≡ *e, and late Proto-Slavic, where *a > *o but *ā > *a.

Depalatalization: common East Uralic after all?

Recently I’ve gotten one project underway to a usable shape: the assembly of a database of Proto-Samoyedic vocabulary. So far this includes the PSmy roots listed in the main source on the topic: Janhunen’s Samojedischer Wortschatz (1977, Castreanumin toimitteita 17), their distribution (though not reflexes) in the individual Samoyedic languages, as well as addenda from works by other researchers, currently mainly Helimski and Aikio. (Literature recommendations are welcome.)

One thing that’s drawn my attention so far has been PSmy *ns. This is an interesting cluster, as it at first glance seems to refute one of the points in favor of the East Uralic hypothesis: the depalatalization of Proto-Uralic *ś. Though the development *ś > *s is default in all four eastern branches, it seems that in Ugric, an affrication development *nś > *ńć had interfered before this; e.g. Hungarian húgy, Mansi *kuńć-, but Samoyedic *kunsə “(to) pee”.

However… it seems that PSmy “*ns” may also have been a similar palatal cluster, rather than a simple dental/alveolar one. This yields palatal and/or affricate reflexes in three of the six Samoyedic languages:

  • *nc in Nenets (e.g. PSmy *tånsə > Tundra Nenets tānc “lizard”)
  • *š in Selkup (e.g. *tånsə > *töšə “id.”)
  • ndž in Mator (e.g. *tånsə > tandžə “id.”)

The first change of these could certainly be plausibly a later development. However, the palato-alveolar reflexes in the latter two are quite unexpected, if we start from regular *s. And in light of these, perhaps the (Tundra) Nenets value is also best analyzed as an archaism, deriving from PSmy *ńć? Which would then allow dating the affrication here, as well as the following *ś > *s, already to the East Uralic level!


As for the plain /ns/ in Enets and Nganasan, this could turn out to be an areal development. Yakut (& its dialect/sister Dolgan), the Turkic eastern neighbor of the northernmost Samoyedic languages, is known to have undergone a development *č > *s. The Yakut reflex of Proto-Turkic *nč is however a cluster transcribed which does not sound likely to have gone thru an *s stage. Still, perhaps the Taimyr Samoyeds picked up only the change *č > *s per se and applied it in the context where they were able to?

This discovery also raises the question if Proto-Samoyedic *s might have had an allophone *ć in other positions as well. The Nenets affricate allophone *[c], at least, turns up predictably after other consonants as well. So how do the other languages fare here?

  • *ls: Selkup *ls (1 example). No reflexes in other Samoyedic languages.
  • *rs: Selkup *rs (1 example). No reflexes in other Samoyedic languages.
  • *ps: Nganasan /ps/, Enets /č/, Kamass /ps/, Mator ps. Selkup has *ćaapsə “skewer”, *ćops “cradle” vs. *qapšə “shaman’s spoon”.
  • *t³s: Nganasan /s/, Enets /t/, Kamass /š/, Mator. Selkup has *sëësan “storage shed” vs. *täšə- “to be cold” vs. *tïsat ~ *tïšat “tongs”.

Preliminarily, this does not seem like particularly strong evidence for original affricates here, although I am tempted to dismiss the poorly attested cases of *ls and *rs as possibly areal rather than inherited roots. (Liquid+sibilant clusters were not permitted in Proto-Uralic at all.) The occasional Selkup forms with unexpected *š still suggest though that there might be something going on here after all. Likewise Enets /ličo/ “cradle”, which is of Uralic origin (cf. Mordvinic *lafś “id.”) — and probably related to Finnic *lapci “child”, which indicates specifically *ć rather than *ś as well. Perhaps more digging for unexpected cases of *š in Selkup would be fruitful.

Links at last

I’d been long putting off updating from the WordPress default links on here for some reason. But no longer! Gaze at the blog’s sidebar and its gradually growing collection of more or less relevant information from around the web.

My humble About page, which I thought I had published months ago (what do you mean there’s no default link to it?), is also now accessible.

— Edit @ 0:48: there is now also a Notation page in case anyone is getting confused about all the abbreviations etc. thrown around here.

*ŋ in Ugric and Uralic: A Proto-Phoneme in Need of Cleanup

I’ve previously posted about the Proto-Uralic “dental spirants”, and on the problems concerning their reconstruction. These are however far from the only PU segments whose reconstruction involves unsolved difficulties.

The velar nasal *ŋ provides examples of some different types of problems. Thanks to direct retention in a good number of Uralic languages (most consistently in Samic, Mari, Khanty, and most parts of Samoyedic, but also in dialects of Erzya and Udmurt), there can be little doubt that a phonemic velar nasal occurred in Proto-Uralic. However, the historical development of the consonant is speckled with irregularities, and with uncertainty on where it is and is not to be reconstructed.

A major part of the puzzle is the Ugric treatment. No single regular reflex can be identified in these languages. Instead the consonant appears to bifurcate, with no clear conditioning. One portion of words excretes a following homorganic plosive: *ŋ > *ŋk, further > Hungarian g. Clear examples include e.g. the word for “mouse”: PU *šiŋərə > H egér, Mansi *täŋkər, Khanty *ɬööŋkər; contrast Finnic *hiiri, Mordvinic *šeŋəŕ > šejər ~ ševər, Permic *šɨr, etc.
Another portion however fails to participate in this development. In Khanty, a plain nasal *ŋ remains in such words, while in Mansi and Hungarian, the initial reflex seems to have been a spirant *ɣ (which may later have vocalized). For these words, we might posit a distinct Proto-Ugric *ŋ. This split, although never explained, has been one of the traditional arguments for establishing a distinct Ugric group in the first place.

One front of attack that I think could help make some sense of the Ugric mess is to extend the scope of analysis to the other Uralic languages. I’ve conducted an initial survey on the matter, and one result appears to be that none of the > *ŋk words have secure Samoyedic reflexes. Which may sound uninteresting, but does have interesting consequences: as is the case for most of the other supposedly Ugric phonetic innovations (*s > *ɬ, *ś > *s, *k > *ɣ, etc.), there is here, too, no explicit reason to exclude Samoyedic! That is, *ŋk in these words might as well also be an older, East Uralic feature? At least one unclear comparison, PU *jäŋə “ice” (> Ms *jääŋk, Kh *jööŋk id.) ~ PSmy *jåŋkå “hole in ice” possibly even shows the exact same change (unless this is to be analyzed as a suffixed derivative: *jåŋ-kå < *jAŋə-kA). In all cases where PU *ŋ is reflected as PSmy *ŋ (e.g. *suŋə > *təŋə “summer”), the Ugric languages, too, point to “unbroken” *ŋ (here: Ms *tuj, Kh *ɬoŋ id.).

The Permic evidence offers some interesting views as well. More on this particular front of investigation later as the research develops, though — I’ve several new ideas, but the details remain in flux by this point. Ask me again in a couple of years if I have not found the time to return to this subject before then. :)


A look closer inward, that is investigating the vocabulary particular to Ugric, reveals further complications as well. The relatively clear split between *ŋ and *ŋk that is present in the oldest Uralic material dissipates here into chaos. [1] In many examples Hungarian points to *ŋ while Ob-Ugric to *ŋk, or vice versa; sometimes Mansi and Khanty disagree with each other, too. Multiple explanations would be possible in this situation… My own working hypothesis is that “Ugric” is merely a western areal subset of the East Uralic branch, and that much of the vocabulary shared between the three branches is to be analyzed as later diffusions, not as common inheritance. Perhaps the change *ŋ > *ŋk is to be dated as an areal innovation as well — but this remains to be seen.

There is also at least one correspondence where positing original *ŋ seems simply mistaken. This is a decent-sized set of words where Hungarian g corresponds to Ob-Ugric *ɣ. To run a small case study, examples include at least:

  • H ág ~ Ms *taɣ ~ Kh *ɬaɣïï “branch”
  • H fog- ~ Ms *puw- “to grasp”
  • H nyereg ~ Ms *naɣr ~ East Kh *nööɣər “saddle”
  • H szaguld- “to rush” ~ Ms *šoom- ~ Kh *saaɣəL- “to gallop”, Kh *suuɣəm “jump”
  • H tegez ~ Ms *täwt ~ Kh *tüüɣət “quiver”

First off, note that none of these reflexes shows any direct evidence for an original nasal. Usually at least Khanty retains *ŋ as is, and yet here we get *ɣ instead. There also seem to be no examples of the “next-most regular” correspondence: H g ~ Ms *ɣ ~ Kh *ŋ.

I suspect that these words are to be explained, not from a Proto-Ugric *ŋ that was expanded to *ŋk only in Hungarian, but as relatively recent parallel loans, and that the loan originals featured the voiced stop *g. In Hungarian the option to carry this over directly was available; while in Ob-Ugric, where phonemic voiced stops seem to have remained alien ever since Proto-Uralic, the closest substitute was *ɣ.

Numerous other phonetic irregularities also appear here, some of which could also have resulted from parallel loaning: most prominently Mansi *š in “to gallop”; Hungarian ny and Core Mansi *a (but Southern Mansi näwrää!) in “saddle”; Khanty *üü in “quiver”.

Note moreover how the meanings “gallop”, “saddle” and “quiver” are all specialized cultural terminology, and hence these are likely to be loanwords related to the Ugric peoples’ adoption of a steppe nomad lifestyle.

In some cases, loanwords of this layer might have reached Permic, too. As a particularly clear case, Hungarian reg “morning” has been considered related to Permic *rög “warm”; yet Ob-Ugric #reɣ “warm” seems equally relevant. Equating all three is however not possible if we insist on common inheritance: on one hand, the change *ŋ > *ŋk clearly never extended into Permic, and on the other hand, there is no evidence for a reduction *ŋk > *ɣ in Ob-Ugric. Loaning from Hungarian to Permic could be posited, but if so, why not loaning from “Language(s) G” to Permic directly? The latter scenario seems slightly preferable in light of another example: Permic *mög “riverbend” ~ Ob-Ugric #mVɣəɬ “around, circle”, for which no Hungarian cognates are known. [2]

Even further evidence for a non-nasal origin for this correspondence can be teased, I believe, out of its occurrence in an apparent consonant cluster:

  • H buzog (buzg-) “to seethe” ~ Ms *pëësɣ- ~ Kh *paasəɣ- “to drip”

If we were to be consistent with the earlier view, and to reconstruct Proto-Ugric *pësŋ-, this would be the sole example of an obstruent + nasal cluster in the inherited Ugric lexicon! [3] Additionally, the g/ɣ issue is not the only suspicious feature: no, literally every sound correspondence between Hungarian and Ob-Ugric (z ~ *s, u ~ *ëë, b ~ *p) is irregular here. The first of these seems especially telling: if an original *g is assumed, it would be plausible to also assume that the first member of the consonant cluster was not *s, but the likewise voiced *z. This then can be assumed to have been substituted by voiceless *s in Ob-Ugric, but by *z in Hungarian… which would bring us one step closer to eliminating the sometimes supposed “sporadic” voicing of medial *s in Hungarian. [4]
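The substitution pattern I am proposing can be sketched as a nearest-sound lookup. Everything concrete below is hypothetical: the two toy inventories, the fallback table and the donor form "azga" are invented for illustration, and the sketch simply assumes that the Hungarian side could take over *g and *z directly (but see footnote 4 on *z).

```python
# Hypothetical fallback table: the nearest native sound for a foreign one.
NEAREST = {"g": "ɣ", "z": "s"}

# Toy inventories, not actual reconstructions: one with voiced g and z,
# one without them but with the spirant ɣ.
HUNGARIAN_LIKE = set("aeiouptkbdgszmnlr")
OB_UGRIC_LIKE = set("aeiouptksɣmnlr")


def adapt(donor_form, inventory):
    """Replace each segment missing from the borrowing inventory by its fallback."""
    return "".join(
        seg if seg in inventory else NEAREST.get(seg, seg)
        for seg in donor_form
    )


donor = "azga"  # invented donor shape with a voiced cluster
print(adapt(donor, HUNGARIAN_LIKE))
print(adapt(donor, OB_UGRIC_LIKE))
```

The same donor shape thus comes out with zg on one side and sɣ on the other, which is the shape of the buzog correspondence, without any need for a nasal.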

Theories involving unattested substrate languages offer, of course, an easy way to explain whatever one wishes. Does my new explanation actually offer any advantage over the traditional view of leaving the origin of “areal” words open? Certainly, much of this hangs in the air so far, but it should be possible to seek further evidence. Perhaps known Turkic / Mongolic / Iranian loanwords can confirm or deny my supposed substitution pattern *g > Hungarian *g ~ Ob-Ugric *ɣ. Closely combing thru the vocabularies of these families (and, perhaps, Tocharian?) might even be able to turn up cognates for some of these words… though don’t hold your breath.

[1] I know there’s a handy paper covering this topic out there… but alas, I am failing to relocate it right now. Please drop me a hint in the comments if you have the reference at hand?
[2] Though there’s moreover Samic *moaŋkē “bent”, which seems difficult to integrate into any southeasterly loanword scenario.
[3] There are examples in the common Ob-Ugric lexicon — e.g. #asma “pillow” — but these seem to be generally derivatives.
[4] This explanation does raise one question. Hungarian /z/ generally originates from Proto-Hungarian *ð, and a substitution *z > *ð would not seem particularly expected. Should we assume that [z] actually first originated as an allophone of /s/ before voiced stops in loanwords? There seem to be quite a few words in Hungarian with the cluster -zəg : -zg-, and various similar ones such as -zd-, though I do not know if any of these are a part of the language’s oldest loanword layer.