The fate of *w in Altaic

A fairly striking typological commonality among the “micro-Altaic” language groups (Turkic, Mongolic and Tungusic; Tk, Mg, Tg) is the lack of a labial glide such as /w/.

This is clearly out of line both among the world’s languages in general and within Eurasia in particular. /w/ is one of the most common phonemes in the world’s languages, one that can usually be found even in languages with seriously impoverished consonant inventories such as Hawai’ian (at ZBB we [1] once compiled stats on this thing), and there is no shortage of *w in any of the other major language families hanging out nearby: IE, Uralic, Semitic, Dravidian, Sino-Tibetan, Austronesian, Eskaleut, you name it. Even in languages that lack /w/ precisely (Finnic, Slavic, most non-English Germanic…), it has usually not gotten too far off-field and has merely become a more frontal labial continuant such as /β/, /v/, /ʋ/. Yet none of these can be found in Turkic / Mongolic / Tungusic either. This clearly means that any long-range relationship hypotheses like Nostratic, Eurasiatic, Ural-Altaic will need to explain whatever happened to *w in Altaic.

There are two main hypotheses going around that I know of: *w > ∅ versus *w > *b. The former is the stance of some old-school Ural-Altaicists like Räsänen, among Nostraticists apparently Bomhard [2] and I gather also Illič-Svityč. The latter is the stance of, at minimum, Dolgopolsky. (He proposes also *w > ∅ before labial vowels in Turkic. [3])

I think the actual answer is neither of these, and the demise of *w is only post-common Altaic (if such a thing existed at all) — since comparison with Uralic seems to be able to show a fair number of good examples of both developments, yet strongly split according to their distribution. It does not really matter for this purpose if the comparanda are real cognates or loans … but see below for a hypothesis.

In the following, I have stuck to the clearest data, where comparison with Uralic seems, usually on semantic grounds, preferable to or at least equally good as the proposed Altaic connections. Checking up on the non-EDAL lexicon of the languages would probably also turn up something, but I will leave that for later.

1. Turkic: *w > *b

(1.1) *bāj ‘rich, noble’ ~ Samic-Finnic *wäjä- ‘to be able, have power’, Hung. vív ‘to fight’
(not worse than ~ Mg ‘strong’, Tg ‘many’, Jp ‘to surpass’)

(1.2) *bakɨr ‘copper’ ~ PU #wäśkä ‘(reddish) metal, ? copper’ > Khanty *wăɣ ‘iron’
(rather than ~ Mg ‘patina’, Jp ‘dust’)

(1.3) *balk- ‘to shine’ ~ PU *wëlkəta ‘light, white’
(rather than ~ Mg *mel-, Tg *mial- with no **-k-; Ko *mark- may or may not belong; maybe here also Tg *beli ‘pale’, rather than ~ Mg ‘dark’)

(1.4) *bań ‘fat’ ~ PU *wajə ‘id.’
(rather than ~ Mg ‘churn’, Tg ‘storage’)

(1.5) *bek ‘firm, stable’ ~ Samic-Finnic *waka ‘id.’
(rather than ~ Mg Tg ‘big’)

(1.6) *bejŋi ‘brain’ ~ PU *wajŋə ‘breath, spirit’ > Selkup *kȫŋə ‘brain’
(rather than ~ Mg ‘forehead’)

(1.7) *bij- ‘sharp edge’ ~ Samic-Finnic *wijə- ‘to be sharp’
(rather than ~ Mg ‘to crush’, Tg ‘to mince’)

(1.8) *(b)ōl-, Mg *bol- ‘to become’, Japonic *wər- ‘to be’ ~ Uralic *(w)alə- ‘to be’ > Ob-Ugric ‘to be, to become’

(1.9) *burun ‘nose’ ~ PU *wara ‘mountain’ > Hung. orr ‘nose, †peak’
(rather than ~ Jp Ko ‘beak’)

(1.10) *būt ‘leg’ ~ Samoyedic *utå ‘hand’
(Tg *begdi may or may not belong)

(1.11) *dabul ‘wind’ ~ PU #tɜwlə ‘id.’
(rather than ~ Mg ‘typhus’, Tg ‘to be infected’)

(1.12) *debe ‘camel’ ~ Samoyedic *tëə < ? *tëwə ‘(tame) reindeer’
(rather than ~ Mg *temeɣen; long compared also with isolated Karelian tevana ‘elk cow’ (often mis-cited as Finnish))

(1.13) *sib- ‘to spin thread, pull out fibre’, Tg *sib- ‘id.’ ~ PU *siwə ‘fibre’
(rather than ~ Mg ‘to tuck up’)

I suspect most of these to be loans into Turkic from early Ugric, and in the case of *bōl-, thence into Mongolic. At least #wūta is probably better taken as a loan in the opposite direction, since this is innovative vocabulary replacing PU *kätə (and Samoyedic does not tolerate **wu-). Perhaps likewise for #dewe.

For a few IE parallels, I can moreover mention e.g. Tk *basu ‘hammer’ ~ II *wadźra- ‘hammer, mace’; Tk *ebin ‘grain’ ~ IE *yewo- ‘id.’; Tk *gēb- ‘to chew’, Mg *gebi- ‘id.’, Tg *keb- ‘to bite’ ~ IE *ǵyew- ‘to chew’. Comparison with Japonic would also immediately provide examples for *w > *b. There has been some debate on if *b or *w should be reconstructed for Proto-Japonic, but as far as I gather, *b has been assumed for ease of Altaic comparison, while most of the actual data clearly sides with *w. [4]

*w > *b also has good areal parallels, being found in both the north(west)ern and south(west)ern neighbors of Turkic: on one hand widely in Samoyedic, viz. in Enets, Nganasan, Kamassian and Mator (partly even in Yurats and eastern dialects of Tundra Nenets), on the other, in East Sakan (Khotanese and Tumshuqese).

There is also one notable exception where Turkic seems to have *w > ∅ instead: *öl- ‘to die’ ~ PU *widə- > Hung. *ül- > öl- ‘to kill’. This isolated example could be, however, merely an accidental similarity, esp. since the semantics are off. (‘Die’ and ‘kill’ are close enough concepts, but usually do not interchange without causative / anticausative morphology.) Contrast also ‘nose’, where we seem to have *wu > *bu in Turkic but *wu > *u > o in Hungarian.

All in all, the details may use further fine-tuning, but I think there is good evidence to assume that earlier *w develops into *b in Turkic. Contrary to what I earlier commented on this topic though, it is also easy enough to find equally good-looking cases of Turkic *b ~ Uralic *p (e.g. *bas- ‘to press’ ~ *puńćə- ‘to press, squeeze’, *beliŋ ‘panic’ ~ *pelə- ‘to fear’, *bɨč- ‘to cut’ ~ *päčkä- ‘id.’, *bulun ‘cloud’ ~ *pilw/ŋə ‘id.’), so probably this was still a merger with a pre-existing *b.

2. Mongolic: *w > ∅

Supported by less data, but even fairly tight reins on semantics still allow finding some evidence.

(2.1) *oŋgi ‘hole’, Tg *uŋgV ‘id.’ ~ PU *woŋkə ‘id.’
(rather than ~ Tk ‘to dig’)

(2.2) *ök/g- ‘to give’ ~ PU *wexə- ‘to take somewhere’ > Samoyedic *ü- ‘to drag’
(rather than ~ Tk, Tg ‘to heap up’; maybe here better Tg *bū- ‘to give’?)

(2.3) *udu- ‘to lead’ ~ PU *we/ätä- ‘to pull, lead’, PIE *wedʰ- ‘id.’
(rather than ~ Tk ‘to send’)

(2.4) *usu ‘water’ ~ PU *wetə ‘id.’
(rather than ~ Tk *sɨb)

(2.5) *üdže- ‘to see’, Tg *edže- ‘to understand’ ~ PU *weńćä- ‘to look, watch’ [5]
(rather than ~ Tk ‘to think, understand’; ‘understand’ is surely secondary in both etymological groups, and ‘think’ ~ ‘see’ does not match)

(2.6) *ündü-sü ‘root’ ~ PU *wanča ‘id.’
(possibly suggests that PU *č < *ts or *tU; Tg *ŋǖŋte may or may not belong)

A possible IE parallel that looks like it could have been transmitted thru Uralic: *ös ‘revenge, hate’ ~ II *dwiša- ‘hate’ (→ Permic, Finnic #wiša) (not worse than ~ Tk, Tg ‘bad, evil’). This is not attested in Ugric or Samoyedic, though, unlike all of the above examples.

The different treatment here is, however, possibly due simply to geography / relative chronology and not to an actually different native development. Mongolic is a more eastern family, and may have gotten rid of *w already before contact with Uralic or some flavor of Para-Uralic, perhaps still indeed by > *b as per comparison with Turkic. So the correspondence here might indicate that in later loans, *w was substituted as zero.

I have not managed to find any reasonable-looking cases of Mongolic *b ~ Uralic *w (other than ‘to become’, see under Turkic).

The loanword layer interpretation can also be supported by how, for Tungusic, I cannot on a quick look-around find any clear etymologies of either type at all (i.e. where comparison with Uralic would be clearly preferable to supposed Altaic origin). You can find some Tg cognates above under both my Turkic and Mongolic comparisons, but they might be loans. I could still add a few word-internal cases suggesting *w > *b, though: *dolba ‘night’ ~ Samoyedic *tålwə ‘dark(ness)’ (no worse than ~ Mg ‘to stay up overnight’); *nebi ‘new’ ~ PIE *new-.

[1] “We” being at least 90% the OP “Nortaneous” (lingblr yeli-renrong); myself probably not more than 5%, and a handful of remaining people suggesting single datapoints.
[2] He does not explicitly say so, and in his book leaves the Altaic column empty in the overview of Nostratic sound correspondences; but the few examples he has of a root with *w- being reflected in Altaic show zero onset.
[3] Well, “with rounding of the adjacent vowel”, but I would not buy any current claims about Proto-Nostratic vowel reconstruction with a nine-feet pole.
[4] As for Korean, the modern language has /w/, but I have the impression it mostly occurs due to vowel breaking or in loanwords from Chinese. I admit knowing very little about Middle or Old Korean though, and hence I am skipping over Korean in this post entirely.
[5] IMO better thus than UEW’s *wića-. Permic *dź ~ Hung. gy clearly proves *ńć, and front-harmonic cognates in these clearly prove *-ä and not *-a. Hung. front-harmonic í is also almost always from *e, not *i. Finnic can be routed as “*weŋ́śä-” > *wejśä- > *viisä-, and for Permic I suspect early *e > *i next to palatals in certain cases.

Yurats Addenda

One step up from the likes of Meshcheran, probably the most obscure Uralic language to have still been rudimentarily documented is Yurats: a Northern Samoyedic language recorded in one wordlist by G. H. Müller in the mid-1700s. As far as I know, we have zero other information about the language, not even any clear idea on when it might have gone extinct. A century later Castrén did not record it, but to my knowledge he did not really search for it either; unlike Mator, which we can be pretty sure was indeed extinct by 1845.

Some parts of the data were reprinted by Pallas in the late 1700s and Klaproth in the early 1800s (a reproduction of the latter can be found in Donner’s Samojedische Wörterverzeichnisse, pp. 36–50). Janhunen’s Samojedischer Wortschatz (1977) only takes these secondary editions into account when listing Yurats cognates. Just the year before in 1976, though, Helimski had put out an article that actually reviews Müller’s original data instead (but presumably back in the 1970s article collections published in Tomsk were not yet in the habit of diffusing to Helsinki within one year). He also includes a transcript of the vocabulary. This article has by now been conveniently reprinted in Helimski’s 2000 compilation book Компаратистика, уралистика (Moscow: Языки русской культуры).

This is somewhat corollary-snipey, but I might as well still put this out there: a comparison of Janhunen’s Yurats coverage with the original data. Several additional etymologies can be easily noted, at least.

  • áddinelma‘ < PSmy *ånčɜ (perhaps a loan from Enets due to lack of ŋ-?)
  • cháru ‘larch’ < PSmy *kårwï (not in SW)
  • ja ‘flour’ < PSmy *jaə (not in SW; loan from Indo-Iranian *yawa- ‘grain’)
  • jur ‘fat’ < PSmy *jür (loan from Turkic *ür₂)
  • kírwa ‘bread’ < PSmy *kïrɜwå (not in SW)
  • módi jarra ‘I cry’ < PSmy *jåru-
  • maraga ‘cloudberry’ < PSmy *məråŋkå (not in SW, but cf. PU *mura)
  • mug ‘arrow’ < PSmy *muŋkə (not in SW, but cf. PU or maybe better a west Siberian Wanderwort #muŋkɜ)
  • nócha ‘arctic fox’ < PSmy *nokå
  • ngóde ‘berry’ < PSmy *wota (with *wo- > *o-, as also in Ne En)
  • pi ‘aspen’ < PSmy *pi
  • pimà ‘boot’ < PSmy *pajmå (loan from Turkic *bal₂mak)
  • poiju ‘alder’ < PSmy *pəjɜ (not in SW; misglossed by either Müller, Helimski or some intermediate editor as ‘almus’ pro ‘alnus’, but it’s in the middle of the tree names section)
  • pämesúma ‘darkness’ < PSmy *pəjmä (not in SW, but cf. PU *pid₂mä)
  • túa ‘wing’ < PSmy *tuəj

There would be more cases that only go back to Proto-Northern Samoyedic or perhaps just Proto-Nenets (e.g. sárnu ‘egg’, wuing ‘sea’ ~ Tundra Nenets sar°ʔńu, wī̮ʔ < *sarəʔnü, *wïəŋ), but I cannot claim to have put together any reasonably good coverage of these.

A small etymological puzzle is múde. Janhunen lists this under two different roots: from Pallas under *mərkä ‘shoulder’ (with the comment “(? < En)”), and from Adelung under *utå ‘hand’. Müller only has the sense ‘arm’, which could be a semantic shift from either, but also suggests there is only one word here, not two homophones. Straightforwardly we’d probably expect ˣmarze, ˣŋuda, so maybe contamination is however possible. — A reflex of ‘hand’ with ŋ- indeed appears in ngudéesse ‘ring’ (‘hand-iron’), but (j)ésse ‘iron’, with *ẃ > j, is clearly a loan from Tundra Nenets, and so maybe the first part of the compound is as well. Actually, nothing rules out even a third interpretation: that in Yurats *ŋ > m / _u regularly?

Another intriguing case is ngä́mme ‘breast’. This seems related to PSmy *ńimmä, but not as a direct descendant: it points to something like *əjmmä instead. *ə- rather than *ńi- could be perhaps by analogy from *əm- ‘to eat’ … but it could be also an archaism, since ‘breast’ is derived from *ńim- ‘to suck’, which in turn has also a variant *imə- in Proto-Uralic (> Fi. imeä, Hu. emik etc.). I believe that if a derivative *imə-mä > *immä had been formed already in PU, then this would regularly develop into *əmmä in PSmy, reaching quite close to the Yurats form. But I still have no good explanation for palatalization to ä.

The comparison also reveals a few words in SW sourced from Pallas that do not seem to appear in Müller. These are mainly anatomical terms: лы ‘bone’, пулы ‘knee’, хоба ‘skin, bark’, хыва ‘blood’. Pallas’ materials have elsewhere also an issue with words from a single source being duplicated under multiple languages, so maybe here as well? On the other hand, at least the last still looks phonologically clearly like Yurats specifically: *k- > x- and *-m- > -w- rule out Enets (which has kiʔ : kio- for ‘blood’), while *ë > ɨ seems to rule out Tundra Nenets (which has xe̮m ‘blood’; xe̮wa- ‘to bleed’).

Altogether, give or take some unclear cases like this, the number of Yurats words with a Proto-Samoyedic etymology seems to be some 140±5. This already suffices to work out the main points of historical phonology. Even already among the above examples you can note a few repeating correspondences: *å > a, *ŋk > g and various trivial identities. The big picture seems to be of a language with a vowel system close to (Proto-)Nenets (*ə > a, some apocope, *a > ä and *ä > e kept apart, almost no vowel clusters), but with a few quirks in the consonant system that instead align with Enets (chiefly *mp *nt *ŋk > b d g). Basically everything seems to be also derivable from Proto-Nenets-Enets without reference to the other Samoyedic languages.
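As a rough illustration of how far these few correspondences already get us, here is a minimal Python sketch that applies only the changes named above (*mp *nt *ŋk > b d g, *å > a, *ə > a) as plain string replacements. The function name and the rule table are hypothetical scaffolding for this illustration only; apocope, the *a > ä and *ä > e shifts and everything else are deliberately left out, and no stress marks are generated.

```python
import unicodedata

# Only the correspondences named in the text; everything else passes through.
# Clusters come first so that *ŋk is treated as one unit.
RULES = [
    ("mp", "b"),
    ("nt", "d"),
    ("ŋk", "g"),
    ("å", "a"),
    ("ə", "a"),
]

def apply_named_rules(proto: str) -> str:
    """Apply the handful of named correspondences to a PSmy reconstruction."""
    # Normalize so that precomposed and decomposed spellings of å compare equal.
    form = unicodedata.normalize("NFC", proto).lstrip("*")
    for old, new in RULES:
        form = form.replace(unicodedata.normalize("NFC", old), new)
    return form

print(apply_named_rules("*məråŋkå"))  # compare attested maraga 'cloudberry'
print(apply_named_rules("*muŋkə"))    # attested mug 'arrow' also shows apocope
```

The second example shows where the sketch stops: the attested form has additionally lost its final vowel, which these rules do not model.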

There are at least a few individual quirks however. One is the development *w > b, which in Yurats only seems to happen before front vowels: bedu ‘intestine’ < *wätə; behánna ‘sturgeon’ < *wäkånå; bi ‘water’ < *wet, bidímat ‘to drink water’ < *wetɜ-; ’10’ < *wü(ə)t. Before back vowels, w remains: uáddu ‘root’ < *wånčå; wark ‘bear’ < *wərkə; wéneku ‘dog’ < *wën. So at least allophonic palatalization of labials for some time existed in Yurats. Having /bʲ/ but no /b/ would be weird though, and I suppose the split may have been one where *ẃ simply drops its velar component to yield *β > /b/.
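The conditioning described here can be stated as a small rule. The following Python sketch predicts only the reflex of initial *w from the first vowel of the reconstruction. The function name is made up for this example, and the front vowel set is an assumption: the examples above only directly support *ä, *e and *ü, so including *i and *ö is a guess.

```python
import unicodedata

# Front vowels that trigger *w > b. Only ä, e, ü are backed by the examples
# in the text; i and ö are included here by assumption.
FRONT_VOWELS = set(unicodedata.normalize("NFC", "äeiüö"))

def initial_w_reflex(proto: str) -> str:
    """Return the expected Yurats reflex ('b' or 'w') of initial *w."""
    form = unicodedata.normalize("NFC", proto).lstrip("*")
    if len(form) < 2 or not form.startswith("w"):
        raise ValueError("expected a reconstruction beginning with *w + vowel")
    return "b" if form[1] in FRONT_VOWELS else "w"

for proto in ["*wätə", "*wäkånå", "*wet", "*wånčå", "*wərkə", "*wën"]:
    print(proto, "->", initial_w_reflex(proto))
```

Note that *ë and *ə pattern with the back vowels here, as in wéneku and wark.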

Another distinctive conditional shift is that normally *a > ä, but *ja instead > ja, as in ja ‘flour’ < *jaə; jákki ‘smoke’ < *jačkə; jálle ‘light, sun’ < *jalä. Since the fronting of *a is a common Nenets-Enets (“Northwestern Samoyedic”?) feature, I would think this is probably a back-development similar to *Ca > *Ćā in Nenets. This is also suggested by two examples of *jü > ju (jur ‘fat’, jur ‘100’) versus retention otherwise ( ’10’, tükǘjalle ‘today’).

As you may have noticed, Müller also marks stress most of the time. This seems to be primarily on the penult (cháru, nócha etc.; behánna etc.) but there are also smaller groups of words with final stress, invariably marked with a grave accent and not an acute one (e.g. pürrè ‘pike’), or with initial stress on a trisyllable (wéneku). Tetrasyllables are rare, compounds aside, but seem to most commonly (6 out of 10 cases) have antepenult stress (e.g. tehánuda ‘wolf’). I have no idea if any of this has comparative significance.
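Counts like these are easy to reproduce mechanically. A minimal Python sketch follows, under the loudly stated assumption that every vowel letter is its own syllable nucleus (so vowel sequences count as two syllables, which may well be wrong for some of Müller's spellings). The function name is hypothetical.

```python
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"
VOWELS = set("aeiouy")  # base letters; diacritics like the dots of ü are skipped

def stress_position(word: str):
    """Return (syllable count, stressed syllable counted from the end, accent),
    or None if the word carries no accent mark."""
    nuclei = 0
    stressed = None
    accent = None
    # NFD splits e.g. "á" into "a" + combining acute, so marks can be detected.
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in VOWELS:
            nuclei += 1
        elif ch in (ACUTE, GRAVE) and nuclei:
            stressed = nuclei
            accent = "acute" if ch == ACUTE else "grave"
    if stressed is None:
        return None
    return nuclei, nuclei - stressed + 1, accent

for w in ["cháru", "behánna", "pürrè", "wéneku", "tehánuda"]:
    print(w, stress_position(w))
```

In the returned tuple, position 1 is final stress, 2 penult and 3 antepenult.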

Lastly, under the cut: the full wordlist itself (in Helimski’s transcription).


An Attestation of Meshcheran

Slowly poking around digitized back issues of Studia Orientalia, I recently ran into Kecskeméti (1968), an article indexing Pallas’ Zoographie (1811). This is a notable early source of animal names from several languages of Russia, collected since the late 1700s. Some of these languages would not be otherwise substantially attested until the 1900s, and for a few it is just about the last source available before extinction. (Pallas’ consistency in transcription and coverage are both poor, but we’ll take what we can get.)

During a closer look, for checking some Samoyedic data, I however had to do a double-take upon reaching the heading Mᴇsᴛsᴄʜᴇʀᴀᴇᴄɪs. This is obviously Meshcheran, one of the extinct more western Uralic languages. (Interestingly also with /-sč-/ and not the evidently Russified /-šč-/?) Except, all sources I have seen so far have claimed that Meshcheran went extinct already somewhere around the 1500s…

OK, Pallas only records four “Mestscheraic” words, and a distinct Meshcheran ethnicity is reported to have lingered long after Russification — in at least one case even into the current century! [1] So fairly likely we are dealing here not with a living language, only with substrate loan vocabulary, a natural enough fate for animal names. Yet this is still interesting due to being an attestation securely flagged as Meshcheran. There are two competing theories on the affinity of this language within Uralic — one sees it as a branch or sister of Mordvinic, the other, Permic. To my knowledge both of these build mainly on evidence such as toponymy found in the traditionally Meshcheran region, which is susceptible to errors from pre-Russification population movements.

The list comprises bird names entirely, all given with obsolete binomials:

  • Büdaenae ‘Tetrao coturnix’ (= hazelhen, Coturnix coturnix)
  • Kagau ‘Accipiter milvus’ (= red kite, Milvus milvus)
  • Kuki ‘Cuculus borealis’ (= a cuckoo sp., probably the common cuckoo Cuculus canorus)
  • Schibirtschik ‘Motacilla albeola’ (= common wagtail, Motacilla alba)

The third is obviously undiagnostic of anything, but the others may be worth something.

I cannot make much of the first on a quick lookaround: it would be a hard match for the common Mordvinic term for the hazelhen (Erzya /povo/, Moksha /pova/ < PU *püŋə) and even poorer for the common Permic term (Udmurt /śala/, Komi /śɤla/); at most it has some very vague similarity to Komi /bajdɤg/ ‘partridge, ptarmigan’. [2] The second is however a good match with Mordvinic /kaval/ ‘kite’ (? < *kaɣal), though the implied vocalization of final *-l meanwhile looks amusingly Permic. The last has vague and probably insufficient similarity with Moksha /šäjgiča/ ‘wagtail’ (← /šäj/ ‘valley’ + /kiča/ ‘gull’) on one hand, Russian шибать ‘to hit’ (< Proto-Slavic ‘to whip’) on the other.

I do not feel like rooting around for possibly related names of similar birds; but per ‘kite’ I would at this point lean cautiously towards a Mordvinic-ish affiliation for Meshcheran.

[0] In a blog post this short you’ll probably manage without me doing hyperlinks for the footnotes.
[1] Thus per V. Patrušev apud Rahkonen (2009) in a single village “in a Mordvin area”.
[2] This is probably a loan from early Hungarian or some common source though — cf. Hu. fajd ‘grouse’, from earlier #paďt- per Mansi *paľta ‘black grouse’. If this really were a common Uralic root, I would expect instead **poľt- in Permic (and the cluster *-ďt- would be also unprecedented). OTOH Komi seems to show *p-d > /b-d/, which may allow dating the word to the common Permic era regardless.

Stop voicing across Uralic: some musings

Finnish often gets used as an example of a language that does not contrast voiced and voiceless consonants. While this is not really correct for Standard Finnish (which at least prescribes all of the voiced stops /b d g/), it’s true for many dialects, especially in pre-modern times. [1] The same also holds for most reconstructions of Proto-Uralic and Proto-Finnic. A few times I’ve seen this even given as a typical feature of the Uralic languages. This much is not the case, though. The presence of voiced stops in the recorded Uralic languages varies, but generally tends towards inclusion.

  • No voiced stops:
    • Most spoken Finnish; Northern Karelian
    • Most of Ob-Ugric
    • Forest Nenets, Northern Selkup
  • Allophonic voiced stops:
    • Estonian (short stops optionally voiced medially)
    • Ingrian (voiced before sonorants)
    • older Mari (voiced after nasals)
    • some varieties of Ob-Ugric, at least Southern Khanty per some descriptions (voiced medially)
  • Phonemic voiced stops:
    • most Samic languages
    • most of Finnic: Standard Finnish, Livonian, Votic, Southern Karelian incl. Livvi, Ludian–Veps
    • all of Mordvinic
    • newer Mari
    • all of Permic
    • Hungarian
    • most of Samoyedic: Tundra Nenets, Enets, Nganasan, Southern Selkup, Kamassian, Mator

(The distribution of voiced sibilants such as /z/ is very similar, though they are additionally lacking in standard Finnish and in northern Samoyedic. They are however less important for the following points, so I will focus on the voiced stops.)

This might still be a higher proportion of languages without voiced stops than within most language families of the Old World, though. Within Indo-European I can only think of Tocharian; Icelandic and varieties of High German; Scottish Gaelic; and, per some views, much of Anatolian. Maybe one of the Eastern Iranian languages that are heavy on spirantization? Even outside IE, the only other national language examples I know of are Chinese (not even in its entirety; at least Wu and Min still preserve the Middle Chinese voiced stop series) and Mongolian. Continental Southeast Asia has plenty of languages that are short on voiced pulmonic stops proper, but these often “compensate” by having instead implosives or prenasalized voiced stops; e.g. Vietnamese with /ɓ ɗ/, Hmong with a full series from /ᵐb/ to /ᴺɢ/.

Reconstructions could be added to the picture as further data points of their own, e.g. Proto-Samic and Proto-Samoyedic are both reconstructed without any voiced stops. However, when we move from synchrony into history, it is probably more important to consider the origin of voiced stops. This shows variation as well, but some particular pathways crop up repeatedly:

  • *-P- > -B- (general voicing of original singleton stops):
    • Southern Sami
    • Finnic: Livonian, Ludian–Veps
    • Mordvinic
    • Tundra Nenets, Kamassian, Mator
  • *-P- > -P- ~ -B- (voicing of singleton stops through consonant gradation):
    • Kola Sami
    • probably Proto-Finnic
    • ? Southern Selkup
  • *-Đ- > *-B- (hardening of earlier voiced spirants):
    • Standard Finnish (*ð > /d/ only)
    • Votic (*ɣ > /g/ only)
    • newer Mari (†ð, †ɣ > /d/, /g/)
  • *-NP- > -B- (simplification of nasal+stop clusters):
    • most of Samic (Southern through Skolt); usually as geminate -BB-
    • Permic
    • Hungarian
    • Enets

You might notice that all of these apply word-medially only. I have also left some more complicated cases off the list for now.

One wildcard case is Nganasan, where the two most widely established phonemic voiced stops /b/, /ď/ typically come from *w, *j and are unrelated to the original stop consonants. [g] only occurs natively as the equivalent of /k/ under consonant gradation; [d] is even more limited, found as the weak grade of /t/ in the cluster [nd], while between vowels the result is [ð]. (Due to loanwords both could probably now be considered phonemic in modern Nganasan, though strangely enough these kinds of inventories seem to then call the dental phoneme /ð/ per its intervocalic allophone and not, as could be expected, /d/.) Some /b/, /ď/ also come by gradation from PU *p, *ś. Their strong grades though are not the corresponding voiceless stops, but instead, a few sound changes later, /x/ and /s/. [2]

A similar setup *w *j > /b dž/ is found also in Kamassian and Mator, in these accompanied though by regular medial voicing. *j > /ď/ alone is more common yet: this is standard in Southern Karelian and Ludian, and found also in some varieties of Veps, Mari, Udmurt and Enets, at least. I even recall reading about a dialect of Hungarian that does this, but I don’t have any good overviews of Hungarian dialectology on hand to double-check with.

This is all also from the POV of synchronic voiced stops. Medial voicing, gradation-related or not, has likely happened at some point in by far most Uralic languages, but this often continued on with further lenition. E.g. in Permic, intervocalic *-p- *-t- *-k- are all continued as zero, most likely with intermediate > *[b] *[d] *[g] > *[β] *[ð] *[ɣ]. In at least one case, two separate rounds of medial voicing have been involved: thus in Southern Karelian, which has both consonant gradation and general medial voicing, so that original singleton stops yield the alternations b ~ v, d ~ ∅, g ~ ∅. This continues earlier stop/spirant gradation: *p ~ *v, *t ~ *ð, *k ~ *ɣ, [3] which in turn is probably from even earlier voiceless/voiced gradation: *p ~ *b, *t ~ *d, *k ~ *g.

Something similar may be actually the case in Permic. There’s reason to suspect that the full *-NP- to -B- shift was later than the lenition of medial single stops. Instead of filling in new voiced stops after the lenition of medial single stops to spirants, these clusters may have instead, in the first phase, filled in new voiceless stops already before the simplification of the original geminates. This is suggested by how a few late loanwords from Iranian still show *-NP- > -B- (/pad/ ‘crossroads’ ← Ir. *panta- ‘path’) but also seem to retain voiced stops as is (Udm. /vudor/ ~ Komi /vurd/ ‘otter’ ← Ir. *udra-); even Indo-Iranian voiceless stops can be continued as voiced (Udm. /kureg/ ~ Komi /kurɤg/ ‘hen’ ← Ir. or II *karka-; per *a > /u/ this must be an older loan than the previous two). So perhaps words of this group were all originally borrowed with simple voiceless stops (*pantɜ or *päntɜ > *patɜ, *vutrɜ, *karäkɜ > *kurekɜ), and they then went through a second round of medial lenition in late Proto-Permic, before the fall of final vowels (> *padɜ, *vudrɜ, *kuregɜ > *pad, *vudr, *kureg)? On the other hand, loaning from some Iranian variety with medial voicing is also conceivable, in the last case even an alternate analysis with *-eg as a suffix, and *rk > *r as in native vocabulary. (The epenthesis to *karäkɜ that would need to be assumed otherwise looks very sketchy, actually.)

I have even wondered if this could have been the same voicing process that affected Proto-Permic single voiceless stops after an unstressed syllable in mainline Komi, but not in Udmurt or Komi-Permyak (e.g. in the adjectival ending Udm. /-et/ ~ K /-ɤd/ ~ KPerm. /-ɤt/). But the fate of the original geminates suggests this is unlikely: since they yield modern Permic simple voiceless stops, same as everywhere from Veps on east, their shortening would have to be later than the voicing of any transient secondarily introduced medial voiceless stops. And it seems rather unparsimonious to assume geminates were still maintained as late as Proto-Komi.

Hungarian also has both *-P- > *-Đ- (*-p- *-t- > -v- -z-; *-k- > †-ɣ- > -v- ~ ∅) and *-NP- > -B-, but here we likely only need a single common round of medial voicing, followed by a chainshift of sorts of *-B- *-NB- to *-Đ- *-B-. Unlike Permic, new /-NP-/ or /-NB-/ clusters are established early-ish; though in loanwords from Iranian the only example seems to be kincs ‘treasure’ < pre-Hu *kenčɜ ← *gandz-. [4] Several others have correspondences elsewhere in Uralic, but I suspect these cases to be mostly loans / Wanderwörter rather than proper native inheritance. (They probably deserve to be more carefully looked at at some point, though.)

This big picture, I think, also raises some questions about the supposed retention of voiceless stops in a few languages.

I am not talking about any kind of a spin on the alternate reconstruction by Steinitz — who outright posited an original stop versus spirant contrast *-t- : *-ð-, instead of a gemination contrast *-tt- : *-t- (and, among the dentals, shunting *d₁ = traditional *ð then off as an absurd “retroflex spirant *δ̣”). This remains conclusively debunked by loanwords from Indo-European, whose voiceless stops turn up with traditional *-t- etc. (Indo-Iranian *ćata ‘100’ → *śëta > Hungarian száz, Erzya сядо /śado/, etc.), instead of Steinitz’ *-t- = traditional *-tt-. A weaker version of this could be perhaps still entertained: medial *-tt- : *-d- etc., but I don’t really see any particular benefit to it. In my opinion the situation found in Samic, Finnish–Karelian, Nganasan and perhaps Selkup can be still considered archaic, with all stop consonants voiceless by default, voiced (> lenited to non-stops in Finnish, Karelian and the immediately adjacent Sami varieties) at most under consonant gradation.

But the other four cases of Uralic languages without any voiced stops seem more dubious. To reiterate: (most of?) Mansi, (most of?) Khanty, Forest Nenets, Northern Selkup. These are all bundled together in western Siberia; the two latter have close relatives that do show medial voicing (i.e. Tundra Nenets and Southern Selkup); and even the former two are usually considered somewhat closely affiliated with Hungarian. Unlike Finnic and Samic, they also all show general shortening of geminates. In most Uralic languages this has been associated with earlier medial voicing, i.e. *-tt- : *-t- > *-tt- : *-d- > *-t- : *-d-, with the length contrast transphonologized as a voicing contrast, as is more common worldwide.
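The relative chronology matters here, and a toy model shows why. The Python sketch below runs medial voicing and degemination in both orders over two invented forms, atta and ata. These are hypothetical stand-ins, not reconstructions, and the five-vowel inventory and the function names are likewise assumptions made for the example.

```python
import re

VOICED = {"p": "b", "t": "d", "k": "g"}
V = "aeiou"  # toy vowel inventory, assumed for illustration

def medial_voicing(form: str) -> str:
    # Voice a single stop between vowels; geminates do not match,
    # since neither half stands between two vowels.
    return re.sub(rf"(?<=[{V}])([ptk])(?=[{V}])",
                  lambda m: VOICED[m.group(1)], form)

def degemination(form: str) -> str:
    # Shorten any geminate voiceless stop to a single one.
    return re.sub(r"([ptk])\1", r"\1", form)

def voicing_first(form: str) -> str:
    return degemination(medial_voicing(form))

def degemination_first(form: str) -> str:
    return medial_voicing(degemination(form))

for f in ["atta", "ata"]:
    print(f, voicing_first(f), degemination_first(f))
```

With voicing first, the old length contrast survives as a voicing contrast (ata versus ada); with degemination first, the two forms merge. This is the sense in which degemination tends to presuppose earlier medial voicing.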

The languages have also gone through some non-general medial lenition: *-k- > *-ɣ- in Ob-Ugric (including even clusters such as *sk), and in Samoyedic *-k- is lost at least in *ə-stems (though not in all cases in *A-stems; established examples of retention include *pirkä < *pid₁kä ‘high’, *kåjkə < *kod₂ka ‘spirit’). In Far Eastern Khanty also *-p- > /-w-/. There is also some limited direct evidence of stop devoicing: like Nganasan, Kamassian and Mator, Selkup also fortites *w and *j — but all the way to voiceless *k, *ḱ.

So I suspect that voicelessness of all stop consonants, as could be proposed for Proto-Uralic, is not actually directly continued in these languages. This looks more like an areal feature, either an innovation wave that crossed a few language boundaries on its way, or substrate influence. Direct influence from Forest Nenets or some extinct related variety seems possible for Northern Selkup, while in the case of Ob-Ugric, this is maybe more likely to have been taken up from the original pre-Uralic substrate languages of the region.

This would also mean that degemination and medial voicing could be reconstructed as common Ugric features, if desired; with voicing developing further into spirantization in Hungarian, but eventually mostly reverted in Ob-Ugric. If so, this continues undermining further the notion of Ob-Ugric as a genetic subgroup within Uralic. Previous surveys by Honti and Viitso have not found any common innovations in the languages’ consonant systems other than the nearly trivial degemination, and several trivial shared retentions such as the maintenance of *w- as still /w-/. The evidence of Hungarian-Mansi isoglosses (e.g. *wi > *wü- > *ü-) and even Hungarian-Khanty ones (e.g. *d₂ > *j, further shared by Samoyedic) should also be weighed here: perhaps it is rather some of these that are old common inheritance after all, as has been suggested by various people at various times.

[1] Note that /f/ versus /v/, a contrast fairly widely established in western dialects, does not count as a voicing distinction: the latter is the approximant/semivowel [ʋ]. This is even treated as further equal to /u/ in some generative models of Finnish phonology. I write this as /v/ in broad transcription both for simplicity & following the traditional Uralistic transcription (which itself follows Finnish standard orthography), much like I also generally use /a ä/ instead of the IPA-compliant [ɑ æ].
[2] This also has, I think, implications for the reconstruction of the history of consonant gradation, since *z > /ď/ does not seem plausible. Either we have to date the emergence of consonant gradation between voiceless and voiced grades already into pre-Proto-Samoyedic (= effectively Proto-Uralic), with further ramifications; or, if we want to consider this pattern an innovation specific for Nganasan that never occurred in its close relatives (note in particular that while medial stops are generally lenited in Enets and Tundra Nenets, the same does not apply to /s/), then it is instead the loss of palatalization in *ś that must be also dated as post-Proto-Samoyedic. We would not need to assume an outright palatalized stop or affricate, though: a conceivable route to the modern situation would be *[ś] > [ś] ~ [ź] > [s] ~ [j]  > [s] ~ [ď]. Note also that while palatalized *kʲ > *ć in early common Samoyedic merges with /s/ in northern Smy, in southern Smy these have distinct reflexes /š/ and /s/, suggesting *ś > *s rather soon after PSmy, at the latest.
[3] Traditionally the labial spirant stage is given as [β], but to my knowledge, there is no evidence whatsoever anywhere in Finnic for a distinction between this and regular /v/ < *w; only for retained /b/ in Livonian and Ludian–Veps. Setälä conceived of the latter as a re-fortition from [β], but to me a marginal archaism that never went through a spirant stage seems more likely. It’s conceivable that the shift from *w to labiodental /v/ was not yet completed by the time of [b] > [β], and so this may have been immediately a merger, with [β] > [v ~ ʋ] only following later. The fact that several Finnish dialects are reported to have [w] for /v/ next to rounded vowels (e.g. in SE Tavastia [wuos] for vuosi ‘year’, [sywä] for syvä ‘deep’) may even support reconstructing [w] still for Proto-Finnic in some positions at least.
[4] Judging by the voiceless k and cs, this looks like one of those early loans where Proto-Iranian *c *dz (later > /s z/) were substituted by *č in Uralic, instead of anything directly related to the unexpected appearance of /dž/ in Persian گنج ganj.

Three observations on Bactrian

As a part of my ongoing quest to get a better handle on the Indo-Iranian languages (mostly, yes, but not only due to their important early contact influence on the Uralic languages), some time ago I caught wind of Saloumeh Gholami’s PhD thesis Selected Features of Bactrian Grammar (2010) and have given it a thorough-ish read. Bactrian has been and probably continues to be one of the more poorly documented Iranian languages, and Gholami provides what seems like a good summary of the newer ongoing research.

Already at this point there are a few interesting observations to be made. And I hope you will not be too disappointed to find out that my thoughts so far mostly involve the historical phonology of Bactrian — the syntax and morphology no doubt have interesting phenomena going on too, but I probably won’t be able to say anything intelligible about those before knowing much better how they work also in the other Iranian languages from the same period and/or area (Sogdian, Xwareshmian, Middle Persian, Pashto etc.).

Gholami’s overview of the phonology of Bactrian is introductory in nature but still very historically grounded: she gives a pre-Bactrian etymology for almost every example word mentioned. These are not sourced, so it is hard to tell how far back they are supposed to go (all the way to Proto-Iranian?), but I get the impression that they’re based on earlier groundwork on Bactrian by Nicholas Sims-Williams, whom she mostly refers to for basics.

The thesis also does not contain any kind of a word index, so I’ve had to comb the initial chapters by hand for examples, getting a bit over 400 of them together. Further vocabulary would appear in the grammatical chapters with their extensive interlinear glosses, but generally without proto-forms. If we regardless suppose her given pre-Bactrian reconstructions to be reliable, they seem to allow for the following observations.

One: there seems to be a rule of non-open vowel shortening.

Middle Iranian *ē (in Bactrian from Proto-Iranian *ai, *aya, *iya, *ā-i) is in Bactrian spelled varyingly as ‹η› (likely /eː/) or ‹ι› (likely either /i/ or /iː/). Gholami suggests that *ē develops to ‹ι› before a nasal, on the basis of the following data: *waina- > ‹οιν-› ‘to see’, *kainā- > ‹κινο› ‘revenge’, *abi-dayanā > ‹αβδδινο› ‘custom’, *abi-dayana-ka > ‹αβδδιγγο› ‘way, manner’, *xrayanā > ‹αρχινο› ‘purchase’. Raising of long vowels before nasals is common across Iranian, sure enough. However, Bactrian shows no signs of the parallel developments *ōN > **ūN (*gauni-čiya- > ‹γωνζο› ‘basket’, *čiyāt-gauna > ‹σαγωνδο› ‘as, like’) or *āN > **ōN (*bāmušn > ‹βαμοϸνο› ‘queen’, *gawāna > ‹γαοανο› ‘fault’, *nāma > ‹ναμο› ‘name’, *fra-māna > ‹φρομανο› ‘command’, *fšupāna > ‹χοβανο› ‘shepherd’…)

An assumption of pre-nasal raising also does not exhaust the cases with *ē > ‹ι›: this also occurs in *ziyakā > ‹ζιγο› ‘damage’, *waignā > ‹οιγνο› ‘famine’ (unless phonetically with [-ŋn-]?), *-iyaθwa > ‹-ιλφο› ‘a suffix’ (thanks Gholami, very illustrative glossing).

I would instead suggest the following rules:

  1. *ē gives ‹ι› before an original unstressed *ā. This handles ‘damage’ and ‘famine’, but also ‘revenge’, ‘custom’ and ‘purchase’. This is likely also primarily a shortening *ē > *e, with raising *e > /i/ only following secondarily.
    • This does not seem to apply to /ē/ from i-umlaut of *ā: *dāraya- > ‹ληρ-› ‘to have’, *wādaya- > ‹οηλ-› ‘to lead’, *wādžaya > ‹οηζο› ‘ability, power’, *wi-čāraya- > ‹οισηρ-› ‘to purchase’. These could suggest either that implicit intermediate unstressed *ē (*dārē- > *dērē-, *wādē- > *wēdē- etc.) did not trigger shortening; or, alternately, maybe i-umlaut of *ā initially led to a distinct low front vowel *ǣ, which was only raised to ‹η› after the shortening/raising of *ē from *ai, *aya, *iya. The latter might be preferable in light of one case with *au > *ō > ‹ο› (rather than ‹ω›) before *aya > *ē: *tauxmaya > ‹τοχαμηιο› ‘relationship’ (here *ē is not lost; thanks to further suffixation?). As a vowel, ‹ο› probably mostly stands for /u/, as is suggested by its use also for /w/ (‹οηζο› = /wēdz/, etc.) and the general typology of vowel systems across Iranian: Old and Middle Iranian languages mostly do not have short /o/. [1]
  2. *ē gives ‹ι› also before word-final consonant clusters. (NB: ubiquitous final ‹-ο› is thought to be only a Greek-derived orthographic device.) This handles ‘way’, as well as the ‹-ιλφο› suffix, and maybe also *fšuyantīčī > ‹φινζο› ‘lady’ (though here we instead have *-uya-, which I suppose could have contracted to *ī rather than *ē already to begin with).
    • This is again applicable also to the development of *ō: *aitat-gaunaka > ‹δαγογγο› ‘such, in this way’, *bawanta > ‹βονδο› ‘completely’.
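As a rough cross-check, the two conditions can be tested mechanically against the examples cited above. The Python sketch below is only an illustration under stated assumptions: the function names are made up for this purpose, "before an original unstressed *ā" is approximated as "the pre-Bactrian form ends in *ā", and the final ‹-ο› of full words is stripped as purely orthographic.

```python
import unicodedata

GREEK_VOWELS = set("αεηιουω")
LONG_A = "\u0101"  # precomposed ā, so the check survives any input encoding

def ends_in_long_a(proto):
    """Rule 1 proxy (an assumption): the reconstructed form ends in *ā."""
    form = unicodedata.normalize("NFC", proto).lstrip("*").rstrip("-")
    return form.endswith(LONG_A)

def has_final_cluster(bactrian):
    """Rule 2: the Bactrian spelling ends in two consonant letters."""
    word = bactrian.strip("-")
    # Full words carry an orthographic final omicron; hyphenated stems do not.
    if not bactrian.endswith("-") and word.endswith("ο"):
        word = word[:-1]
    return len(word) >= 2 and all(ch not in GREEK_VOWELS for ch in word[-2:])

def rules_for(proto, bactrian):
    """List which of the two proposed conditions a word satisfies."""
    rules = []
    if ends_in_long_a(proto):
        rules.append(1)
    if has_final_cluster(bactrian):
        rules.append(2)
    return rules

# Forms as cited in the text; 'to have' is a control that keeps ‹η›.
EXAMPLES = {
    "revenge": ("*kainā-", "κινο"),
    "custom": ("*abi-dayanā", "αβδδινο"),
    "purchase": ("*xrayanā", "αρχινο"),
    "damage": ("*ziyakā", "ζιγο"),
    "famine": ("*waignā", "οιγνο"),
    "way": ("*abi-dayana-ka", "αβδδιγγο"),
    "suffix": ("*-iyaθwa", "-ιλφο"),
    "to see": ("*waina-", "οιν-"),
    "to have": ("*dāraya-", "ληρ-"),
}

for gloss, (proto, bactrian) in EXAMPLES.items():
    print(gloss, rules_for(proto, bactrian))
```

On this mechanical reading, ‘famine’ happens to satisfy both conditions (‹οιγνο› also ends in a cluster once ‹-ο› is removed), while ‘to see’ and the control ‘to have’ satisfy neither.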

These rules only seem to leave the verb root ‘to see’ unaccounted for. However, a more general version of rule 1 might cover some inflected forms (*wēn-ēd > ‹οινηδο› ‘see.2PL’), and actually also an allomorph with retained *ē exists (*wēn-an > ‹οηνανο› ‘see.subjunctive-1PS’). Gholami thinks these are chronologically separated versions before and after the sound change from ‹η› to ‹ι› (early /wēn-/ > late /win-/?), but if there is a chronological difference, maybe this rather involves levelling-away of the /wēn-/ allomorph.

Rule 1 then suggests that before the onset of root stress and the reduction of all suffix and prefix syllables, Bactrian went through a stage of mobile stress attracted rightwards by long vowels, as I believe occurs in several other Indo-Iranian languages (though don’t ask me about the exact details on this).

Two, a few notes on vowels in prefixes. These are mostly reduced heavily, and are spelled varyingly with ‹α› or ‹ο›, which Gholami interprets as [ə]. E.g. *fra-gāwa > ‹φρογαοο› /frəɣāw/ ‘profit’, *ni-kanta- > ‹νακανδο› /nəkand-/ ‘to dig’, *uz-bara- > ‹αζβαρο› /əzvar-/ ‘to bring forth’. There is also epenthetic /ə/ before some consonant clusters: *spāsV > ‹σπασο ~ ασπασο› /spās ~ əspās/ ‘service’. Despite some cases of variation like this, schwa seems to be still an underlying phoneme, however: consider *xšayanta- > ‹αχανδ-› /əxānd-/ ‘to control’, with first *xš- > *əxš-, followed by *š > ∅ (if not rather > *hx > /xː/, spelled simply as ‹χ›?); and *upa-stāna > ‹αβαστανο› /əvastān/ or /əvəstān/ ‘support’. There doesn’t seem to be much evidence against considering [ə] an unstressed allophone of /a/, though. (Gholami takes no stance on questions about the phoneme inventory of Bactrian and operates only with orthographic vs. surface phonetic levels of analysis.)

There are also some cases where *ni- is still spelled as ‹νι-›. Gholami suggests that these would be retentions. I think, however, that they might be secondary umlaut developments: in the data given, they occur mostly preceding a palatal root vowel ‹ι› or ‹η›, as in *ni-štaya- > ‹νιττι-› /nihti-/ ‘to send (a message)’; or preceding a palatal sibilant (possibly itself originally conditioned by *i through RUKI), as in *ni-šadman > ‹νιϸαλμο› /nišalm/ ‘seat’. There are also examples of ‹ι› continuing earlier prefixal *a in a similar context: *waz-antiyaka > *wəzindēg (with umlaut in the root: *a-i > *i) > ‹οιζινδδιγο› /wizindiɣ/ ‘current’. Gholami attributes this last example to a supposed development of *a to ‹ι› before /s z/, which would also be seen in *dasta > ‹λιστο› /list/ ‘hand’. There are however plenty of counterexamples, say *aspa > ‹ασπο› ‘horse’, *ā-xasa- > ‹αχασ-› ‘to quarrel’, *basta- > ‹βαστο› ‘to bind’, *dasa > ‹λασο› ‘10’; *azam > ‹αζο› ‘I’, *azdā > ‹αζδο› ‘knowledge’, *gazna > ‹γαζνο› ‘treasury’, *waza- > ‹οαζ-› ‘to use’. I don’t know what is up with ‘hand’; theoretically, some kind of suffixation to *dasta-ya- would work. [2]

Lastly, one case with the development of *fra- ‘pre-’ suggests that vowel reduction was actually fairly early, reducing this prefix first to *fr̥-, which then in unstressed position mostly unpacks again to *frə-. Consider *fra-stāya- > ‹φοϸτιι-› ‘to send’: this exemplifies the sound change *rs > /š/ (compare e.g. *kr̥sta- > *kirsta- > ‹κιϸτο› /kišt/ ‘to detain’), and therefore requires *fr̥stēy- > *fštīy- > /fəštīy-/.

Three, the development of *š shows double treatment. Gholami notes that in some cases, *š is retained as ‹ϸ› /š/; in others, it develops to ‹υ› /h/, which can be further lost (or perhaps only unwritten in various consonant clusters, I wonder?). This does not appear to be a simple case of dialect mixture or whatever, since both outcomes can sometimes occur in the same word: *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’.

Examining the data, to me the distribution does not appear to be entirely unpredictable, though. *š > *h seems to be the main development for *š originating by RUKI:

  • *is, *us > *iš, *uš > *ih, *uh
    • *awa-gta > ‹ωγοτο› /ōɣu(h)t/ ‘to conceal’
    • *d-manyu > ‹λρουμινο› /lruhmin/ ‘enemy’ [3]
    • *fra-ta-ka > ‹φρητογο› /frē(h)təɣ/ ‘messenger’
    • *kasta- > ‹κισατο› /kisə(h)t/ ‘youngest’
    • *ni-gaa- > ‹ναγαυ-› /nəɣāh-/ ‘to hear’
    • *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’
    • *ni-štaya- > ‹νιττι-› /nihti-/ ‘to send (a message)’
    • *snā > ‹ασνωυο› /əsnōh/ ‘daughter-in-law’
    • *wi-šmāra- > ‹οαυμαρ› /wəhmār/ ‘to account’
    • *wrta-ka > ‹ροτιγο› /ru(h)tiɣ/ ‘rope’
  • *rs > *rš > *r(h)
    • *ā-pr̥št- > ‹βαρτ-› /var(h)t-/ ‘to be necessary’
    • *gr̥šta- > ‹γιρτο› /ɣir(h)t/ ‘to complain’ (past stem)
    • *hr̥šta- > ‹υιρτο› /hir(h)t/ ‘to leave’ (past stem)
    • *wi-xwata- > ‹οοχορτο› /wəxur(h)t/ ‘to quarrel’
  • *ḱs, *k⁽ʷ⁾s > *ćš, *kš > *š, *xš > *h, *x(h)
    • (PII *ćš >) *pašman > ‹παμανο› /pa(h)man/ ‘wool’
    • *āθriya > ‹χαρο› /x(h)ār/ ‘ruler’
    • *ayant- > ‹χανδ-› /x(h)ānd-/ ‘to control’
    • *apā- > ‹χαβρωσο› /x(h)avrō(t)s/ ‘night-and-day’
    • *nauθra > ‹(α)χνωρο› /(ə)xnōr/ ‘satisfaction’
    • *wašti > ‹χοατο› /xʷa(h)t/ ’60’ [4]
    • (PII *ćš >) *xšwašti > ‹χοατο› ’60’
    • *waa > ‹οαχο› /wax(h)/ ‘interest’

In one case I’m not sure if RUKI or *ćt > *št is involved: *paršti-čī- > ‹παρσο› /parts/ ‘backwards’.

Meanwhile, retention of *š seems to be entirely regular in the position *a_V, *ā_V. In these positions *š would be maybe the most likely to continue PII *sč < *sk(e), though *ćš is also an option, and some could be innovative Iranian vocabulary from somewhere else entirely:

  • *dāšinV > ‹λαϸνο› /lāšn/ ‘gift’
  • *fra-xāšaya- > ‹φριχηϸ-› /frixēš-/ ‘to seduce’
  • *paga-šaka- > ‹παχϸιιο› /paxšiy/ ‘in-law’
  • *uz-gaša- > ‹αζγαϸ-› /əzɣaš-/ ‘to dissent’
  • *xāša-ka > ‹χαϸιγο› /xāšiɣ/ ‘clothing’

A few clear cases of retained /š/ from RUKI also appear:

  • *kr̥šāka > ‹κιϸαγο› /kəšāɣ/ ‘plough-ox’ (<< PIE *kʷels- ‘to plough’)
  • *ni-šādman > ‹νιϸαλμο› /nišalm/ ‘seat’ (<< PIE *sed- ‘to sit’)

In most cases of retention I am not sure about the pre-Iranian origin of *š (but RUKI is conceivable in many of them):

  • *a-xwašn- > ‹αχοαϸνο› /axwašn/ ‘unpleasantness’ (any relation to Ir. *xwad- ‘to make pleasant’ < PIE *sweh₂d-?)
  • *bāmušn- > ‹βαμοϸνο› /vāmušn/ ‘queen’ (any relation to Persian بانو /bānu/ ‘lady’?)
  • *daxštana > ‹λαχϸατανιγο› /laxšətaniɣ/ ‘crematory’ (from pseudo-PIE *dʰegʷʰ-sth₂no-?)
  • *hāwišta-ka > ‹υαϸκο› /hāšk/ ‘pupil’
  • *pitr̥-šti- > ‹πιδοριϸτο› /piðurišt/ ‘ancestral estate’ (from pseudo-PIE *ph₂tēr-steh₂-?)
  • *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’ (maybe *š-s > *š-š, if from PIE *steh₂-?)
  • *škara- > ‹αϸκαρ-› /əškar-/ ‘to follow’
  • *wi-xwarša- > ‹οοχωϸ› /wəxōš/ ‘quarrel’
  • *xšāya- > ‹ϸιι-› ? /šīy-/ ‘to be able’ (< PII *kšaH-? but cf. /x(h)/ in the derivative ‘ruler’)
  • *xšidža-ka- > ‹ϸιζγο› /šidzɣ/ ‘good’

I could suggest at least that before a vowel, *rš > /š/ (‘plough-ox’, ‘quarrel’), while before a consonant, *rš > /r(h)/ (‘to be necessary’, ‘to complain’, ‘to leave’, ‘to quarrel’ and ‘backwards’).
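This conditioning is simple enough to state as a small function. The following Python sketch is an illustration only: the function name and the vowel inventory are my own assumptions, syllabic *r̥ is treated the same as *r, and the inputs are the pre-Bactrian forms as cited above.

```python
import unicodedata

# Assumed vowel inventory for the reconstructed forms (not from the source).
VOWELS = set(unicodedata.normalize("NFC", "aāiīuūeēoōə"))
RING_BELOW = "\u0325"   # combining ring that marks syllabic r̥
R_SH = "r\u0161"        # the sequence r + š in precomposed form

def rs_outcome(proto):
    """Predict the outcome of *rš from the segment that follows it.

    Returns "š" before a vowel, "r(h)" before a consonant, and None when
    the form has no *rš or nothing follows the cluster.
    """
    form = unicodedata.normalize("NFC", proto)
    form = form.replace(RING_BELOW, "").replace("-", "")
    pos = form.find(R_SH)
    if pos == -1 or pos + 2 >= len(form):
        return None
    return "š" if form[pos + 2] in VOWELS else "r(h)"

for proto in ["*kr̥šāka", "*wi-xwarša-", "*gr̥šta-", "*hr̥šta-", "*paršti-čī-"]:
    print(proto, "->", rs_outcome(proto))
```

The function only restates the rule of thumb; it does not test it independently, since the expected outcomes come from the same data the rule was read off from.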

The cases with *št from PII *ćt seem to be rather evenly split. *š > *h appears in:

  • *aštā > ‹αταο› /a(h)tā/ ‘8’ (<< PIE *oḱtōw)
  • *ham-gašta- > ‹αγγιτι› /angi(h)ti/ ‘to receive’ (past stem) (< East Iranian *gādz- ‘to receive’, of unknown earlier origin per Cheung)
  • *ni-pixšta- > ‹νιβιχτο› /nəvix(h)t/ ‘to write’ (past stem) (<< PIE *peyḱ- ‘to paint, decorate’)

while retention appears in:

  • *pašti- > ‹παϸτο› /pašt/ ‘agreement’  (<< PIE *peh₂ḱ-; cf. pact)
  • *rašta- > ‹ραϸτο› /rašt/ ‘true, loyal’ (<< PIE *h₃reǵ-; cf. right)

Same goes for cases with *fš, though examples are rather rare:

  • *š > *h in *pati-fšarV > ‹πιδοφαρο› /piðəf(h)ar/ ‘honour’; *fšuyantīčī > ‹φινζο› /f(h)indz/ ‘lady’
  • retention in *kafši > ‹καφϸο› /kafš/ ‘shoe’
  • and even: *fš > /x/ in *fšupāna > ‹χοβανο› /xuvān/ or /xəvān/ (or /xʷvān/?) ‘shepherd’.

Is it perhaps relevant that ‘shepherd’ comes from PII *pću-, while the others are more likely to be from *ps with Iranian “second RUKI” to *fš? Maybe additionally *fš- > /f-/ root-initially versus retained medially.

It’s also worth pondering that *š > *h fits somewhat poorly into the phonological big picture of Bactrian. Usually *š >> /h/ correspondences go through *x (thus in e.g. Finnic or Spanish; Pashto remains at the /x/ stage), but Bactrian retains Proto-Iranian *x just fine. Two other possibilities come to mind, but they both would require Bactrian to have split off from the other Iranian languages relatively early:

  1. Perhaps *kʰ > /x/ and *kC > /xC/ are fairly late in at least some parts of Iranian: they are, after all, not reflected in some of the languages, such as Balochi and Wakhi. *š > *h could then have passed through a transient *x state already earlier.
  2. Perhaps the path was here rather *š > *s > *h, the second change being common (but not Proto-) Iranian. But this leaves many cases unexplained: original *-st- for example does not develop into **-ht-, but *-št- still does in many cases (‹νιττι-›, ‹φρητογο›, ‹ωγοτο›, etc.)

Likely having more clarity on this issue would require examining also the cognates elsewhere in Iranian, and not necessarily taking Gholami’s pre-Bactrian reconstructions as a given. But this remains difficult as long as there is no general Iranian Etymological Dictionary to consult.

[1] Gholami suggests /o/ for cases with *a-u > ‹ο›, such as *madu > ‹μολο› ‘wine’. Other eastern Iranian languages with this assimilation, though, end up with *u, e.g. Ossetian муд. I-umlaut of *a-i also gives ‹ι›, not ‹ε›, e.g. *kanyā > ‹κινο› ‘canal’.
[2] Sometimes it is proposed that ‘hand’ in Iranian would be native only in Persian, and borrowed from there to most of the other varieties, since this has PIE *ǵʰ- and is expected to give /d-/ only in Persian, but /z-/ elsewhere (and Avestan indeed has that). In this case the widespread Middle Iranian fronting of short *a to *æ, which appears to be absent from Bactrian, might result in *destV > *ðistV > /list/ in Bactrian. However I think that dissimilation before syllable-final *s is perhaps more likely: PIr *dzast- > *dast- (this proposal I’ve seen from Martin Kümmel). — There is however the fact that ‘hand’ contains original PIE *s, while my counterexamples like ‘horse’, ‘10’ and ‘I’ mostly have secondary *s *z from PIr. *c *dz < PII *ć *dź⁽ʰ⁾ < PIE *ḱ *ǵ⁽ʰ⁾. This could be perhaps leveraged, if wanted, but I don’t see what phonetic sense this would make, and so I don’t feel like doing a full check-up on the matter.
[3] The (rather funky!) consonant cluster /lr-/ presumably by folk etymology from *drauga > ‹λρωγο› /lrōɣ/ ‘false(hood), wrong’.
[4] In principle pre-epenthesis *swašti > *šwašti could also work, with *š > *h then feeding into common Iranian *hw > *xw?

Were there Proto-Samic *š-stems? Some issues of Samic-Finnic chronology

Despite ongoing disputes about the subgrouping of the Uralic family, it is clearly the case that the Finnic and Samic languages have been at least neighbors for several millennia now, exchanging linguistic features and material back and forth. With care, this allows teasing out substantial facts about the relative chronology of the history of the two families. (Germanic can be also added to the bundle, though the evidence from here is much more unidirectional.)

The sibilant system shows several good examples. While Finnic /s/ and Samic /s/ correspond to each other consistently all the way from Proto-Uralic to the present day, the “shibilants” have a more complex history. In old inherited vocabulary the main correspondences for these are Finnic *s ~ Samic *č (from original *ś ~ *ć, be it Proto-Finno-Samic or all the way from Proto-Uralic) and Finnic *h ~ Samic *s (from original *š). The latter correspondence can also appear in old (perhaps mostly either parallel or Finnic-mediated) loans from Germanic, whose *s was substituted as *š at least on the Finnic side; no way to tell if also in Samic.

(This probably indicates that pre-Finnic *s was, following the merger of *s and *ś, realized as laminal [s̻], while *š was (sub?)apical [ʂ]. Germanic *s was likely apical [s̺], and therefore matched better with pre-Finnic *š. I am not sure how far back the modern Finnish realization of /s/ as apical [s̺] dates, but at least the Northern Karelian shift of *s to an apical postalveolar [s̱] š most likely starts from this same value.)

The correspondence Finnic *s ~ Samic *š appears in a small number of native-looking cases, where they seem to represent original preconsonantal *ś (PF *laskë- ~ PS *lōštē- < *laśkə- ~ *laśk-ta- ‘to let out, pour, etc.’; PF *kisko- ~ PS *këškē- < *kiśka(w)- ‘to tear, pull’; PF *vaski ~ PS *veaškē < *wäśkä ~ *waśka ‘copper’). It is however more common in loanwords between the two. E.g. Finnic *s before *i and *ü seems to be fairly regularly substituted as *š in Samic; the YSS data has 5 examples of this out of its 11 examples altogether of PS *š-. [1] All late loanwords from Samic into Finnish also show *š → /s/, for the obvious reason that Finnish has had no other sibilants for most of its independent existence. (Even the modern loanword phoneme š or sh is still limited to educated speakers. Probably a rather large proportion of Finns counts as “educated” by typical contact linguistic standards by now, though…)

Lastly, also the fourth theoretically possible correspondence between plain sibilants is attested: Finnic *h < *š ~ Samic *š. (I will not be treating the various affricates in this post.) This might be the group that has the most value for establishing chronology, since it is bounded both from above and below: prevocalic *š only occurs in loanwords in Proto-Samic, but any such loanwords from Finnic must then pre-date the pan-Finnic change *š > *h.

Some of the data in this group suggests that it stretches beyond the breakup of Proto-Samic. One example is the word for ‘coal, ember’; in Finnic *šiili > *hiili (Fi. hiili etc.), which then appears as pseudo-PS *šilë in Southern, Ume and Pite Sami (SS sjïjle etc.); as pseudo-PS *hilë in Lule and Northern Sami (NS hilla); and pseudo-PS *ilë in Eastern Sami (Inari illâ etc.). I’ve sometimes seen also the explanation that these kinds of cases would not be parallel loanwords, but rather several layers of re-loaning, with each new loanword then flushing out the previous one. This however seems unlikely to me, especially when dealing with a non-cultural term like ‘coal’ that has no reason to be repeatedly loaned from Finnic, and when the distribution of the different variants is perfectly complementary. [2]

Meanwhile, *š > *h is usually taken to be late Proto-Finnic, i.e. at least Proto-Core Finnic (probably later than at least the splitting of South Estonian though). Does this mean that Proto-Samic is therefore younger than even Core Finnic? And how does this measure up with how e.g. Jaakko Häkkinen (Jatkuvuusperustelut ja saamelaisen kielen leviäminen, osa 2: see table on p. 19) comes out with the opposite result: Proto-Samic would have broken up earlier than Proto-Finnic?

One option would be to sigh and concede that apparently words like ‘coal’ are multiple layers after all. But I would hold out for a different explanation: we can probably shift the dating of *š > *h ahead quite a bit beyond its various termini post quem. E.g. the introduction of *h → *h in Germanic loanwords into Finnic does not have to be enabled by the development of a native /h/ in Finnic; it can represent also the taking-up of a new loanword phoneme, which besides probably already existed as an allophone in the clusters *kt [ht] and/or *sl *sr *sn [hl hr hn]. In fact, since Proto-Finnic also had all four of *st *kl *kr *kn, then the introduction of [h] in both *kt and the *sR group would have already been sufficient to phonemicize it: it could be no longer identified uniquely as either /s/ or as /k/. — Again, I plan on writing a full article on this topic in the future.

This finally brings me to the topic I mention in the title. The Samic languages have borrowed *š from early Finnic also in several consonant-stem nominals. However, while these have consistently /-š/ in Western Sami, they seem to have dual representation in Eastern Sami: sometimes they surface with /-s/, sometimes with /-š/. At first sight this sounds like it might be related to the fact that some of these cases are loaned from PF *-is and not *-eš — but no, that contrast appears to be completely orthogonal.

Let’s roll out the data:

(1) Eastern Sami /-š/ ← Finnic *-eš

  • F *imeš (> Fi. ihme ‘wonder’) → S *imëš > e.g. North imaš, Inari iimâš
  • F *kadëš (> Fi. kade ‘jealous’) → S *kāðëš > e.g. North gáđaš, Inari kaađâš
  • F *laudëš (> Fi. laude ‘seat in sauna’) → S *lāvtëš > Skolt laaudâš
  • F *murëš (> Fi. murhe ‘sorrow’) → S *morëš > e.g. North moraš, Inari muurâš
  • F *säigeš (> Fi. säie ‘thread, fiber’) → S *šeajkëš > Kildin šieigaš
  • F *säigeš also? → S *sājkëš > e.g. Skolt saaiǥâš
    (This looks like a contamination of the previous word × the verb *sājkē- ‘to wear out’ reflected in most of Samic; which is probably not a loan, but older inheritance from original *säjkä, as no vowel-stem forms survive in Finnic. North sáiggas ‘worn’ is then simply a native derivative from the verb, as also per the semantics.)
  • F *tarbëš (> Fi. tarve ‘need’) → S *tārpëš > e.g. North dárbbaš, Inari taarbâš
    (In this one case, with an *s-stem quite widely alongside: *tārpēs > Southern daerbies, also Lule; *tārpës > e.g. Skolt taarbâs, also Pite, Lule. Lule Sami seems to have all three variants: dárpaj, dárpes, dárpas, and even a vowel-stem dárpa. There is Finnish dialectal tarvis as well, so the diversity clearly goes back to parallel loaning in some fashion.)

(2) Eastern Sami /-š/ ← Finnic *-is

  • F *kallis (> Fi. kallis ‘expensive’) → S. *kāllëš > e.g. North dial. gállaš, Skolt kaallâš
    (in Inari *ēs-stem kaalles, apparently with a nativized adjective ending)
  • F *ruumis (> Fi. ruumis ‘corpse’) → S. *rumëš > e.g. North rumaš, Inari ruumâš
    (parallel *romës in Skolt roomâs, [3] comparable with the Fi. dialectal variant rumis from Southern Ostrobothnia; and with a vowel stem in Southern Sami: *romē > räbmie.)
  • F *rugis (> Fi. ruis ‘rye’) → S. *rukëš > e.g. North rugaš, Inari ruuvaš
  • F *valmis (> Fi. valmis ‘ready’) → S. *vālmëš > e.g. North válmmaš, Inari vaalmâš

(3) Eastern Sami /-s/ ← Finnic *-eš

  • F *kantëlëš (> Fi. kantele ‘a traditional string instrument’) → S *kāntëlës > Inari kaddâlâs
  • F *kiireš (> Fi. kiire ‘hurry’) → S *kirës > e.g. Inari kiirâs
  • F *kärmeš (> Fi. käärme ‘snake’) → S *kearmëš ~ *kearmës > e.g. North gearpmaš, Ter “kermʾs
  • F *pereš (> Fi. perhe ‘family’) → S *pearëš ~ *pierës > e.g. North bearaš, Kildin пӣрас
    (vowel-stem *pearë in Skolt piâr)
  • F *terveš (> Fi. terve ‘healthy’) → S *tearvëš ~ *tiervëš > e.g. North dearvvaš, Inari tiervâs
  • F *voidëš (> Fi. voide ‘lotion, ointment’) → S *vōjtës > Inari vuoidâs
    (From Sammallahti’s reverse dictionary of Inari Sami. Álgu does not have this lexeme, so I have no idea if there are equivalents elsewhere in Samic. This could be also an independent derivative within Samic from the base verb: PS *vōjtë- ‘to grease, anoint’, interestingly an *ë-stem one instead of *ē-stem, as could be expected.)

(4) Eastern Sami /-s/ ← Finnic *-is

  • F *nakris (> Fi. nauris ‘swede (type of turnip)’) → S *nāvrëš ~ *nāvrës > e.g. North návrraš, Inari naavrâs
  • F *saalis (> Fi. saalis ‘catch’) → S *sālëš ~ *sālës > e.g. North sálaš, Inari saalâs

A few initial comments:

  1. I’ve only included cases with -s when Western Sami, or failing that Finnic, actually points to *-š. Of course F *-š ~ S *-s can be also found in older shared vocabulary, as in ‘boat’: *venəš > F. *veneš > Fi. vene; > S. *vënës > e.g. North vanas, Inari voonâs; > Mordvinic *venəš > e.g. Erzya венч /venč/. ‘Hurry’ could be theoretically also of this type; per the vowel correspondence *a ~ *ā, kaddâlâs clearly cannot.
  2. This entire word group seems to be centered on Northern and Inari Sami. Reflexes are practically absent from Southern Sami (only gïermesj ‘snake’), very rare also in Ume and Pite Sami. This would fit well together with late separate loaning from early Finnish specifically + occasional diffusion into other Sami varieties.
  3. Some of these words are originally from Germanic, and could be in theory partly borrowed directly from there into Samic, but I haven’t found any examples where Proto(-Western)-Samic *-ëš appears in a loanword without Finnic equivalents. Also, quite a few cases are native Finnic, either wholly (e.g. kantele perhe säie, nauris saalis) or at least the *S-derivative is (terve); or come from Baltic (käärme). The only case where parallel loaning is clearly involved is the ‘need’ group: probably *tārpëš via Finnic, versus *tārpës directly from Scandinavian *þarbiz.

Here is a quick distribution chart, as you may wish to consult for point 2: [4]

*imëš:      - - - L N I - - -
*kātëš:     - - - - N I S - -
*lāvtëš:    - - - - - - S - -
*murëš:     - - - L N I S - -
*seajkëš:   - - - - - - - K -
*sājkëš:    - - - - - - S K -
*tārpëš:    - - - L N I - - -
*kāllëš:    - - - - N I S K T
*rumëš:     - - - L N I S - -
*rukëš:     - - - L N I - - -
*vālmëš:    - - - L N I S K T

*kāntëlës:  - - - - - I - - -
*kirës:     - - - - - I S K T
*kearmëš/s: S - P L N - - - T
*pErëš/s:   - - - L N - - K T
*tErvëš/s:  - - - - N I S K T
*nāvrëš/s:  - U P L N I S K -
*sālëš/s:   - - - L N I - - -

          S U P L  N  I  S  K T
totals:   1 1 2 10 13 13 10 8 6
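The totals row can be re-derived from the chart itself. Here is a minimal Python sketch that does so; the long column names are my own expansion of the one-letter headers (assumed to stand for Southern, Ume, Pite, Lule, Northern, Inari, Skolt, Kildin and Ter, in that order), and the row strings simply transcribe the chart cell by cell.

```python
# Column labels are an assumed expansion of the chart's S U P L N I S K T.
COLUMNS = ["South", "Ume", "Pite", "Lule", "North",
           "Inari", "Skolt", "Kildin", "Ter"]

# One character per column; "-" means no reflex is listed in the chart.
CHART = {
    "*imëš":      "---LNI---",
    "*kātëš":     "----NIS--",
    "*lāvtëš":    "------S--",
    "*murëš":     "---LNIS--",
    "*seajkëš":   "-------K-",
    "*sājkëš":    "------SK-",
    "*tārpëš":    "---LNI---",
    "*kāllëš":    "----NISKT",
    "*rumëš":     "---LNIS--",
    "*rukëš":     "---LNI---",
    "*vālmëš":    "---LNISKT",
    "*kāntëlës":  "-----I---",
    "*kirës":     "-----ISKT",
    "*kearmëš/s": "S-PLN---T",
    "*pErëš/s":   "---LN--KT",
    "*tErvëš/s":  "----NISKT",
    "*nāvrëš/s":  "-UPLNISK-",
    "*sālëš/s":   "---LNI---",
}

def tally(chart):
    """Count, per language column, how many word groups have a reflex."""
    totals = [0] * len(COLUMNS)
    for row in chart.values():
        for i, cell in enumerate(row):
            if cell != "-":
                totals[i] += 1
    return dict(zip(COLUMNS, totals))

totals = tally(CHART)
print(totals)
```

The counts this produces agree with the totals row: 1, 1, 2, 10, 13, 13, 10, 8, 6.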

OK then, caveats done with, what is actually going on in here?

Mikko Korhonen in Johdatus lapin kielen historiaan mentions only in passing (p. 200) that the *-ëš-group “appears in correspondence to the loan original’s h” (“š esiintyy itämerensuomalaisissa lainoissa originaalin h:ta vastaamassa”). The same is stated in stronger terms by Mikko Heikkilä in Bidrag till Fennoskandiens språkliga förhistoria i tid och rum (p. 107), where he claims that late Proto-Finnic *h would have been adopted in Samic as *h syllable-initially and *š syllable-finally. This seems phonetically implausible to me however, given that (1) Scandinavian /h/ is regularly borrowed into Samic as /h/ ~ ∅, never as **š, (2) Finnic coda *h from *k is never borrowed as Samic **š, and (3) there definitely is also a layer of loanwords where Finnic onset *š gives Samic *š.

Heikkilä seems to suggest that late substitution as *š could have involved loaning from an intermediate stage of the *š >> *h shift, that he gives as [ç]. A palatal fricative could indeed be plausibly borrowed as Proto-Samic *š, especially if this was a palatal sibilant [ɕ] (as suggested by its origin from *ś, and its later development to /j/ in Western Sami, when before a consonant). This intermediate reconstruction is however based on a common misunderstanding. Sound changes of the type *š > *h do not involve a trek through every single intervening POA you can find on an IPA chart! [5] These are rooted in the tendency of retroflex consonants in particular to acquire a velar coarticulation, which can then take over as the primary POA; and also for spirants such as [x] to lenite to [h]. Palatal [ç] would be bypassed entirely in this process.

So I see no other explanation than that the cases of *-ëš ~ *-eh must have been borrowed before the Finnic sound change *š > *h (before the loss of the sibilant feature, to be exact). And the distribution suggests that Proto-Samic would have been by this point already quite thoroughly broken up: after all, these words seem to have been borrowed independently mainly into the predecessors of L N I S. Perhaps Proto-North-Lule and Proto-Inari-Skolt at the deepest, in case such entities could be assumed (usually classifications of the Sami languages go with Pite-Lule and Skolt-Kola groupings instead, but I am not entirely sold on this).

In other words, I answer my headline question regardless in the negative: no, there did not exist any *š-stems yet in Proto-Samic, not even in any possible early subgroups like Proto-Western, Proto-Eastern or Proto-Non-Southern; they have only come about later through contacts with early Finnic.

I have not invented any real explanation for the dual treatment of *-š in Eastern Sami. For this, I can only offer a few hypotheses (that all point in different directions):

  • maybe early on there was a sound change *-š > *-s in Eastern Sami, and cases with retained *-š are newer loans, perhaps partly from Northern Sami (since they seem to be fairly rare in Kola Sami)? This probably could not be equated with the general Samic shift of original *š to *s, since there are enough good examples of the retention of prevocalic PS *š- in Eastern Sami, and none of unexpected *s- (that I know of). [6]
  • maybe Finnic *š was for a while again borrowed in Eastern Sami as *s, due to being increasingly non-palatal [ʂ], while cases with *-š are older loans from an [ʃ] stage?
  • maybe a lost Finnic variety has been involved where word-final *-š > *-s? A late analogical development of *-h-stems to -s-stems is known from Southern Ostrobothnia… which is however nowhere near the attested Eastern Sami languages.

Going by the vowel substitutions also diverging in *pearëš ~ *pierës, *tearvëš ~ *tiervës and *rumëš ~ *romës, the last two explanations sound somewhat better than the first.

This problem very likely needs to be further tied in with *-eš ~ *-is variation appearing even within Finnic, again largely with a West-East divide, such as Western Fi. tarve ~ Eastern Fi. tarvis; Fi. käärme ~ Karelian keärmis; Fi. säie ~ Karelian säijis; Fi. laine < *laineh ~ Olonets-Ludian-Veps lainis ‘wave’. But it is not clear to me if this is good enough to run with my third hypothesis, since there seems to be very little correlation in the occurrence of alternation in Eastern Sami versus in Finnic: there is no e.g. **seajkës in Samic, and more importantly, no **kantelis, **peris or **nakrëš, **ruumëš, **saalëš in Finnic.

[1] YSS has been now added to my fairly slowly growing Bibliography. If anyone’s curious, the five words with *š ← *s(i, ü) are: PS *šëlëtē ← PF *siledä ‘smooth’; Western *šëljō ~ Northern+Eastern *šiljō ← Fi. silja ‘courtyard’ (clearly rather one of the post-PS loanwords); PS *šëlmē ‘eye of an ax’ ← PF *silmä ‘eye’; PS *šëltē ← *silta < PF *cilta ‘bridge’; PS *šëntë- ← PF *süntü- ‘to become, be born’. I also suspect that PS *šōjē ‘rowan’ may derive from PF *sooja ‘protection’, as in Finnish (as also e.g. Germanic) mythology / folk belief the rowan tree has been considered to grant protection to the homestead. It’s not quite clear why would we have *š- and not *s- here, though. An independent loan from the same Indo-Iranian source (*sćāyā- ‘protection’) would also work.
[2] A slightly better explanation along almost the same lines might be “etymological alienization”, where the existence of Finnish hiili would have prompted a reshaping of e.g. expected Northern Sami ˣšilla into hilla, possibly fairly late then. This does not seem to be feasible in the case of Eastern Sami, though: in particular Inari and Skolt Sami have only come into intensive contact with Finnish fairly late, but the lack of ˣh- indicates relatively early loaning. (IIUC /h/ → ∅ has remained the default case in contacts between Karelian and Kola Sami, however.)
[3] Álgu gives for this a comparison with North ruomas ‘wolf’, which looks like a rather recent (taboo? epithet?) borrowing from Skolt.
[4] S U P L N I S K T for Southern, Ume, Pite, Lule, Northern, Inari, Skolt, Kildin and Ter Sami respectively. Yes, that’s “S” appearing twice, but you can figure this out.
[5] One impressive example of this approach is the development path “ʃ > ʂ > ç > x > χ > ħ > h” given in Kallio’s “Kantasuomen konsonanttihistoriaa”.
[6] Amusingly but probably unrelatedly: in “An essay on Saami ethnolinguistic prehistory” Ante Aikio mentions five examples of the “opposite” correspondence, with *s- in Western Sami ~ *š- in Eastern Sami.

Proto-Uralic *ë in Mari

Mari is one of the key languages for the reconstruction of Proto-Uralic *ë, in having a mostly unique reflex *ü > Hill Mari /ü/ ~ Meadow Mari /ü/. The only other known regular source of this vowel correspondence is would-be *ü̆ (from earlier *ü, *i, *e) in roots of the shape *CV, such as Hill Mari /šü/ ~ Meadow Mari /šüj/ ‘neck’, from PU *śepä.

The development *ë > *ü was first explicitly proposed by Wolfgang Steinitz in his Geschichte des finnisch-ugrischen Vokalismus (1944) (in his notation: *i̮ > *ü). This fact has been later on essentially forgotten, though. E.g. fifty years later (1994), Gábor Bereczki in Grundzüge der tscheremissischen Sprachgeschichte recognizes only two examples theoretically falling under this: /šüm/ ‘bark, crust’ (< *śëmə ‘scales’) and /nölə-pikš/ ‘blunt-tipped arrow’ (< *ńëlə ‘arrow’), which he furthermore explains, following Erkki Itkonen’s views from 1954, instead as “sporadic” fronting from *u and *o. [1]

The grounds would have been ripe for a reassessment of the historical vocalism of Mari already since the rehabilitation of *ë by Janhunen and Sammallahti in the 80s. It has been taking a bit longer, though. The next source after Steinitz that is on board with his theory seems to be a footnote by Ante Aikio in his 2006 article “Etymological nativization of loanwords”, [2] hence adding up to a blackout period of more than 60 years. I believe this has been an independent rediscovery rather than a revival as well. Aikio notes also that the conditions for the change are unclear, and it is indeed the case that PU *ë as reconstructible per the evidence of the other languages is often enough reflected also as Mari *å (Hill /a/ ~ Meadow /o/) or *o (Hi. Me. both /o/). So how should we deal with these cases? [3] By now we have at least one initial suggestion, by Mikhail Zhivlov from 2014, that *ü would be the default reflex, *å ~ *o the reflex before the velars *k and *ŋ (but not *x).

The split *å ~ *o remains unclear for now as well, but this is also the typical development of *a, hence we seem to be dealing with an early lowering *ë > *a, as also in many other Uralic branches. This is general in West Uralic; in Permic it seems like the most common development (followed then by *a > *o > *u), versus retention *ë > *ë mostly before sonorants; and Hungarian has “a-umlaut”: *ë-a > *a. I suspect on the other hand that *ë-ə > *aa in Khanty is the result of relatively late lowering from earlier *ëë, which could be connected to the same change in Northern and partly Eastern Mansi. (The development *a > *aa is attested too, but rare, and more common developments like *a > *a, *a > *oo, *a > *uu seem to require room for maneuvering in pre-Khanty.)

After having looked over the data [4] once more though, I have settled on a different view: the primary conditioning seems to be instead syllable closure. (This is one of what I think of as the “stock” conditioning features for divergent vowel developments, along with metaphony and labial/palatal coloring due to neighboring consonants. [5])

1. *ë > *ü in open syllables (before a single consonant):

  • *ëla > *üla > *ül ‘under’
  • *ëŋas(V) > *üŋəSə ‘rested’ [6]
  • *jëxə- > *jüä- ‘to drink’
  • *lëCə-ta- > *lüðä- ‘to fear’ [7]
  • *lëčə- > *lüčä- ‘to get wet’
  • *mëxə > *mü-ländə ‘land’, *mü-ðe- ‘to bury’
  • *ńëlə > *nülə ‘arrow’
  • *ńërə > *nürə ‘flexible’
  • *sënə > *sünə > *Sün ‘sinew’
  • *sëtə > *Süðər ‘spindle’
  • *śëmə > *śümə > *Süm ‘bark, crust’
  • *śëta > *Süðə ‘100’
  • *wajə ~ *wëjə > *ü(j) ‘butter’

2. *ë > *a > *å ~ o in closed syllables:

  • *ëkta- > *akta- > *opte- ‘to put, place’
  • *lëkśə- > *lakśə- > *lokSə-ńća- ‘to chop’
  • *lëntV > *lantV > *lånda-ka ‘valley’
  • *mëksa > *maksa > *mokS ‘liver’
  • *ńëčkə > *načkə > *nočkə ‘wet’
  • *ńëkćəmə > *nakćəmə > *nåSmə ‘palate’
  • *pëŋka > *paŋka > *poŋgə ‘mushroom’
  • *tëktə > *taktə > *toktə ‘loon’
  • *wëlkətə > *walkətə > *wålɣəðə ‘light’

Set #2 certainly shows a lot of velars following either immediately (*kt *kś *ks *kć *ŋk) or as the second member of the cluster (*čk *lk), but this probably doesn’t need any explanation other than the general abundance of *k in the PU consonant cluster inventory. There is also one case with *nt. Set #1 meanwhile has only one example with *-ŋ-, but similarly, in CVCV roots *-k- is by contrast rather rare.
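The basic split in sets #1 and #2 is simple enough to state as a mechanical rule. The following is only a rough sketch under my own simplifying assumptions: the function names are invented for illustration, a root counts as "open" if at most one consonant stands between *ë and the next vowel, and the secondary and tertiary cases (sets #3 to #6) are not modelled at all, since they depend on relative chronology.

```python
import unicodedata

# Vowel symbols used in the reconstructions; "V" stands for an unspecified vowel.
VOWELS = set(unicodedata.normalize("NFC", "aäeëəioöuüåV"))
E_BACK = unicodedata.normalize("NFC", "ë")

def first_syllable_is_open(pu_form):
    """True if at most one consonant separates *ë from the next vowel.

    Simplification: assumes that a vowel follows somewhere in the root,
    which holds for all the CVCV and CVCCV roots listed above.
    """
    form = unicodedata.normalize("NFC", pu_form)
    # Drop "*", hyphens, brackets etc.; keep only letters as segments.
    segments = [ch for ch in form if ch.isalpha()]
    start = segments.index(E_BACK)
    consonants = 0
    for seg in segments[start + 1:]:
        if seg in VOWELS:
            break
        consonants += 1
    return consonants <= 1

def predict_reflex(pu_form):
    """Hypothesized pre-Mari outcome of *ë: 'ü' if open, 'a' (later *å ~ *o) if closed."""
    return "ü" if first_syllable_is_open(pu_form) else "a"

# Roots from set #1 (taking *wëjə as the variant for 'butter') and set #2.
OPEN_SYLLABLE_ROOTS = ["*ëla", "*ëŋas(V)", "*jëxə-", "*lëCə-ta-", "*lëčə-",
                       "*mëxə", "*ńëlə", "*ńërə", "*sënə", "*sëtə",
                       "*śëmə", "*śëta", "*wëjə"]
CLOSED_SYLLABLE_ROOTS = ["*ëkta-", "*lëkśə-", "*lëntV", "*mëksa", "*ńëčkə",
                         "*ńëkćəmə", "*pëŋka", "*tëktə", "*wëlkətə"]

if __name__ == "__main__":
    for root in OPEN_SYLLABLE_ROOTS + CLOSED_SYLLABLE_ROOTS:
        print(root, "->", predict_reflex(root))
```

Note that the rule as sketched only looks at the Proto-Uralic shape, so it treats geminates such as *-pp- as closing the syllable.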

This pattern is complicated though by suffixation and consonant cluster simplification processes in Mari. In these cases we find both *üC and *aC, and I would hypothesize that this means that the split of *ë dates in-between some of these.

3. *ë > *ü also in secondarily open syllables:

  • *ëptə > ? *ëpə > *üp ‘hair’
  • *mërja > ? *mëra > *mür ‘strawberry’
  • *sëntə- > ? *sëtə- > *Süðä- ‘to clear woodland’

4. *ë > *a also in secondarily closed syllables:

  • *ďëmə → *ďëmə-pawə > ? *ďëmpa(w) > *lampa > *lombə ‘birch cherry’ (a compound with the word for ‘tree’ as the second member)

5. *ë > *a in “tertiarily open” syllables?:

  • *ëppə > ? *appə > *owə ‘father-in-law’
  • *këččə > ? *kaččə > *kåčə ‘bitter’
  • *lëmpə > ? *lampə > *lop ‘depression in ground’
  • *wëlka- > ? *walka- > *wåle- ‘to go down, descend’

6. *ë > *ü in “tertiarily closed” syllables?:

  • *ńërə(-ka) > ? *nürə-kA > *nürɣə ‘cartilage’

But it’s also possible that there are a few other, smaller conditioning factors here as well. It seems somewhat dubious to me in particular to end up dating *mp > *p (in *lop) as younger than *nt > *t (in *Süðä-). In principle most cases here could be also further confirmed or falsified by other results on the chronology of consonant cluster simplification in Mari.

This hypothesis also points towards a different line of explanation for some other instances of Mari *ü. There is at least one case where we find *ü in a closed syllable clearly retained from PU, in what seems like an original back-vocalic environment. This is *jükSə ‘swan’, for which I have earlier sided with reconstructing *ë … but since none of the other languages show evidence particularly in favor of *ë, maybe a development *o > *u > *ü or the like will be a better explanation for this one case after all.

[1] I don’t think I can dismiss strongly enough, in polite company at least, the notion of reconstructing “sporadic” sound changes. As some readers know, my (hopefully soon-to-be-wrapped-up) Master’s thesis treats the research history and reconstruction of the Proto-Finno-Ugric long vowels. One meta-result of this work has been that, by now, I see Itkonen’s insistence on sporadic sound changes as having prevented substantial progress in the reconstruction of comparative Uralic vocalism for just about half of the entire 20th century (to some extent even up to today). This device is not much more than a license to stop thinking — to avoid placing a given language group’s phonological structure in a general comparative context, and therefore, to be unable to discover more parsimonious explanations such as properly conditional splits. Closer to the topic though: I cannot blame Bereczki very much for not seeing /ö/ and /ü/ as etymologically equivalent, since the lowering of *ü to /ö/ (perhaps better: retention?), as later unraveled in detail by Aikio, has at least somewhat complex conditioning.
[2] In Diane Nelson & Ida Toivonen (eds.): Saami Linguistics, pp. 17–52.
[3] Steinitz did not have any trouble with these exceptions, since he postulated extensive original “ablaut” variation such as *a ~ *i̮ as a data-cleaning deus ex machina of his own.
[4] Three of the cases in section 1 are absent from all three recent overviews of the development of *ë either in general or in Mari in particular, i.e. Aikio 2014, 2016 and Zhivlov 2014 (see Bibliography). (1) The reconstruction *ëŋas(V) ‘rested’, reflected also in Samic *vōŋēs, is from Aikio’s PhD thesis (2009: 289). (2) *sëtə (> Mo Ma P) can be found already in UEW (as *setɜ, and with Erzya /sad/ ‘stem, trunk’ rejected, though it fits perfectly under *ë; this may be a better etymology for the Mari word than the comparison with Finnic *kecrä, Mordvinic #kšťəŕə ‘spindle’ ← pre-II *ketstra-). (3) The ‘butter’ word has been consistently reconstructed only as *wajə (*waje, *wōje etc.) so far. Aikio 2016 notes that Samic and Mordvinic point to *ë — but so do also Mari as well as Udmurt /vɤj/. Finnic *voi (regular from both) and Komi /vɨj/ (irregular from both, though possibly less so from *ë) don’t allow disambiguating; therefore it is only the Ugric reflexes that point to *wajə, and perhaps it is them that have innovated here, not the western languages. — An additional similar case is (4) *lëčə-, appearing already in UEW as *lače-, and covered by Zhivlov, but not Aikio.
[5] I do not rule out other consonant-environment-related changes, of course. For just one of my favorite examples of something less obvious, there is how the labialization of earlier /wa/ to /wɔ/ in Early Modern English (later > /wɒ/, /wɑ/, /wɔːɹ/ etc.) (dwarf, quarter, swan, swap, walk, war, was, what, etc.) is blocked before velars (quack, twang, wag, wank, wax, whack etc. instead have the usual development /a/ > /æ/). But I would be hesitant to apply this type of explanation too liberally. At its worst this can turn into over-fitted sound laws where each specific environment applies to no more than one or two words.
[6] I’m leaving aside here the only marginally dialectally retained contrast between PU *s, *ś and *š, which is irrelevant for the present issue.
[7] A trisyllabic reconstruction with a lost middle syllable (all of *lëjə-, *lëwə-, *lëxə- and even *lëkə- would work) seems to be required to account for the correspondence between Mari /-ð-/ (normally < *-t-, *-tt-, *-d₂-) and Samoyedic *-r- (normally < *-r-, *-d₁-). The lenition of *-t- to *-ð- > *-r- in the latter, regular after noninitial syllables, seems to have taken place also in “contracted” roots of this type. Compare *jëxə- → *jëxə-ta- > *ë-r- ‘to drink’.

Posted in Reconstruction

Studia Uralo-Altaica Online

This Tuesday night, while looking for something else entirely, I’ve accidentally stumbled on another linguistic publication series making the leap online (a few years ago already in fact): University of Szeged’s book series Studia Uralo-Altaica, including also its Supplementum sub-series. I already had a number of these scans through indirect channels, but there are also many items of interest I either did not have yet in digital shape (e.g. László Keresztes’ two-volume Geschichte des mordwinischen Konsonantismus), or which I’ve not become acquainted with before at all (e.g. most of the Turkological literature).

The UoSz archive unfortunately does not give a single listing on the series’ contents, and the collection volumes have been split into separate articles (including forewords & such) — so for general convenience, here are all the volumes together in a single list:

Or let’s make it two lists, and keep the Supplementa in their own one:

Sometimes I feel I’d like to see an anti-etymological dictionary.

Given two or more different etymological dictionaries, especially for an entire group of languages, typically one of them (usually from the older end) is going to end up being less critical, while another one (usually from the newer end) is going to end up being more critical. If we want to know what is known so far about a word’s etymology (cognates, reconstruction, etc.), we’d look in the more modern dictionary, of course. But if we want to know what is not known about a word’s etymology (i.e. what research questions are still open), neither of these sources is really going to work. What’s needed for this is, at a pinch, the difference between them.

Sometimes older separate etymological groups get combined into a single one, and sometimes older single etymological groups will turn out to comprise unrelated words and will be disassembled into various different ones (maybe under different native roots, but maybe also as loans or derivatives). This is all no major problem so far, especially if newer research will bother to mention that earlier, zoop was considered cognate with foop, but per current understanding it is actually cognate with doop.

But etymologies can also simply vanish from the literature record without comment, or with minimal comments along the lines of “strike this” (this latter type I’ve seen in errata or in “update notes” to new editions). This I find unsatisfying. Even when an explicit reason has been given (“the correspondence z ~ f is irregular”), if this merely renders the compared words without etymology, then we are again back to square one on what the words’ origin actually is. Or, for that matter, on why the earlier observed similarity exists at all.

It is possible for similarity to exist for reasons other than proper common inheritance or pure random chance: loans between related languages, loans in parallel from a third source, common inherited morphology applied to different roots, contamination between semantically nearby words, universal onomatopoetic patterns… Of these, I’ve only seen traditional etymological dictionaries apply the last with any consistency. The first is usually invoked only in cases of obvious, long since established layers of loanwords (in Uralic context e.g. Finnic → Samic, Komi → Ob-Ugric). The second through the fourth are rarely explored at all.

So I would hope for truly thorough etymological dictionaries to also include a discard pile of words and comparisons from earlier literature that remain without an adequate explanation, something which would definitely make future etymologists’ work slightly easier.
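The “difference between dictionaries” idea is, at its simplest, just a set difference. Below is a minimal sketch, assuming each dictionary has already been boiled down to a set of word-to-word comparisons; the function names and the toy data are invented for illustration (zoop, foop and doop are the dummy words from above, blip and blap are further placeholders of the same kind), and real dictionaries would of course need far more structure than this.

```python
def discard_pile(older, newer):
    """Comparisons given in the older dictionary but no longer in the newer one."""
    return older - newer

def orphaned_words(older, newer):
    """Words from dropped comparisons that the newer dictionary does not
    connect with anything else either, i.e. the open research questions."""
    dropped = discard_pile(older, newer)
    still_explained = set().union(*newer)
    dropped_words = {word for comparison in dropped for word in comparison}
    return dropped_words - still_explained

# Toy data: each comparison is an unordered pair of words.
OLDER = {frozenset({"zoop", "foop"}), frozenset({"blip", "blap"})}
NEWER = {frozenset({"zoop", "doop"})}

if __name__ == "__main__":
    print(sorted(sorted(pair) for pair in discard_pile(OLDER, NEWER)))
    print(sorted(orphaned_words(OLDER, NEWER)))
```

In this toy case zoop drops out of the result, since it has found a new home with doop, while foop, blip and blap are left without any etymology and would belong on the discard pile.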

I am currently doing some “antietymological” groundwork myself: charting how much content there is in Collinder’s Fenno-Ugric Vocabulary that is not reproduced also by later sources (mainly the UEW on one hand, Janhunen’s Samojedischer Wortschatz on the other). It is not a lot, and most of the omissions are clearly dregs, but some small part of the material remains interesting. It is even possible to find examples that have later reappeared again: one is the comparison of Mari *lüðä- ‘to fear’ with Samoyedic *lër(ə)- ‘to be afraid’, rediscovered by Ante Aikio in his paper on new Mari etymologies from a few years back.

A much bigger amount of work, however, would entail somehow bridging the still largely aligned FUV and UEW etymological corpora with the more heavily pruned ones in Janhunen 1981 and Sammallahti 1988. For most of the comparisons rejected by the latter two authors as insufficiently regular, this has been done quietly, without any arguments given at all. This may very well have allowed for advances in historical phonology, but at the cost of what seems like a hefty step back in how much we can claim to know about Uralic etymology.

Even further observations could be perhaps made by taking a look at even earlier etymological compendia: Budenz’ Magyar–ugor összehasonlitó szótár (1873–1881), Donner’s Vergleichendes Wörterbuch der finnisch-ugrischen Sprachen (1874–1888), as well as the extensive material quoted in the major historical phonology overviews that followed in their wake, such as Paasonen’s “Beiträge zur finnischugrisch-samojedischen Lautgeschichte” (1913). I again know of some recently rediscovered etymologies that have first been suggested already around this time or even earlier. Especially the first two include etymological comparisons still more boldly than FUV and UEW though (which were at least constrained by mainly compiling etymologies from already published literature), so the ratio of real forgotten goodies to junk would surely be still lower.

There’s also another sense in which “anti-etymologies” could be compiled from this period, however. This far back it is not difficult at all to find comparisons that have been rendered firmly obsolete by now, not just left in a limbo of “irregularity”. These might be illustrative in showing how etymological progress has been achieved over the last 100+ years. Have they been superseded by new native comparisons enabled by new data? by loanword etymologies? by new morphological analyses? something else? … and the results of such a survey could perhaps be then used as a roadmap for future research as well, to work out what’s likely and what’s not likely to continue to provide new results.

Phonology squib: ‘Clay’ in Proto-Uralic

I have a principle that applies quite often when working with quantity-over-quality mass comparative dictionaries (papers, databases, etc.): what is asserted without evidence can be dismissed without evidence.

The UEW is, unfortunately, a repeat offender on assertions without evidence. This comes up maybe the most with its own reconstructions, which do not seem to follow any definite scheme: there definitely isn’t one expounded on anywhere in the book, and to my knowledge none of the editors have published detailed papers on the topic, either. [1] This results in many junk reconstructions that seem to have only been hastily eyeballed together, sometimes with crass errors.

To avoid excess alarmism though: by “its own reconstructions”, I mean only a subset of the Proto-Uralic (Proto-Finno-Ugric, -Permic, etc.) reconstructions presented, those that seem to have been put together for the first time by the UEW team. Many of the reconstructions are however not all-new, and have been inherited from earlier research. Maybe the most direct source is Collinder’s Comparative Grammar [2], but various bits also trace back to earlier studies on historical phonology, such as Itkonen’s comparative vocalism surveys, or Paasonen and Setälä’s early 1900s Neogrammarian works that mainly involved consonantism, or even the 1800s comparative dictionaries of Budenz and Donner. Alas, none of this is explicitly referenced, and so the reader is left in the dark. Determining what, if anything at all, some particular reconstruction is based on would take a wild goose chase through the un-annotated list of literature found at the end of each entry.

(For non-specialists in Uralic reconstruction, as a quick rule of thumb I would say: any reconstruction with cognates in Finnish + at least two other Uralic subgroups can be treated as relatively safe; so can all remaining reconstructions that are continued in 6+ subgroups, which are usually given in bold; anything continued more narrowly is in principle suspect; anything prefixed with a question mark should be treated as unreliable entirely.)
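For what it’s worth, this rule of thumb can also be written out as a small decision procedure. The sketch below is only my shorthand for the paragraph above, not anything the UEW itself defines: the function name, its parameters and the subgroup labels are invented for illustration, and I am assuming that a question mark overrides everything else.

```python
def uew_reliability(subgroups, has_finnish, question_mark=False):
    """Rough reliability label for a UEW reconstruction.

    subgroups     -- names of the Uralic subgroups with reflexes (Finnic included)
    has_finnish   -- whether a Finnish cognate is among them
    question_mark -- whether the entry is prefixed with "?"
    """
    subgroups = set(subgroups)
    if question_mark:
        # Assumption: the question mark takes precedence over distribution.
        return "unreliable"
    if has_finnish and len(subgroups - {"Finnic"}) >= 2:
        return "relatively safe"
    if len(subgroups) >= 6:
        return "relatively safe"
    return "suspect"

if __name__ == "__main__":
    # A comparison found only in three subgroups, without Finnish:
    print(uew_reliability({"Samic", "Permic", "Samoyedic"}, has_finnish=False))
    # Finnish plus two other subgroups:
    print(uew_reliability({"Finnic", "Mordvinic", "Mari"}, has_finnish=True))
```

“Suspect” here means nothing more than that the reconstruction deserves a second look before being built upon.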

Even if many of the UEW’s reconstructions are junk, this does not however imply that the etymological comparisons they are attached to would also be. Sometimes it will be fairly easy to work out a better reconstruction. Today I have taken a look at a word for ‘clay’ that the UEW reconstructs as *śojwa, and noticed that this seems to not match any of the descendants given…

Not absolutely everything is wrong, of course. The consonant skeleton *ś-jw- works well enough: we have entirely regularly Samic /č-/ ~ Permic /ś-/ ~ Samoyedic /s-/, and S /-jv-/ ~ P /-j-/ ~ Smy ∅ is reasonable. But the vowel reconstruction *o-a seems to be not really defensible.

  • In Samic, we have reflexes only in Kola Sami: Kildin /čuwwj/ (though apparently чуййв in the written language), Ter /čujjvɛ/. These nominally suggest Proto-Samic *čujvē — but, from earlier *śojwa, we would instead expect to see PS *čoajwē > Kola **čuəjjve. Compare PU *ojwa ‘head’ > PS *oajvē > Kildin /vuəjjv/ вуэййв, Ter /vɨəjjvɛ/.
  • In Permic, we have *o > Komi /o/ ~ Udmurt /u/. This is not a regular reflex of *o: it instead usually continues PU *a or *e. There are various other claimed cases of *o > *o (at least *kojə-ma > *kom ‘male’ — the source of the ethnonym Komi — seems unassailable, even if still possibly irregular), but normally we would expect *o-a to give *u.
  • The Samoyedic examples are a bit hard to assess offhand: we have reflexes only from Selkup and Kamassian, and so Janhunen’s Samojedischer Wortschatz leaves this word unconsidered. /üü/ in the former can go back to various pseudo-diphthongs, including *åj (*såjtə- > /süütɨ-/ ‘to sew’), *oj (*tojmå > /tüüm(ɨ)/ ‘larch’), *uj (*jujtə- > /küütäptɨ-/ ‘to dream’), *əj (*pəj > /püü/ ‘stone’), even *äj (*päjwä > /püü/ ‘warm(th)’). Kamassian /e/ does not seem to match any of these on a quick checkup, but there are probably various conditional developments involved that blur the picture. PU *o-a regularly gives PSmy *å-(å), so maybe the first is what we should bank on… However, in an *A-stem, *jw would be expected to remain in PSmy, and then result in *ľć in Selkup. [3]

The Kola Sami ~ Permic vowel correspondence can be however quite well derived from *a-a; developing to *ō-ē in Proto-Samic. This normally later gives /uu/ in Kildin, /ɨɨ-ɛ/ in Ter, but presumably (see below) earlier *uu was shortened here to /u/ before it could unround in the latter. *a-a also gives Samoyedic *å(-V), i.e. works at least as well as reconstructing *o-a.

I would also favor reconstructing medially *-wj- instead of *-jw-. UEW, I imagine, bases the latter on Ter Sami; however this is actually non-diagnostic, since in the language, there is regular metathesis of PS *-vj- to *-jv-. The Kildin form should be therefore instead taken as evidence for *-wj-. (In literary Kildin Sami, it seems that Ter-esque -ййв- is preferred in place of *-vj-, e.g. тоаййв ‘often’, while T. I. Itkonen’s Koltan- ja kuolanlapin sanakirja gives /tɑwwj/. Does this maybe stand for dialect variation within the language?) This in mind, the ad hoc-sounding shortening (*a > *ō >) *uu > *u also makes decent phonetic sense: we’d be dealing with [uːw] > [uw], a contrast that seems difficult if not impossible to maintain.

I believe no exact precedents are known for the development of *-wj- in Permic, but in general *-w- is lost always, while *-j- remains at least in various clusters; so *-wj- > *-j- seems about as good as could be expected. As for Samoyedic, *-w- is lost syllable-finally: this means we’d expect *śawja > *såj(V), which is at least a decent contender for the Selkup-Kamassian preform. (Preferably not *såjå; contrast *kåjå > Kamassian /kuja/ ‘sun’. *-a > *ə is however quite common in Samoyedic, maybe in particular after (original?) consonant clusters.)

Altogether, I end up with the conclusion that all words given by UEW under *śojwa are better considered to continue Proto-Uralic *śawja.

These adjustments also open some new vistas. They allow the possibility to consider that my new and updated reconstruction might be a part of the same original root as its established synonym: *śawə (UEW: *śawe). This is continued directly only in Finnic (*savi > Fi. savi etc.), but also in various derivatives: *śawə-nV in Mordvinic (*śovəń > Erzya & Moksha сёвонь) [4], Mari (шун) and Komi (сюн); *śaw(ə)-d₂V in Mansi (*suwľ(V) > Northern сӯли) and Khanty (*sawəj > *sawïï) [5]. It seems therefore likely that also the *śawja group is similarly originally a derivative *śaw(ə)-ja. The exact morphology going on remains however mysterious. *-nV is only known as a vague diminutive suffix; *-ja usually forms action nouns; *-d₂V is, to my knowledge, not reconstructible for Proto-Uralic at all (there may be one other parallel within Ob-Ugric though: *ńooɣəď ‘meat’, maybe *ńaKV-d₂a).

It would be also possible to shuffle the *-ja and *-d₂V groups around a bit: *-j in Khanty and Samoyedic can continue either just as well. At least the Mansi form with *ľ and the Samic & Permic forms with *j however must be distinct from each other.

[1] Editor-in-chief Rédei has arguably taken some steps towards this in his 1968 article “A permi nyelvek első szótagi magánhangzóinak a történetéhez” (NyK 70: 35–45). His “pre-Permic” vowel system does end up being identical to the Proto-Uralic vowel system that is currently accepted the most widely, but this may be just a happy accident: he makes no effort at all on the issues of if and how the other Uralic languages could be derived from the same system; and his treatment of which particular original vowel should be assumed in which particular words is very patchy as well, covering only some incidental examples.
[2] His Fenno-Ugric Vocabulary gave only comparative data; their associated reconstructions were only given in an appendix to CompGramm., wherein he had presented his thinking on Uralic comparative phonology and morphology as well.
[3] This oddball soundlaw probably proceeds something like *jw > *jj > *jɟ > *ʎɟ > *ʎtɕ = *ľć.
[4] *o is, I believe, due to the following development: first *a-ə regularly > *å-ə > *o-a, followed by a conditional split: *o > *u before a velar sonorant (regularly established in the case of *-oŋ- and IMO also occurring in the case of *-olk-); lastly *u > *o.
[5] With Kazym /sŏwĭ/, Krasnoyarsk (Southern) /săwə/ regularly retaining PU *-w-.

