A research project wishlist

I’m only starting out on real scientific publishing (it looks like my first squib-size article, currently in peer review, will be out in early 2019), but during the years I’ve run this blog and worked on my thesis, I’ve already racked up a fair-sized publication plan and stack of article drafts. There will be roughly one for each of the various conference presentations I’ve given so far, maybe a dozen that would expand on various blog posts, and a handful of thesis work leftovers. Many others have not been announced in any fashion.

Looking at the far end of the list though, I think I’ve also been tacking on ideas that aren’t really research plans so much as things I wish someone would do. Many of them call for substantial background work, and in the foreseeable future of 5–10 years, they are unlikely to fit on my plate. The following are up for grabs, if anyone reading happens to be looking for research project ideas:

  • An updated handbook on the history of Finnish — the last updated version of Hakulinen’s Suomen kielen rakenne ja kehitys came out in 1979, and a lot has happened since then. In particular the overview of native vs. borrowed components in the Finnish lexicon seems long out of date.
    — I would likely start on some component of this myself if nothing has happened by let’s say 2030, but that’ll be a while still.
  • A study of the lexicon of Kukkuzi Ingrian/Votic. Researchers have waffled back and forth on whether this Finnic variety should be considered a variety of Votic with an Ingrian superstrate, or a variety of Ingrian with a Votic substrate, mostly on phonological and morphological criteria. With the 2012 release of the extensive Vadja keele sõnaraamat, it should be possible to investigate whether there is also an anomalous amount of vocabulary that’s either present in Kukkuzi and absent elsewhere in Votic, or absent from Kukkuzi but well-represented in the other Votic dialects.
  • Similar studies could probably also be done to check how coherent “Ingrian” really is (with or without Kukkuzi): the main varieties are clearly delineated, and each shows its own varying similarities with Votic, Ingrian Estonian, the two Ingrian Finnish varieties (Savakko and Äyrämöinen), Southeastern Finnish, and Karelian. There could be other Kukkuzi-esque misanalyzed varieties mixed in here as well.
  • A comparative reconstruction of Proto-Hungarian, based on not just the philological Old Hungarian evidence but also the evidence of the various Hungarian dialects. Handbooks sometimes state that all the modern dialects could be derived from “Middle Hungarian” circa 1500–1600, but this is obviously nonsense at least in the case of the Székelys. Many other dialects could also have diverged earlier, only to be later assimilated back towards the mainstream. Loanword evidence would also be important (for one thing, loanwords completely destroy the theory that Old Hungarian would not have had vowel length), and obviously Uralic ancestry would have to be kept an eye on too. Note that the term “Proto-Hungarian” is sometimes used instead for the prehistorical pre-migration ancestor of Hungarian, but I cannot recommend this practice: this time depth is firmly within the “single-branch” phase of Hungarian and cannot be probed by the comparative method.
  • A study of substrate in Ob-Ugric. Mansi and Khanty gain their similarity from at least three sources: the two are related (minimally within Uralic), they form a common language area (as shown by isoglosses that only cover parts of both languages) and share later contact influences (most importantly from Komi, Tatar and Russian), but on archeological and anthropological grounds, an additional fourth source could be a pre-Uralic substrate of western Siberia (Helimski’s “Yugra”). What comes up if we apply modern methods of substrate language research to the two?
  • A comparison of the Ugric and East Uralic hypotheses. There is by now a good amount of data that has been collected purportedly in support of a common Ugric (Hungarian–Mansi–Khanty) group within Uralic; but it has been pointed out that the original and clearest point of evidence, the rearrangement of the PU sibilant system (traditional formulation: *s *š *ś > *θ *θ *s, later *θ > Hung. ∅ ~ Mansi *t ~ Khanty *ɬ) applies also to Samoyedic, leading to a larger grouping recently named “East Uralic”. This is the case for at least a few other features too. Does all this end up showing that either or both of these groups should be considered areal?
    Some other possible sub-angles include: is some of the common Ugric vocabulary better considered loanwords e.g. from Hungarian into Ob-Ugric? Can previously unidentified OU-Samoyedic cognates be found? How many of the commonalities could be potentially interpreted as shared retentions rather than shared innovations? How does the alleged Ob-Ugric subgroup compare with either hypothesis?
    — I will be doing at some point at least the related comparison of the East Uralic hypothesis with its clear opponent, the long-standing Finno-Ugric hypothesis (which, as far as I can tell, has always remained merely a glorified assumption that has never been studied in detail, either pro or con).
  • A bibliography of Indo-Uralic studies, either a simple list of works, or a more detailed breakdown by etymology. It would be interesting to see e.g. how much of the compared material across the times is individually reconstructible within the two families … there is sometimes “cherrypicking” of words from just one subfamily, and in at least some cases they turn out to be clearly better analyzed as loanwords from IE into Uralic.
  • Studies on the history of extensively spread areal sound changes. Two that come to mind easily are w > v, found pretty much everywhere between the Atlantic Ocean and the Urals; and p > ɸ > h/f, found across Eurasia roughly in a belt from Hungary to the Aleuts, as well as across most of Northern Africa plus Arabia. It is not clear to me if the two last-mentioned are really two separate areas, or rather just one, or perhaps more than two.
  • A look at what level of language Zipf’s Law follows — orthography, phonology, phonetics? (This could have been done already, I have not searched for this in detail.)

New and updated links

Updates to blog sidebars are easy to overlook. So, this is to note some historical-linguistics-related journals or publication series available online that I have added links to recently:

  • A nyelvtörténeti kutatások újabb eredményei
    Article collection series from University of Szeged. The archive includes also several smaller release series, and the more specifically Hungarological series A mai magyar nyelv leírásának újabb módszerei.
  • Keleti szemle / Revue orientale (previously tangentially mentioned on Tumblr)
    An early 20th century Hungarian journal. From the viewpoint of this blog, one noteworthy contribution is Heikki Paasonen’s six-part article series “Beiträge zur finnischugrisch-samojedischen Lautgeschichte” in vols. 14–17 (1912–1917). This is maybe the culmination point of pre-World War Neogrammarian efforts in Proto-Uralic phonological reconstruction, charting out consonant correspondences from PU to Samoyedic in nearly the same shape as known today. (I am considering posting a more detailed commentary later on.)
  • Magyar nyelvőr
    Long-running Hungarian linguistics journal. Recent issues are available online too, but I’ve linked instead the older archives, which include also some general comparative studies. E.g. volumes 11–12 (1882–1883) have, in several installments, Munkácsi’s lengthy re/overview of the Finno-Ugric theory as seen through Budenz’ comparative dictionary. (The archives could really use an index, though.)
  • Studia orientalia (previously tangentially mentioned in my post on Meshcheran)
    An ongoing e-journal on one hand, a sizable monograph etc. archive on the other, covering several fields: literature, linguistics, sociology, ethnography etc. The most common topics are Indology and Altaistics, with occasional coverage also from Africa and elsewhere in Asia. A few of the article collections even have Uralistic and Indo-Europeanist tidbits, e.g. Bertil Tikkanen and Asko Parpola’s Festschriften.

I have also fixed the broken Fenno-Ugrica Suecana link, and added a permalink to my post indexing the Studia Uralo-Altaica series, which seems to be a popular visitor destination.

Notes on the phonology of Kamassian

For a language family mostly made up of minority languages, Uralic is really quite well documented by any standards. Most of the smaller languages have received decent descriptions already in the 19th century, and many also theoretically updated reflections later on in the 20th and 21st. The big exception had for long been the Samoyedic languages, with the literature being mainly dictionaries and comparative studies, and only Tundra Nenets being described in good detail by multiple scholars. By now however, even the Samoyedic situation has improved. Within the last few decades, Nganasan has received a reference grammar by both Wagner-Nagy (2002) and Katschmann (2008), Forest Nenets by Pusztay (1984), Forest Enets by both Künnap (1999a) and Siegl (2013), Kamassian by Künnap (1999b), Mator by Helimski (1997) (and that’s all before I have looked much into the literature in Russian). For Selkup I’m not sure of a source worth singling out, but there’s a fair amount of scattered literature. Tundra Enets is the only well-delineated variety I do not know to have been specifically covered by anyone, though it’s also very close to Forest Enets anyway (treating them as two different languages entirely has not struck me as warranted).

At this point it then seems that not only has descriptive work on Samoyedic caught up with comparative work, on a few fronts the former has passed the latter altogether. Janhunen’s classic Samojedischer Wortschatz from 1977 closely follows his 1976 reconstruction of the vowel system of Proto-Northern Samoyedic, [1] and in particular southern Samoyedic data seems to not be quite systematically integrated into the reconstruction. This task has been later accomplished only for Mator in Helimski’s monograph. For Selkup, there have been numerous specialized studies, [2] but AFAIK no substantial synthesis so far.

The situation is the most haphazard for Kamassian. There are overview notes in comparative Samoyedic works like Paasonen (1913–1917) and Mikola (2004), and also in Collinder’s even more generalist Comparative Grammar; as well as two monographs by Künnap on historical morphology. But that’s about it. To my knowledge, no-one has ever taken a good detailed look at the basic phonological and etymological issues of the language.

With today’s resources, this will not be an especially hard task. I already have my WIP database of proposed Proto-Samoyedic etymologies in a decent shape. Extracting a list of etymologies extending to Kamass and/or Koibal takes 30 seconds; gathering the corresponding reflexes from SW a bit longer, maybe two days’ worth of work. Double-checking other sources would take longer though. This is especially due to inconsistent treatment of the most thorough records available, those of Kai Donner in the 1910s. Janhunen gives what I believe is Donner’s full transcription, but others seem to habitually drop “unnecessary” diacritics.

However, if we don’t have a solid grasp of even the basic phonological inventory of Kamassian, can we really be sure which diacritics are “unnecessary”? Overviews like Künnap’s grammar or the 1998 handbook on Uralic languages seem to present basically a bare minimum system with just the “base letters” extracted. Which is a good place to start from, but already a look at the other Samoyedic languages (e.g. Selkup with its giant vowel system, or Nenets with its extensive palatalization) suggests that more complexity could have plausibly existed.

I offer here, for starters, two suggestions for amending the synchronic phonology. A full survey of the Donner materials would be required really, but already the inherited Samoyedic component seems to allow tentative conclusions.

The uvular stop /q/ should probably be recognized for Kamassian (as also for Selkup). Donner transcribes [q] versus [k] (actually [qʰ], [kʰ] in most cases, but I will ignore aspiration) [3] somewhat but not totally consistently depending on the following vowel:

  • before a, å: 2×k, 20×q
  • before e, ɛ: 5×k
  • before ə: 3×k, 2×q
  • before i: 1×q
  • before o: 2×k, 9×q
  • before u: 9×k, 6×q
  • before ʉ: 5×k

So there is potentially decent evidence for a contrast /k/ : /q/ at least before u. Importantly, this seems to be etymological. ku- is mostly from *ku-, *ko- (3× *kå-), while qu- is mostly from *kå-, *kə- (1× *ko-).
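
For anyone wanting to replay the tallies above, here is a minimal Python sketch. The counts are copied from the list; the "minority share" measure and the 25% cut-off are my own illustrative assumptions, not anything used by Donner or in the comparative literature. The sketch only flags the environments where neither variant clearly dominates.

```python
# Donner's k- vs. q- tallies by following vowel, copied from the list above.
# Tuples are (number of k-, number of q-).
counts = {
    "a, å": (2, 20),
    "e, ɛ": (5, 0),
    "ə": (3, 2),
    "i": (0, 1),
    "o": (2, 9),
    "u": (9, 6),
    "ʉ": (5, 0),
}

def minority_share(k, q):
    """Share of the rarer variant; 0.0 means the choice is fully predictable."""
    total = k + q
    return min(k, q) / total if total else 0.0

shares = {vowel: minority_share(k, q) for vowel, (k, q) in counts.items()}

# Hypothetical cut-off: environments where the rarer variant makes up at
# least a quarter of the tokens are treated as candidates for a contrast.
THRESHOLD = 0.25
candidates = [vowel for vowel in counts if shares[vowel] >= THRESHOLD]

for vowel, (k, q) in counts.items():
    print(f"before {vowel}: k={k}, q={q}, minority share={shares[vowel]:.0%}")
print("candidate environments:", candidates)
```

With these numbers the only environments that pass the cut-off are those before ə and before u, which are the two cases worth a closer look.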

Before ə the situation also looks promising at first, but may not stand up to scrutiny. The distribution is again etymological: the three cases of kə- are from *kë-, *kï-, *ku-, while the two cases of qə- are from *kə-. However, two of the three former cases are actually transcribed with ə̣ (= turned ė; there is no ‹ə› as a base symbol in UPA), which seems to be better considered an allophone of /e/ or perhaps /i/. This vowel mostly continues earlier PSmy front vowels *i, *ɛ, *ə̈; it takes front vowel harmony (ə̣d-ľɛm ‘I am visible’, mə̣ɛm ‘I sell’, sə̣βəʔ-ľɛm ‘I take out’), varies with e in Donner’s transcription (bej-ľɛm ~ bə̣j-ľɛm ‘I go over’), often corresponds to Castrén’s i (C mija ~ D mə̣j`ɛ ‘earth’) and can be sometimes seemingly triggered by a present or lost palatal consonant (ə̣ŕåŋ ‘drill’ < *pərəjəŋ, nə̣mi < *jumpə ‘moss’). In other cases there’s instead back ə̑, which mainly continues *ə, and might be even analyzable as an allophone of /a/ (with the result that there would be no ˣ/ə/ in Kamassian after all). If so though, then the apparent | contrast could be mostly rendered as a contrast between /Ke/ and /Ka/, instead of /k/ versus /q/. (There still remains a near-minimal pair, though: kə̑m ~ kɛm ‘blood’ < *këm | qə̑ʔ ‘pus’ < *kət.)

Word-medial cases of Proto-Samoyedic *k are not common, but where not palatalized, they also seem to show a somewhat Turkic-style split between velar g (nāgur ‘3’ < *nakur) and uvular ʁ (ťaʁa ~ ďåʁå ‘river’ < *jəkå), which perhaps remains analyzable as allophonic.

I also believe vowel length was phonemic in Kamassian. Donner and Castrén are not completely consistent with each other on this, but there are several indications that length is regardless neither random nor context-dependent.

As far as basic statistics go, the different Proto-Samoyedic vowels are split in two clear groups: the open vowels *ä *å and especially *a, and mid *o often yield long vowels, while the close vowels *i *ü *ï *u, reduced *ə and mid *ë only rarely do. Ignoring quality changes and counting half-long vowels (à ù etc.) as short for now, the quantity reflexes are as follows:

  • *a > short ×24, long ×26 (52%)
  • *ä > short ×47, long ×23 (33%)
  • *å > short ×47, long ×16 (25%)
  • *o > short ×19, long ×7 (27%)
  • *ë > short ×25, long ×1 (4%)
  • *u > short ×26, long ×5 (16%)
  • *ï > short ×10, long ×2 (17%)
  • *ü > short ×18, long ×0 (0%)
  • *i > short ×51, long ×2 (4%)
  • *ə > short ×68, long ×6 (8%)
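
The shares of long reflexes can be recomputed from the raw counts with a few lines of Python. This is only a sketch for checking the arithmetic: the counts are the ones listed above, while the grouping into "often long" and "rarely long" source vowels follows the verbal description in the text and is hard-coded here as an assumption.

```python
# (short, long) reflex counts per Proto-Samoyedic vowel, from the list above.
reflexes = {
    "*a": (24, 26),
    "*ä": (47, 23),
    "*å": (47, 16),
    "*o": (19, 7),
    "*ë": (25, 1),
    "*u": (26, 5),
    "*ï": (10, 2),
    "*ü": (18, 0),
    "*i": (51, 2),
    "*ə": (68, 6),
}

def long_percent(short, long):
    """Percentage of long reflexes, rounded to the nearest whole percent."""
    return round(100 * long / (short + long))

percent = {v: long_percent(s, l) for v, (s, l) in reflexes.items()}

# Assumed grouping, following the prose: open vowels plus *o vs. the rest.
often_long = ["*a", "*ä", "*å", "*o"]
rarely_long = [v for v in reflexes if v not in often_long]

def pooled_percent(vowels):
    """Long share over all tokens of the given vowels taken together."""
    short = sum(reflexes[v][0] for v in vowels)
    long = sum(reflexes[v][1] for v in vowels)
    return 100 * long / (short + long)

for vowel, p in percent.items():
    print(f"{vowel}: {p}% long")
print(f"often-long group pooled:  {pooled_percent(often_long):.1f}%")
print(f"rarely-long group pooled: {pooled_percent(rarely_long):.1f}%")
```

Pooling the two groups gives a quick impression of how far apart they are, though it says nothing yet about the conditioning factors.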

Of course open vowels normally tend to be longer than close ones, so this kind of a chart is to be expected. However, the qualitative changes in the vowel system mean that at minimum ā, å̄ < *a are effectively in contrast with a, å < *ə. At least ō and ē seem to be also phonemic. Minimal or near-minimal pairs can be found (å replaced by a for clarity):

  • tar̀ ‘hair’ < *tə̈r | tār ‘gills’ < *čar
  • qăn̆-ńȧm ‘I freeze’ < *kəntɜ- | māⁿ-ńim ‘I measure’ < *mančə-
  • qăzɪl̀ ‘wart’ < *kəsər | qāzə̑ra ‘nutcracker’ < *kasɜra
  • kora ‘reindeer bull’ < *korå | kōla ‘fish’ < *kålä
  • le ‘bone’ < *lë | ďē ‘heel’ < *jä, ‘woman’ < *nä
  • pel-ľɛm ‘I put’ < *pën- | sēlə-ľɛm ‘I sharpen’ < *sälä-

The analysis can be improved by noting also some conditional developments leading to the “wrong” vowel length. For one, before consonant clusters or stem-final obstruents, almost no long vowels occur. For two, Donner’s records have also words with stressed or long vowels in the 2nd syllable; these, too, never seem to have long vowels in the 1st syllable. A near-minimal pair of this kind is toli· ‘thief’ < (? *tōlī <) *tåläjə, vs. tōlu ‘darkness’ < *tålwə. For three, long vowels from *close vowels, *ə or *o often seem to be the result of vowel contractions, and after taking this into account, *o patterns together with the other mid vowels after all:

  • *i: pīdi ‘thumb’ < *pij-
  • *u: ďēdər-ľɛm ~ ťʉ̄dərə-ľɛm ‘I dream’ < *jujtə-, ńī ‘child’ < *ńuə(j), šʉ̄ ‘fire’ < *tuj
  • *ə: bʉ̄źɛ ‘husband’ < *wəjs (surely rather *wəjsä?), ťīma ‘tail’ < *təjwå; also, from Castrén: khâŋ ‘thunder’ < *kəjŋ, ‘moose’ < *kəå
  • *o: ~ ‘branch’ < *moə, qōriʔ ‘container’ < *koər, šō-ľȧm ‘I come’ < *toj-, šōmi ‘larch’ < *tojmå

An additional indirect line of evidence for phonemic long vowels is the transcription of consonant length. Donner transcribes medial and final single consonants as half-long in some cases ( etc.). At least in CVC words this seems to depend on vowel length: word-final consonants after short vowels are fairly consistently transcribed as half-long, word-final consonants after long vowels consistently as short (though only r occurs more than once: bōr ‘ridge’, ńēr ‘point’, tār ‘gills’, ťēr ‘center’).

I also notice a tendency that long vowels in words of the shape CVCV seem to occur most often before close or reduced vowels in the second syllable, not so often before mid and open vowels (a pattern closely resembling Selkup; cf. Helimski (2007), as cited in footnote 2). But this issue should wait for a detailed survey. Vowel lengthening has probably taken place more than once and probably with variable exact conditioning depending on vowel quality.

One further issue to investigate would be one I already touched on above: the front unrounded vowels. Donner’s transcription distinguishes no less than five heights i ė e ɛ ȧ and several reduced counterparts ɪ ə̣ ə ə for the first three, which obviously should be phonologized as something simpler (among back rounded vowels he only has u o å). But offhand I am not sure how the different heights should be delineated. Etymologically Donner’s e ɛ are mostly from PSmy *ä, while ə̣ are mostly from *i *ə̈ *ə. This may suggest that ė ə̣ should be counted as allophones of /i/ and not /e/. Some apparent front vowels could also be fronted allophones of i̮ e̮; or perhaps the situation is the opposite, and these back illabial vowels are, despite always continuing PSmy *ï *ë, actually synchronically just backed allophones of the front vowels (similar to the situation in Nenets). Just the native Samoyedic part of the vocabulary is probably insufficient to work out a solution for this though, since this seems to be mostly an issue of free variation and not conditional allophony. Probably the best line of evidence would be instead the degree of variation within Kamassian, e.g. as noted by Donner from his different informants, or between root words and their derivatives and compounds, or between Donner and Castrén’s records.

I would have observations on historical phonology as well, but those shall be left for another time.

[1] Janhunen, Juha. 1976. “Adalékok az északi-szamojéd hangtörténethez: Vokalizmus. Az első szótagi magánhangzók”, in Néprajz és Nyelvtudomány 19–20: 165–188.
[2] Some examples:

  • Helimski 1976, “О соответствиях уральских a– и e-основ в тазовском диалекте селькупского языка”, in Советское финноугроведение (SFU) 12: 113–132.
  • Katz 1979, “Beitrag zur Lösung des Problems der Entwicklung von ursam. *j im Selkupischen und der hiemit zusammenhängenden Fragen der historischen Morphologie dieser Sprache und des Uralischen”, in SFU 13: 168–176.
  • Mikola 1981, Adalékok a szelkup vokalizmus történetéhez. Nyelvészeti dolgozatok 193.
  • Terentjev 1982, “К вопросу о реконструкции прасамодийского языка”, in SFU 18: 189–193.
  • Helimski 2007, “Продление гласных перед шва в селькупском языке как фонетический закон”, in Linguistica Uralica 43/2: 124–133.
  • Gusev 2012, “О возможных источниках селькупского сочетания -lć-: ПС *jw, *jk, *jm”, in SUST 264 (Festschrift Janhunen): 77–81.

[3] In UPA: versus k.

A Fourth Laryngeal in PIE

The Proto-Indo-European laryngeals seem to form, in most people’s thinking, a kind of a phonological subsystem. Usually they end up as a class of back fricatives, or at least some kind of weaker back consonants. They certainly have similar diachronic behavior… but whether this also implies unique synchronic similarity is not immediately obvious. After all, there is a rather wide range of consonants that can be easily lost from a language (in the “merges with zero” sense). And inversely: even if many members of some natural class are lost, not every one of them has to be. E.g. transient voiced spirants in various Uralic languages: early pre-Permic *β *ð *ɣ are all lost by late Proto-Permic, while out of late Common Finnic *β *ð *ɣ in Eastern Finnish/Karelian, only the latter two are lost, and *β instead gives /v/.

Occasionally PIE internal reconstructors will go further still, and point out that the most widespread reconstruction with three laryngeals would be tempting to compare with the three series of velar consonants, suggesting rewriting *h₁ *h₂ *h₃ as *x́ *x *xʷ. The analogy is clearly imperfect though. E.g. the laryngeals do not show many signs of a centum / satem isogloss, not along the usual dividing line at least; [1] there are no parallels to the conditional neutralizations among the velar stops, such as *ḱr > *kr; the labiovelar stops *kʷ *gʷʰ *gʷ do not show any *o-coloring effects (for *k *gʰ *g some *a-coloring effects have been proposed though). A more common objection still however seems to be that there is a widely held alternate hypothesis: many mainstream IEists think that *h₃ is better mapped as a voiced fricative: [ɣ], [ʁ] or [ʕ], and *h₁ as a glottal consonant: [h] or [ʔ].

This semi-consensus view still assigns *h₂ as a voiceless back fricative: [x] or [χ], as the direct Anatolian evidence also strongly suggests. The occasionally suggested pharyngeal [ħ] can be IMO ruled out per arguments such as those in Michael Weiss’ recent paper. (I have already opted to use *x and not *h₂ in my index of the LIV roots, and will mostly do so in the rest of this post too.) However, this leaves an opening for an objection that does not seem to be commonly made, but to me feels quite relevant. If *h₁ and *h₃ are really something like *h and *ɣ, would *h₂ = *x then really be an isolated voiceless velar fricative, without palatovelar and labiovelar counterparts? [2]

A brief typological survey shows that such gaps among back fricative systems are indeed not common. In particular, any language that has both /kʷ/ and /x/ is rather likely to also have /xʷ/. [3] A look at the PHOIBLE data turns up the following results:

  • all of /k kʷ x xʷ/: 35 languages
    (Bilin, Buwal, Central Atlas Tamazight, Central Siberian Yupik, Chipaya, Chipewyan, Comox, Cupeno, Dghwede, Gavar, (Paraguayan) Guarani, Gwandara “4 and 6”, (Northern) Haida, Iraqw, Jicarilla Apache, Kumiai, Lagwan, Lamang, Luiseno, Mezquital Otomi, Nootka, Quileute, Seri, Serrano, Shuswap, Tachelhit, Tera, (Southern) Tiwa, Tlingit, Tolowa, Tonkawa, Wamey, Wichi Lhamtes Nocten, Yuqui)
  • only /k kʷ x/: 14 languages
    (Awing, Ese Ejja, Kwasio, Nizaa, Nuclear Daba, Purepecha, Saliba, Sui, Taushiro, Tilquiapan Zapotec, Uru, Ute-Southern Paiute, Yala, Yurok)
  • near misses: Haka-Chin with /k kʷ x w̥/, Izi-Ezaa-Ikwo-Mgbo with /k kʷ χ/, Wuzlam with /k kʷ χ hʷ/.

So a language that has /kʷ/ and /x/ is about 2.5 times more likely to have /xʷ/ than not; a very substantial result, when otherwise only some 3.2% of the languages in PHOIBLE’s worldwide sample have /xʷ/.
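
The comparison can be made explicit with a short Python sketch. The two language counts and the 3.2% baseline are taken from the figures above; expressing the result as a conditional share and as a "lift" over the baseline is my own framing for illustration, and of course ignores genealogical and areal clustering in the sample.

```python
# Inventory counts quoted above (PHOIBLE, as tallied in this post).
with_xw = 35     # languages with all of /k kʷ x xʷ/
without_xw = 14  # languages with /k kʷ x/ but no /xʷ/

# Baseline share of sampled languages with /xʷ/, as quoted above.
baseline_share = 0.032

# Odds of having /xʷ/ given that /kʷ/ and /x/ are both present.
odds = with_xw / without_xw

# The same information as a conditional share.
conditional_share = with_xw / (with_xw + without_xw)

# How many times more common /xʷ/ is in this subset than in the sample overall.
lift = conditional_share / baseline_share

print(f"odds of /xʷ/ given /kʷ/ and /x/: {odds:.1f} to 1")
print(f"conditional share: {conditional_share:.1%}")
print(f"lift over baseline: about {lift:.0f}x")
```

Put this way, roughly seven out of ten languages with /kʷ/ and /x/ also have /xʷ/, against a baseline of about one in thirty.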

There are moreover plenty of languages that have /k kʷ/ and some non-velar pair of ±labialized back fricatives. The most popular setup by far is /k kʷ h hʷ/ (Amharic, Arabela, Argobba, Cherepon, Fwe, Gikyode, Guinea Kpelle, Gwandara “2”, Hausa, Ikwere, Inor, Iyive, Kamayura, Kawaiisu, Kistane, Mbuko, Merey, Mesqan, Mofu-Gudur, Moloko, Nuclear Igbo, Piaroa, Sebat Bet Gurage, Siona, Suya “2”, Vame, Wandala, Wari, Wolaytta, Yeyi). Now, /k kʷ h/ is also very common; but given that x > h is a common sound change, it seems likely that many of this group of languages have come about from earlier *x *xʷ. In three cases /h hʷ/ also combines with an unpaired buccal back fricative: /k kʷ x h hʷ/ (Mfumte, Nyam), /k kʷ χ h hʷ/ (Tewa). [4] Other similar inventories are:

  • /k kʷ ç çʷ/ (Quechan)
  • /k kʷ χ χʷ/ (Bana, Kabyle, Xamtanga)
  • /k kʷ ħ ħʷ/ (Bade)
  • /k kʷ χ χʷ ħ ħʷ/ (Moroccan Arabic)

Lastly there is also the notable Pacific Northwest cluster of languages (Bella Coola, Coeur d’Alene, Lushootseed, Spokane, Squamish, Straits Salish, Upper Chehalis) with either just /kʷ xʷ/ (no plain velars; all have non-labialized uvulars though) or /k kʷ xʷ/ (with /k/ looking like a recent reintroduction by loans). This is tangential to the question, though.

Remarkably, this typological trend continues even within Indo-European! Nowadays Hittite is analyzed as having indeed phonemic /xʷ/ ḫu ~ uḫ beside plain /x/ (for a recent detailed review see Suter (2014) [5]). Per correspondences like Lycian /kʷ/ q, the same is also thought to have been the case already in Proto-Anatolian. This *xʷ corresponds to traditional PIE *h₂w, and is usually considered to come about by simple cluster coalescence. It would be however also quite feasible to set up *xʷ already for PIE itself, so that there wouldn’t be any asymmetry in stop versus fricative labialization. (This idea is supported already by Suter, whose article I only found after coming up with the idea myself.)

This will require a slight change in thinking: the concepts of “laryngeal” as “a consonant that is deleted” and “laryngeal” as “a back fricative” will need to be uncoupled. *xʷ will be a “laryngeal” in the second sense, but not in the first: it leaves at minimum a *w behind in core IE, after all. I think this sharpening of concepts would be beneficial, as Indo-European studies already suffers from treating the laryngeals as excessively phonetically vague.

I believe additional evidence for *xʷ can also be found in PIE root structure. Clusters of (plain) velar + *w are often set up for PIE, but they’re much rarer than the labiovelars proper. LIV has the following counts: *kʷ 15 + 18 (root-initial + root-final), *gʷʰ 8 + 14, *gʷ 17 + 16; versus *kw 7 (root-initial only), *gʰw 0, *gw 2. For *xw there are however 18 cases initially + 7 finally, which would make this both the most common *Cw cluster and by far the most common *HR cluster in PIE. [6]
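
The LIV counts just quoted can be totalled in a small Python sketch. Only the figures given in this paragraph are used; other *Cw clusters (with non-velar consonants) are not counted here because their numbers are not given above, so the comparison is limited to velar stops + *w versus *xw.

```python
# (root-initial, root-final) counts from LIV, as quoted above.
labiovelars = {"*kʷ": (15, 18), "*gʷʰ": (8, 14), "*gʷ": (17, 16)}

# Velar stop + *w clusters; only overall counts are given in the text.
stop_w_clusters = {"*kw": 7, "*gʰw": 0, "*gw": 2}

# Traditional *h₂w, written here as *xw: 18 initial + 7 final.
xw_initial, xw_final = 18, 7

labiovelar_totals = {seg: ini + fin for seg, (ini, fin) in labiovelars.items()}
stop_w_total = sum(stop_w_clusters.values())
xw_total = xw_initial + xw_final

print("labiovelar stops:", labiovelar_totals)
print("velar stop + *w clusters in total:", stop_w_total)
print("*xw in total:", xw_total)
```

On these figures *xw is closer in frequency to the labiovelar stops than to the genuine velar + *w clusters, which is the pattern one would expect of a unitary *xʷ.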

Even more interesting are the verb root *xwyedʰ- ‘to strike dead or injured’, and the noun *xwl̥h₁néx ‘wool’: these appear to have a very rare *CRR- onset structure, unparalleled elsewhere in PIE to my knowledge. Reconstructing a monophoneme *xʷ and not a cluster **xw would however reduce these to the usual *CR-. Labiovelar stop + resonant clusters are rare as well, but at least attested, e.g. *kʷles- ‘to furrow’, *gʷyeh₃- ‘to live’, *gʷʰreh₁- ‘to smell smth’.

I would even suggest that some further internal reconstruction can be applied here. The typical onset structure in PIE is *(F)(T)(R)- (with F = fricatives, T = stops, R = resonants). In traditional reconstructions this is however violated by a number of cases of *w + resonant (attested in LIV: *wl- 1–3, *wr- 10–12, *wy- 3). However, many of these could be probably replaced by *xʷR-. Even the development to attested /wr-/, /vr-/ in a few descendants such as Germanic and Indo-Aryan would not have to be common core IE: it could represent independent developments, versus direct loss (or maybe *xʷ > *x > *h > ∅) in branches like Italic. — For *wy- some cases seem to be attested almost solely in zero-grade. They could probably be also reconstructed with *i as an original non-zero-grade root vowel, and an analogical full grade in some sporadic Indo-Iranian reflexes, similar to the case of *bʰux- ‘to grow’.
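
The effect of the reanalysis on onset shapes can be illustrated with a toy Python check of the *(F)(T)(R)- template. Everything here is a simplification for the sake of example: the segment classes list only the consonants needed below, the onsets are segmented by hand, and treating *x and *xʷ as fricatives is the assumption argued for in this post, not an established analysis.

```python
import re

# Hypothetical, deliberately incomplete segment classes for illustration.
FRICATIVES = {"s", "x", "xʷ"}
STOPS = {"p", "t", "k", "kʷ", "b", "d", "g", "gʷ", "bʰ", "dʰ", "gʰ", "gʷʰ"}
RESONANTS = {"r", "l", "m", "n", "w", "y"}

def classify(segment):
    """Map a segment to F (fricative), T (stop) or R (resonant)."""
    if segment in FRICATIVES:
        return "F"
    if segment in STOPS:
        return "T"
    if segment in RESONANTS:
        return "R"
    raise ValueError(f"unclassified segment: {segment}")

def fits_template(onset):
    """True if the onset has the shape (F)(T)(R), each slot optional."""
    shape = "".join(classify(segment) for segment in onset)
    return re.fullmatch(r"F?T?R?", shape) is not None

# Hand-segmented onsets: cluster analyses vs. the monophonemic *xʷ analysis.
onsets = {
    "*xwy- (cluster)": ["x", "w", "y"],
    "*xʷy- (monophoneme)": ["xʷ", "y"],
    "*wr-": ["w", "r"],
    "*xʷr-": ["xʷ", "r"],
    "*kʷl-": ["kʷ", "l"],
    "*gʷy-": ["gʷ", "y"],
}

results = {label: fits_template(segments) for label, segments in onsets.items()}
for label, ok in results.items():
    print(f"{label}: {'fits' if ok else 'violates'} the template")
```

The cluster readings *xwy- and *wr- come out as violations, while their *xʷ counterparts fit the template just like the attested labiovelar stop + resonant onsets.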

The above is just structural reanalysis, so far. It is less clear to me whether setting up a PIE *xʷ will also have implications for the routing of the reflexes in the daughter languages; whether some cases will regardless have to be retained as a cluster *xw; or even whether this could also be set up in a few additional positions.

Suter proposes one readjustment of this type: reconstructing ‘to wash’ as *lexʷ-, and not anything like *leh₃w- or *lewh₃- (and with intervocalic *-xʷ- > -[ɣʷ]- in Hittite, same as with plain *-x- > -[ɣ]-). This promisingly enough seems to cut out some ad hoc “laryngeal metathesis” rules. However, it also suggests an odd property for *xʷ: a-coloring in Latin (lavō) but o-coloring in Greek (λοέω).

How does this fit together with the seven examples I mentioned that have already been earlier reconstructed as *Ceh₂w-?

  • *deh₂w- ‘to roast on a spit’: Sanskrit dunóti < *du-ne-H-, Greek δαίω, δέδηε < *daw-ye-, *de-dāw-, OHG †zuscen < *du/ū-sḱe-, Irish dóïd < *do/ōw-eye- etc.
  • *geh₂w- ‘to be glad’: Greek γαίω, γάνυμαι < *gaw-ye-, *ga-n-u-, Latin gaudeō < *gāwedʰ-, and perhaps also some reflexes that LIV splits as a separate root *geh₂dʰ-.
  • *ḱeh₂w- ‘to set on fire’: Greek *kaw-ye-, *kāw-s-, Lith. kūles ‘Brandpilze’ (?!), Albanian than ‘to dry’ < *ća-, Tocharian *kaum ‘sun’. Kind of a weak-looking semantic grab-bag root etymology.
  • *keh₂w- ‘to hit’: reconstructed in LIV with *-h₂w- per Tocharian *kɐw- : *kåw- < *kəw- : *kāw-, even though most reflexes (Latin, Germanic, Balto-Slavic, Greek) point instead to *kuH- : *kewH-. If ad hoc metatheses are going to be assumed, why not in Tocharian rather than in all the other languages?
  • *kleh₂w- ‘to cry’: Greek + Albanian *klaw-ye-.
  • *melh₂w- ‘to grind’ — probably not with *xʷ, but rather an extended stem *melx-w/u-, from the more common *melx- ‘to grind’.
  • *peh₂w- ‘to stop, finish’: only Greek παύω < *paw-.

It seems that the behavior here is rather different from the ‘wash’ case, with several examples confirming a-coloring in Greek. But they also all seem to involve more complex constructions; maybe the difference could be one between coda *xʷ (retained until a-coloring?) and medial *xʷ (leniting to *w earlier?). Many also seem to involve reflexes that point to *CuH-, instead of expected *ā(w) : *aw from *ex(w) : *əx(w). And does dunóti involve o < *aw < *exʷ, maybe coming about by some kind of a *dux- > *duxʷ- development?

Nowadays lengthened grades are usually thought to be secondary, so I even wonder if instances of ā that surface here are that, instead of from *aH < *ex. The (partial) late PIE ablaut scheme for roots in *xʷ would then be *āw : *aw : *u (lengthened grade : *e-grade : zero grade). Eichner’s Law (*ēx > *ē and not **ā) on the other hand still seems to require that a-coloring is usually younger than the rise of lengthened grade.

Latin lavō can be of course also explained through Thurneysen-Havet’s Law: *o > a / _wV́. And so, if this and λοέω are *o-grades after all, there will be no trouble in assuming that *xʷ is leftwards a-coloring, just like plain *x.

So far, in summary: introducing *xʷ gets rid of several typological-phonotactic anomalies in PIE. These include at least all *CRR- roots, a large group of *CeCR- roots, possibly numerous *RR- roots, the strange abundance of the cluster *h₂w, and the unusual /k kʷ x/ inventory.

The second of these issues is, however, not exhausted by this reanalysis. CeCR- roots regardless remain a suspicious feature of laryngeals in particular: there are no roots in anything like **-sw-, **-dy-, only things like *-h₁w-, *-xy-. One can wonder if *xʷ is maybe only the tip of an iceberg, and also a few additional “laryngeals” of this kind (back fricatives that do not get deleted entirely) should be assumed.

But there will be many other options available too, especially with laryngeals other than *x that cannot be easily grounded in direct Anatolian evidence. For very quick offhand speculation for the sake of example… since laryngeals’ presence is in some ways easier to determine than their exact position, and since in particular *-Hy- clusters are often assumed to be subject to metathesis, we could rewrite these as the more typical *-wH-, *-yH-, and simultaneously then rewrite the roots currently reconstructed as *CewH-, *CeyH- as being instead “close-vowel roots” *CuH-, *CiH- (with ablaut only secondarily by analogy).

[0] Thanks to various members of the Zompist Bulletin Board for a number of discussions on this topic.
[1] It is true that *h₂e > *a and *h₃e > *o merge often, and conceivably this could even have gone through an early merger of *h₂ and *h₃. But this happens also in the non-satemic Germanic, while failing in the satemic Armenian. The corresponding “centum” merger of *e and *a as distinct from *o also seems to be unattested entirely.
[2] The same could be asked of *h₃ as *ɣ too, but there happens to be a very easy answer here — just identify “missing” *ɣ́ and *ɣʷ with the semivowels *y and *w, or at least assume that the fricatives merged with the semivowels at some early stage.
[3] The situation for palatalized velars seems similar, but the controversy over whether *ḱ was [kʲ], or whether *ḱ *k were perhaps instead [k q], makes this question harder to survey.
[4] How these cases have come about seems harder to figure out from just general principles. Some hypotheses I can think of would be asymmetric debuccalization, i.e. *x ≡ but *xʷ > hʷ; and later secondary lenition, such as *q > χ or *ɸ > x, some time after the introduction of contrastive labialization. Loanword phonemes could be involved, too: for a not quite exact parallel, Udmurt has /k kʷ/ natively (the latter is usually, but IMO unconvincingly, analyzed as a cluster) versus /x/ only in recent loans from Russian.
[5] He also refers to the same typological sound inventory argument as I do, but working with an earlier stage of PHOIBLE, he only gets together 25 examples of symmetric /k kʷ x xʷ/ versus 11 of asymmetric /k kʷ x/.
[6] The other *H + glide clusters come in at *h₁w at 7 + 4, *h₁y at 1 + 1–5 (with lots of cases where it seems to be unclear if *y is a part of the root that gets deleted, or a widespread suffix), *xy at 0 + 1–6, *h₃w at 2 + 2, *h₃y at 1 + 0. All *H + liquid or *H + nasal clusters occur initially only, with *xm- the most common at 7 examples. Other *Cw clusters are likewise root-initial only: *sw- 21 (in this position more common than alleged *xw, but not altogether), *tw- 8, *dʰw- 7, *dw- 5, *ḱw- 5, *ǵʰw- 3–4, *ǵw- 1–2.

The fate of *w in Altaic

A fairly striking typological commonality between the “micro-Altaic” language groups: Turkic, Mongolic and Tungusic (Tk, Mg, Tg) is the lack of a labial glide such as /w/.

This is clearly out of line among both the world’s languages in general, and Eurasia in particular. /w/ is one of the most common phonemes in the world’s languages, which can usually be found even in languages with seriously impoverished consonant inventories such as Hawai’ian (at ZBB we [1] once compiled stats on this thing), and there is no shortage of *w in any of the other major language families hanging out nearby: IE, Uralic, Semitic, Dravidian, Sino-Tibetan, Austronesian, Eskaleut, you name it. Even in languages that lack /w/ precisely (Finnic, Slavic, most non-English Germanic…), it has usually not gotten too far off-field and has merely become a more frontal labial continuant such as /β/, /v/, /ʋ/. Yet none of these can be found in Turkic / Mongolic / Tungusic either. This clearly means that any long-range relationship hypotheses like Nostratic, Eurasiatic, Ural-Altaic will need to explain whatever happened to *w in Altaic.

There are two main hypotheses going around that I know of: *w > ∅ versus *w > *b. The former is the stance of some old-school Ural-Altaicists like Räsänen, among Nostraticists apparently Bomhard [2] and I gather also Illič-Svityč. The latter is the stance of, at minimum, Dolgopolsky. (He proposes also *w > ∅ before labial vowels in Turkic. [3])

I think the actual answer is neither of these, and the demise of *w is only post-common Altaic (if such a thing existed at all) — since comparison with Uralic seems to be able to show a fair number of good examples of both developments, yet strongly split according to their distribution. It does not really matter for this purpose if the comparanda are real cognates or loans … but see below for a hypothesis.

In the following, I have stuck to the clearest data, where comparison with Uralic seems, usually on semantic grounds, preferable to or at least equally good as the proposed Altaic connections. Checking up on the non-EDAL lexicon of the languages would probably also turn up something, but I will leave that for later.

1. Turkic: *w > *b

(1.1) *bāj ‘rich, noble’ ~ Samic-Finnic *wäjä- ‘to be able, have power’, Hung. vív ‘to fight’
(not worse than ~ Mg ‘strong’, Tg ‘many’, Jp ‘to surpass’)

(1.2) *bakɨr ‘copper’ ~ PU #wäśkä ‘(reddish) metal, ? copper’ > Khanty *wăɣ ‘iron’
(rather than ~ Mg ‘patina’, Jp ‘dust’)

(1.3) *balk- ‘to shine’ ~ PU *wëlkəta ‘light, white’
(rather than ~ Mg *mel-, Tg *mial- with no **-k-; Ko *mark- may or may not belong; maybe here also Tg *beli ‘pale’, rather than ~ Mg ‘dark’)

(1.4) *bań ‘fat’ ~ PU *wajə ‘id.’
(rather than ~ Mg ‘churn’, Tg ‘storage’)

(1.5) *bek ‘firm, stable’ ~ Samic-Finnic *waka ‘id.’
(rather than ~ Mg Tg ‘big’)

(1.6) *bejŋi ‘brain’ ~ PU *wajŋə ‘breath, spirit’ > Selkup *kȫŋə ‘brain’
(rather than ~ Mg ‘forehead’)

(1.7) *bij- ‘sharp edge’ ~ Samic-Finnic *wijə- ‘to be sharp’
(rather than ~ Mg ‘to crush’, Tg ‘to mince’)

(1.8) *(b)ōl-, Mg *bol- ‘to become’, Japonic *wər- ‘to be’ ~ Uralic *(w)alə- ‘to be’ > Ob-Ugric ‘to be, to become’

(1.9) *burun ‘nose’ ~ PU *wara ‘mountain’ > Hung. orr ‘nose, †peak’
(rather than ~ Jp Ko ‘beak’)

(1.10) *būt ‘leg’ ~ Samoyedic *utå ‘hand’
(Tg *begdi may or may not belong)

(1.11) *dabul ‘wind’ ~ PU #tɜwlə ‘id.’
(rather than ~ Mg ‘typhus’, Tg ‘to be infected’)

(1.12) *debe ‘camel’ ~ Samoyedic *tëə < ? *tëwə ‘(tame) reindeer’
(rather than ~ Mg *temeɣen; long compared also with isolated Karelian tevana ‘elk cow’ (often mis-cited as Finnish))

(1.13) *sib- ‘to spin thread, pull out fibre’, Tg *sib- ‘id.’ ~ PU *siwə ‘fibre’
(rather than ~ Mg ‘to tuck up’)

I suspect most of these to be loans into Turkic from early Ugric, and in the case of *bōl-, thence into Mongolic. At least #wūta is probably better taken as a loan in the opposite direction, since this is innovative vocabulary replacing PU *kätə (and Samoyedic does not tolerate **wu-). Perhaps likewise for #dewe.

For a few IE parallels, I can moreover mention e.g. Tk *basu ‘hammer’ ~ II *wadźra- ‘hammer, mace’; Tk *ebin ‘grain’ ~ IE *yewo- ‘id.’; Tk *gēb- ‘to chew’, Mg *gebi- ‘id.’, Tg *keb- ‘to bite’ ~ IE *ǵyew- ‘to chew’. Comparison with Japonic would also immediately provide examples for *w > *b. There has been some debate on if *b or *w should be reconstructed for Proto-Japonic, but as far as I gather, *b has been assumed for ease of Altaic comparison, while most of the actual data clearly sides with *w. [4]

*w > *b also has good areal parallels, being found in both the north(west)ern and south(west)ern neighbors of Turkic: on one hand widely in Samoyedic, viz. in Enets, Nganasan, Kamassian and Mator (partly even in Yurats and eastern dialects of Tundra Nenets), on the other, in East Sakan (Khotanese and Tumshuqese).

There is also one notable exception where Turkic seems to have *w > ∅ instead: *öl- ‘to die’ ~ PU *widə- > Hung. *ül- > öl- ‘to kill’. This isolated example could be, however, merely an accidental similarity, esp. since the semantics are off. (‘Die’ and ‘kill’ are close enough concepts, but usually do not interchange without causative / anticausative morphology.) Contrast also ‘nose’, where we seem to have *wu > *bu in Turkic but *wu > *u > o in Hungarian.

All in all, the details may use further fine-tuning, but I think there is good evidence to assume that earlier *w develops into *b in Turkic. Contrary to what I earlier commented on this topic though, it is also easy enough to find equally good-looking cases of Turkic *b ~ Uralic *p (e.g. *bas- ‘to press’ ~ *puńćə- ‘to press, squeeze’, *beliŋ ‘panic’ ~ *pelə- ‘to fear’, *bɨč- ‘to cut’ ~ *päčkä- ‘id.’, *bulun ‘cloud’ ~ *pilw/ŋə ‘id.’), so probably this was still a merger with a pre-existing *b.
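
The onset correspondences collected in this section can also be tallied mechanically. What follows is only an illustrative sketch: the pairs are copied from the comparisons above, while the helper name `onset` and the vowel set are assumptions made for the example, not part of any established tool.

```python
import unicodedata
from collections import Counter

# Vowel letters that can begin the forms below (an assumption made for
# this sketch; extend the set if other forms are added).
VOWELS = set(unicodedata.normalize("NFC", "aeiouäöüɨë"))

def onset(form):
    """Return the initial consonant of a reconstruction, or '∅' for a vowel onset."""
    bare = unicodedata.normalize("NFC", form).lstrip("*#")
    return "∅" if bare[0] in VOWELS else bare[0]

# (Turkic, Uralic) pairs taken from (1.1)-(1.7) and (1.9), the *öl- exception,
# and the four *b ~ *p comparisons mentioned in the paragraph above.
PAIRS = [
    ("*bāj", "*wäjä-"), ("*bakɨr", "#wäśkä"), ("*balk-", "*wëlkəta"),
    ("*bań", "*wajə"), ("*bek", "*waka"), ("*bejŋi", "*wajŋə"),
    ("*bij-", "*wijə-"), ("*burun", "*wara"),
    ("*öl-", "*widə-"),
    ("*bas-", "*puńćə-"), ("*beliŋ", "*pelə-"),
    ("*bɨč-", "*päčkä-"), ("*bulun", "*pilw/ŋə"),
]

tally = Counter((onset(tk), onset(ur)) for tk, ur in PAIRS)
for (tk, ur), n in tally.most_common():
    print(f"Turkic {tk} ~ Uralic {ur}: {n}")
```

On this small hand-picked sample the tally is 8 cases of b ~ w, 4 of b ~ p and 1 of ∅ ~ w, which is just a restatement of the lists above and carries no statistical weight.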

2. Mongolic: *w > ∅

Supported by less data, but even fairly tight reins on semantics still allow finding some evidence.

(2.1) *oŋgi ‘hole’, Tg *uŋgV ‘id.’ ~ PU *woŋkə ‘id.’
(rather than ~ Tk ‘to dig’)

(2.2) *ök/g- ‘to give’ ~ PU *wexə- ‘to take somewhere’ > Samoyedic *ü- ‘to drag’
(rather than ~ Tk, Tg ‘to heap up’; maybe here better Tg *bū- ‘to give’?)

(2.3) *udu- ‘to lead’ ~ PU *we/ätä- ‘to pull, lead’, PIE *wedʰ- ‘id.’
(rather than ~ Tk ‘to send’)

(2.4) *usu ‘water’ ~ PU *wetə ‘id.’
(rather than ~ Tk *sɨb)

(2.5) *üdže- ‘to see’, Tg *edže- ‘to understand’ ~ PU *weńćä- ‘to look, watch’ [5]
(rather than ~ Tk ‘to think, understand’; ‘understand’ is surely secondary in both etymological groups, and ‘think’ ~ ‘see’ does not match)

(2.6) *ündü-sü ‘root’ ~ PU *wanča ‘id.’
(possibly suggests that PU *č < *ts or *tU; Tg *ŋǖŋte may or may not belong)

A possible IE parallel that looks like it could have been transmitted thru Uralic: *ös ‘revenge, hate’ ~ II *dwiša- ‘hate’ (→ Permic, Finnic #wiša) (not worse than ~ Tk, Tg ‘bad, evil’). This is not attested in Ugric or Samoyedic, though, unlike all of the above examples.

The different treatment here is possibly however simply due to geography / relative chronology and not due to an actually different native development. Mongolic is a more eastern family, and may have gotten rid of *w already before contact with Uralic or some flavor of Para-Uralic — perhaps still indeed by > *b as per comparison with Turkic. So the correspondence here might indicate that in later loans, *w was substituted as zero.

I have not managed to find any reasonable-looking cases of Mongolic *b ~ Uralic *w (other than ‘to become’, see under Turkic).

The loanword layer interpretation can be also supported by how for Tungusic I cannot on a quick look-around find any clear etymologies of either type at all (i.e. where comparison with Uralic would be clearly preferable to supposed Altaic origin). You can find some Tg cognates above under both my Turkic and Mongolic comparisons, but they might be loans. I could still add a few word-internal cases suggesting *w > *b, though: *dolba ‘night’ ~ Samoyedic *tålwə ‘dark(ness)’ (no worse than ~ Mg ‘to stay up overnight’); *nebi ‘new’ ~ PIE *new-.

[1] “We” being at least 90% the OP “Nortaneous” (lingblr yeli-renrong); myself probably not more than 5%, and a handful of remaining people suggesting single datapoints.
[2] He does not explicitly say so, and in his book leaves the Altaic column empty in the overview of Nostratic sound correspondences; but the few examples he has of a root with *w- being reflected in Altaic show zero onset.
[3] Well, “with rounding of the adjacent vowel”, but I would not buy any current claims about Proto-Nostratic vowel reconstruction with a nine-feet pole.
[4] As for Korean, the modern language has /w/, but I have the impression it mostly occurs due to vowel breaking or in loanwords from Chinese. I admit knowing very little about Middle or Old Korean though, and hence I am skipping over Korean in this post entirely.
[5] IMO better thus than UEW’s *wića-. Permic *dź ~ Hung. gy clearly proves *ńć, and front-harmonic cognates in these clearly prove *-ä and not *-a. Hung. front-harmonic í is also almost always from *e, not *i. Finnic can be routed as “*weŋ́śä-” > *wejśä- > *viisä-, and for Permic I suspect early *e > *i next to palatals in certain cases.

Yurats Addenda

One step up from the likes of Meshcheran, probably the most obscure Uralic language to have still been rudimentarily documented is Yurats: a Northern Samoyedic language recorded in one wordlist by G. H. Müller in the mid-1700s. As far as I know, we have zero other information about the language, not even any clear idea on when it might have gone extinct. A century later Castrén did not record it, but to my knowledge he also did not really search for it either; * unlike Mator, which we can be pretty sure was indeed extinct by 1845.

Some parts of the data were reprinted by Pallas in the late 1700s and Klaproth in the early 1800s (a reproduction of the latter can be found in Donner’s Samojedische Wörterverzeichnisse, pp. 36–50). Janhunen’s Samojedischer Wortschatz (1977) only takes these secondary editions into account when listing Yurats cognates. Just the year before in 1976, though, Helimski had put out an article that actually reviews Müller’s original data instead (but presumably back in the 1970s article collections published in Tomsk were not yet in the habit of diffusing to Helsinki within one year). He also includes a transcript of the vocabulary. This article has by now been conveniently reprinted in Helimski’s 2000 compilation book Компаратистика, уралистика (Moscow: Языки русской культуры).

This is somewhat corollary-snipey, but I might as well still put this out there: a comparison of Janhunen’s Yurats coverage with the original data. Several additional etymologies can be easily noted, at least.

  • áddi ‘nelma’ < PSmy *ånčɜ (perhaps a loan from Enets due to lack of ŋ-?)
  • cháru ‘larch’ < PSmy *kårwï (not in SW)
  • ja ‘flour’ < PSmy *jaə (not in SW; loan from Indo-Iranian *yawa- ‘grain’)
  • jur ‘fat’ < PSmy *jür (loan from Turkic *ür₂)
  • kírwa ‘bread’ < PSmy *kïrɜwå (not in SW)
  • módi jarra ‘I cry’ < PSmy *jåru-
  • maraga ‘cloudberry’ < PSmy *məråŋkå (not in SW, but cf. PU *mura)
  • mug ‘arrow’ < PSmy *muŋkə (not in SW, but cf. PU or maybe better a west Siberian Wanderwort #muŋkɜ)
  • nócha ‘arctic fox’ < PSmy *nokå
  • ngóde ‘berry’ < PSmy *wota (with *wo- > *o-, as also in Ne En)
  • pi ‘aspen’ < PSmy *pi
  • pimà ‘boot’ < PSmy *pajmå (loan from Turkic *bal₂mak)
  • poiju ‘alder’ < PSmy *pəjɜ (not in SW; misglossed by either Müller, Helimski or some intermediate editor as ‘almus’ pro ‘alnus’, but it’s in the middle of the tree names section)
  • pämesúma ‘darkness’ < PSmy *pəjmä (not in SW, but cf. PU *pid₂mä)
  • túa ‘wing’ < PSmy *tuəj

There would be more cases that only go back to Proto-Northern Samoyedic or perhaps just Proto-Nenets (e.g. sárnu ‘egg’, wuing ‘sea’ ~ Tundra Nenets sar°ʔńu, wī̮ʔ < *sarəʔnü, *wïəŋ), but I cannot claim to have put together any reasonably good coverage of these.

A small etymological puzzle is múde. Janhunen lists this under two different roots: from Pallas under *mərkä ‘shoulder’ (with the comment “(? < En)”), and from Adelung under *utå ‘hand’. Müller only has the sense ‘arm’, which could be a semantic shift from either, but also suggests there is only one word here, not two homophones. Straightforwardly we’d probably expect ˣmarze, ˣŋuda, so maybe contamination is however possible. — A reflex of ‘hand’ with ŋ- indeed appears in ngudéesse ‘ring’ (‘hand-iron’), but (j)ésse ‘iron’, with *ẃ > j, is clearly a loan from Tundra Nenets, and so maybe the first part of the compound is as well. Actually, nothing rules out even a third interpretation: that in Yurats *ŋ > m / _u regularly?

Another intriguing case is ngä́mme ‘breast’. This seems related to PSmy *ńimmä, but not as a direct descendant: it points to something like *əjmmä instead. *ə- rather than *ńi- could be perhaps by analogy from *əm- ‘to eat’ … but it could be also an archaism, since ‘breast’ is derived from *ńim- ‘to suck’, which in turn has also a variant *imə- in Proto-Uralic (> Fi. imeä, Hu. emik etc.). I believe that if a derivative *imə-mä > *immä had been formed already in PU, then this would regularly develop into *əmmä in PSmy, reaching quite close to the Yurats form. But I still have no good explanation for palatalization to ä.

The comparison also reveals a few words in SW sourced from Pallas that do not seem to appear in Müller. These are mainly anatomical terms: лы ‘bone’, пулы ‘knee’, хоба ‘skin, bark’, хыва ‘blood’. Pallas’ materials have elsewhere also an issue with words from a single source being duplicated under multiple languages, so maybe here as well? On the other hand, at least the last still looks phonologically clearly like Yurats specifically: *k- > x- and *-m- > -w- rule out Enets (which has kiʔ : kio- for ‘blood’), while *ë > ɨ seems to rule out Tundra Nenets (which has xe̮m ‘blood’; xe̮wa- ‘to bleed’).

Altogether, give or take some unclear cases like this, the number of Yurats words with a Proto-Samoyedic etymology seems to be some 140±5. This already suffices to work out the main points of historical phonology. Even already among the above examples you can note a few repeating correspondences: *å > a, *ŋk > g and various trivial identities. The big picture seems to be of a language with a vowel system close to (Proto-)Nenets (*ə > a, some apocope, *a > ä and *ä > e kept apart, almost no vowel clusters), but with a few quirks in the consonant system that instead align with Enets (chiefly *mp *nt *ŋk > b d g). Basically everything seems to be also derivable from Proto-Nenets-Enets without reference to the other Samoyedic languages.

There are at least a few individual quirks however. One is the development *w > b, which in Yurats only seems to happen before front vowels: bedu ‘intestine’ < *wätə; behánna ‘sturgeon’ < *wäkånå; bi ‘water’ < *wet, bidímat ‘to drink water’ < *wetɜ-; ’10’ < *wü(ə)t. Before back vowels, w remains: uáddu ‘root’ < *wånčå; wark ‘bear’ < *wərkə; wéneku ‘dog’ < *wën. So at least allophonic palatalization of labials for some time existed in Yurats. Having /bʲ/ but no /b/ would be weird though, and I suppose the split may have been one where *ẃ simply drops its velar component to yield *β > /b/.
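
The conditioning can be stated as a small rule and checked against the examples just given. This is a minimal sketch, not a full sound-change engine: the function name `yurats_w_reflex` is hypothetical, and the division of the reconstructed vowels into a front and a back set is an assumption made for the example.

```python
import unicodedata

# Assumed vowel classes for this sketch only; the grouping of reconstructed
# Proto-Samoyedic vowels into "front" and "back" is a simplification.
FRONT = set(unicodedata.normalize("NFC", "äeiüö"))
BACK = set(unicodedata.normalize("NFC", "åəëaouï"))

def yurats_w_reflex(proto):
    """Predict the Yurats reflex of initial *w from the first vowel of the proto-form."""
    form = unicodedata.normalize("NFC", proto).lstrip("*")
    if not form.startswith("w"):
        raise ValueError("form does not begin with *w")
    # The first vowel after the onset decides the outcome.
    first_vowel = next(ch for ch in form if ch in FRONT | BACK)
    return "b" if first_vowel in FRONT else "w"

# Proto-forms and attested outcomes as cited in the paragraph above.
EXAMPLES = {
    "*wätə": "b",    # bedu 'intestine'
    "*wäkånå": "b",  # behánna 'sturgeon'
    "*wet": "b",     # bi 'water'
    "*wü(ə)t": "b",  # '10'
    "*wånčå": "w",   # uáddu 'root'
    "*wərkə": "w",   # wark 'bear'
    "*wën": "w",     # wéneku 'dog'
}

for proto, attested in EXAMPLES.items():
    print(proto, "->", yurats_w_reflex(proto), "(attested:", attested + ")")
```

All seven cited forms come out as attested under this rule, though with so few examples that mostly shows the rule has been transcribed correctly.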

Another distinctive conditional shift is that normally *a > ä, but *ja instead > ja, as in ja ‘flour’ < *jaə; jákki ‘smoke’ < *jačkə; jálle ‘light, sun’ < *jalä. Since the fronting of *a is a common Nenets-Enets (“Northwestern Samoyedic”?) feature, I would think this is probably a back-development similar to *Ca > *Ćā in Nenets. This is also suggested by two examples of *jü > ju (jur ‘fat’, jur ‘100’) versus retention otherwise ( ’10’, tükǘjalle ‘today’).

As you may have noticed, Müller also marks stress most of the time. This seems to be primarily on the penult (cháru, nócha etc.; behánna etc.) but there are also smaller groups of words with final stress, invariably marked with a grave accent and not an acute one (e.g. pürrè ‘pike’), or with initial stress on a trisyllable (wéneku). Tetrasyllables are rare, compounds aside, but seem to most commonly (6 out of 10 cases) have antepenult stress (e.g. tehánuda ‘wolf’). I have no idea if any of this has comparative significance.
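
For anyone who wants to tally the stress patterns over the whole wordlist, a rough sketch follows. It assumes that every vowel letter counts as its own syllable nucleus (so adjacent vowels are treated as two syllables, which may well be wrong for Müller's spelling), and the function name `stress_position` is made up for the example.

```python
import unicodedata

BASE_VOWELS = set("aeiou")
ACUTE, GRAVE = "\u0301", "\u0300"  # combining acute and grave accents

def stress_position(word):
    """Return where the accent mark falls, counted from the end of the word."""
    # Decompose so that accents become separate combining characters;
    # diaereses and rings decompose too, but are simply ignored.
    decomposed = unicodedata.normalize("NFD", word.lower())
    nuclei = []  # one entry per vowel letter: True if it carries an accent
    for ch in decomposed:
        if ch in BASE_VOWELS:
            nuclei.append(False)
        elif ch in (ACUTE, GRAVE) and nuclei:
            nuclei[-1] = True
    if True not in nuclei:
        return None  # stress not marked
    from_end = len(nuclei) - nuclei.index(True)
    return {1: "final", 2: "penult", 3: "antepenult"}.get(from_end, "earlier")

for w in ["cháru", "nócha", "behánna", "pürrè", "wéneku", "tehánuda"]:
    print(w, stress_position(w))
```

Under these assumptions the six words cited above come out as penult (cháru, nócha, behánna), final (pürrè) and antepenult (wéneku, tehánuda), in line with the description.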

* Addendum 2018-10-11: I have been informed that Castrén may have met the Yurats after all, as he mentions meeting, near the mouth of the Yenisei, a Nenets group whose speech had similarities to Enets. While he does not have any records marked as being specifically from this dialect, apparently his Nenets materials do contain a few dialectalisms that look Yurats-ish. My thanks to Olesya Khanina and Juha Janhunen for the correction.

Lastly, under the cut: the full wordlist itself (in Helimski’s transcription).


An Attestation of Meshcheran

Slowly poking around digitized back issues of Studia Orientalia, I recently ran into Kecskeméti (1968), an article indexing Pallas’ Zoographie (1811). This is a notable early source of animal names from several languages of Russia, collected since the late 1700s. Some of these languages would not be otherwise substantially attested until the 1900s, and for a few it is just about the last source available before extinction. (Pallas’ consistency in transcription and coverage are both poor, but we’ll take what we can get.)

During a closer look, for checking some Samoyedic data, I however had to do a double-take upon reaching the heading Mᴇsᴛsᴄʜᴇʀᴀᴇᴄɪs. This is obviously Meshcheran, one of the extinct more western Uralic languages. (Interestingly also with /-sč-/ and not the evidently Russified /-šč-/?) Except, all sources I have seen so far have claimed that Meshcheran went extinct already somewhere around the 1500s…

OK, Pallas only records four “Mestscheraic” words, and a distinct Meshcheran ethnicity is reported to have lingered long after Russification — in at least one case even into the current century! [1] So fairly likely we are dealing here not with a living language, only with substrate loan vocabulary, a natural enough fate for animal names. Yet this is still interesting due to being an attestation securely flagged as Meshcheran. There are two competing theories on the affinity of this language within Uralic — one sees it as a branch or sister of Mordvinic, the other, Permic. To my knowledge both of these build mainly on evidence such as toponymy found in the traditionally Meshcheran region, which is susceptible to errors from pre-Russification population movements.

The list comprises bird names entirely, all given with obsolete binomials:

  • Büdaenae ‘Tetrao coturnix’ (= hazelhen, Coturnix coturnix)
  • Kagau ‘Accipiter milvus’ (= red kite, Milvus milvus)
  • Kuki ‘Cuculus borealis’ (= a cuckoo sp., probably the common cuckoo Cuculus canorus)
  • Schibirtschik ‘Motacilla albeola’ (= common wagtail, Motacilla alba)

The third is obviously undiagnostic of anything, but the others may be worth something.

I cannot make much of the first on a quick lookaround: it would be a hard match for the common Mordvinic term for the hazelhen (Erzya /povo/, Moksha /pova/ < PU *püŋə) and even poorer for the common Permic term (Udmurt /śala/, Komi /śɤla/); at most it has some very vague similarity to Komi /bajdɤg/ ‘partridge, ptarmigan’. [2] The second is however a good match with Mordvinic /kaval/ ‘kite’ (? < *kaɣal), though the implied vocalization of final *-l meanwhile looks amusingly Permic. The last has vague and probably insufficient similarity with Moksha /šäjgiča/ ‘wagtail’ (← /šäj/ ‘valley’ + /kiča/ ‘gull’) on one hand, Russian шибать ‘to hit’ (< Proto-Slavic ‘to whip’) on the other.

I do not feel like rooting around for possibly related names of similar birds; but per ‘kite’ I would at this point lean cautiously towards a Mordvinic-ish affiliation for Meshcheran.

[0] In a blog post this short you’ll probably manage without me doing hyperlinks for the footnotes.
[1] Thus per V. Patrušev apud Rahkonen (2009) in a single village “in a Mordvin area”.
[2] This is probably a loan from early Hungarian or some common source though — cf. Hu. fajd ‘grouse’, from earlier #paďt- per Mansi *paľta ‘black grouse’. If this really were a common Uralic root, I would expect instead **poľt- in Permic (and the cluster *-ďt- would be also unprecedented). OTOH Komi seems to show *p-d > /b-d/, which may allow dating the word to the common Permic era regardless.

Stop voicing across Uralic: some musings

Finnish often gets used as an example of a language that does not contrast voiced and voiceless consonants. While this is not really correct for Standard Finnish (which at least prescribes all of the voiced stops /b d g/), it’s true for many dialects, especially in pre-modern times. [1] The same also holds for most reconstructions of Proto-Uralic and Proto-Finnic. A few times I’ve seen this even given as a typical feature of the Uralic languages. This much is not the case, though. The presence of voiced stops in the recorded Uralic languages varies, but generally tends towards inclusion.

  • No voiced stops:
    • Most spoken Finnish; Northern Karelian
    • Most of Ob-Ugric
    • Forest Nenets, Northern Selkup
  • Allophonic voiced stops:
    • Estonian (short stops optionally voiced medially)
    • Ingrian (voiced before sonorants)
    • older Mari (voiced after nasals)
    • some varieties of Ob-Ugric, at least Southern Khanty per some descriptions (voiced medially)
  • Phonemic voiced stops:
    • most Samic languages
    • most of Finnic: Standard Finnish, Livonian, Votic, Southern Karelian incl. Livvi, Ludian–Veps
    • all of Mordvinic
    • newer Mari
    • all of Permic
    • Hungarian
    • most of Samoyedic: Tundra Nenets, Enets, Nganasan, Southern Selkup, Kamassian, Mator

(The distribution of voiced sibilants such as /z/ is very similar, though they are additionally lacking in standard Finnish and in northern Samoyedic. They are however less important for the following points, so I will focus on the voiced stops.)

This might still be a higher proportion of languages without voiced stops than within most language families of the Old World, though. Within Indo-European I can only think of Tocharian; Icelandic and varieties of High German; Scottish Gaelic; and, per some views, much of Anatolian. Maybe one of the Eastern Iranian languages that are heavy on spirantization? Even outside IE, the only other national language examples I know of are Chinese (not even in its entirety; at least Wu and Min still preserve the Middle Chinese voiced stop series) and Mongolian. Continental Southeast Asia has plenty of languages that are short on voiced pulmonic stops proper, but these often “compensate” by having instead implosives or prenasalized voiced stops; e.g. Vietnamese with /ɓ ɗ/, Hmong with a full series from /ᵐb/ to /ᴺɢ/.

Reconstructions could be added to the picture as further data points of their own, e.g. Proto-Samic and Proto-Samoyedic are both reconstructed without any voiced stops. However, when we move from synchrony into history, it is probably more important to consider the origin of voiced stops. This shows variation as well, but some particular pathways crop up repeatedly:

  • *-P- > -B- (general voicing of original singleton stops):
    • Southern Sami
    • Finnic: Livonian, Ludian–Veps
    • Mordvinic
    • Tundra Nenets, Kamassian, Mator
  • *-P- > -P- ~ -B- (voicing of singleton stops through consonant gradation):
    • Kola Sami
    • probably Proto-Finnic
    • ? Southern Selkup
  • *-Đ- > *-B- (hardening of earlier voiced spirants):
    • Standard Finnish (*ð > /d/ only)
    • Votic (*ɣ > /g/ only)
    • newer Mari (†ð, †ɣ > /d/, /g/)
  • *-NP- > -B- (simplification of nasal+stop clusters):
    • most of Samic (Southern through Skolt); usually as geminate -BB-
    • Permic
    • Hungarian
    • Enets

You might notice that all of these apply word-medially only. I have also left some more complicated cases off the list for now.

One wildcard approach is Nganasan, where the two most widely established phonemic voiced stops /b/, /ď/ come typically from *w, *j and are unrelated to the original stop consonants. [g] only occurs natively as the equivalent of /k/ under consonant gradation; [d] is even more limited, found as the weak grade of /t/ in the cluster [nd], while between vowels the result is [ð]. (Due to loanwords both could be probably now considered phonemic in modern Nganasan, though strangely enough these kinds of inventories seem to then call the dental phoneme /ð/ per its intervocalic allophone and not, as could be expected, /d/.) Also some /b/, /ď/ come by gradation from PU *p, *ś. Their strong grades though are not the corresponding voiceless stops, but instead, a few sound changes later, /x/ and /s/. [2]

A similar setup *w *j > /b dž/ is found also in Kamassian and Mator, in these accompanied though by regular medial voicing. *j > /ď/ alone is more common yet: this is standard in Southern Karelian and Ludian, and found also in some varieties of Veps, Mari, Udmurt and Enets, at least. I even recall reading about a dialect of Hungarian that does this, but I don’t have any good overviews of Hungarian dialectology on hand to double-check with.

This is all also from the POV of synchronic voiced stops. Medial voicing, gradation-related or not, has likely happened at some point in by far most Uralic languages, but this often continued on with further lenition. E.g. in Permic, intervocalic *-p- *-t- *-k- are all continued as zero, most likely with intermediate > *[b] *[d] *[g] > *[β] *[ð] *[ɣ]. In at least one case, two separate rounds of medial voicing have been involved: thus in Southern Karelian, which has both consonant gradation and general medial voicing, so that original singleton stops yield the alternations b ~ v, d ~ ∅, g ~ ∅. This continues earlier stop/spirant gradation: *p ~ *v, *t ~ *ð, *k ~ *ɣ, [3] which in turn is probably from even earlier voiceless/voiced gradation: *p ~ *b, *t ~ *d, *k ~ *g.

Something similar may be actually the case in Permic. There’s reason to suspect that the full *-NP- to -B- shift was later than the lenition of medial single stops. Rather than filling in new voiced stops after the lenition of medial single stops to spirants, these clusters may have, in the first phase, filled in new voiceless stops already before the simplification of the original geminates. This is suggested by how a few late loanwords from Iranian still show *-NP- > -B- (/pad/ ‘crossroads’ ← Ir. *panta- ‘path’) but also seem to retain voiced stops as is (Udm. /vudor/ ~ Komi /vurd/ ‘otter’ ← Ir. *udra-); even Indo-Iranian voiceless stops can be continued as voiced (Udm. /kureg/ ~ Komi /kurɤg/ ‘hen’ ← Ir. or II *karka-; per *a > /u/ this must be an older loan than the previous two). So perhaps words of this group were all originally borrowed with simple voiceless stops (*pantɜ or *päntɜ > *patɜ, *vutrɜ, *karäkɜ > *kurekɜ), and they then went through a second round of medial lenition in late Proto-Permic, before the fall of final vowels (> *padɜ, *vudrɜ, *kuregɜ > *pad, *vudr, *kureg)? On the other hand, loaning from some Iranian variety with medial voicing is also conceivable, in the last case even an alternate analysis with *-eg as a suffix, and *rk > *r as in native vocabulary. (The epenthesis to *karäkɜ that would need to be assumed otherwise looks very sketchy, actually.)

I have even wondered if this could have been the same voicing process that affected Proto-Permic single voiceless stops after an unstressed syllable in mainline Komi, but not in Udmurt or Komi-Permyak (e.g. in the adjectival ending Udm. /-et/ ~ K /-ɤd/ ~ KPerm. /-ɤt/). But the fate of the original geminates suggests this is unlikely: since they yield modern Permic simple voiceless stops, same as everywhere from Veps on east, their shortening would have to be later than the voicing of any transient secondarily introduced medial voiceless stops. And it seems rather unparsimonious to assume geminates were still maintained as late as Proto-Komi.

Hungarian also has both *-P- > *-Đ- (*-p- *-t- > -v- -z-; *-k- > †-ɣ- > -v- ~ ∅) and *-NP- > -B-, but here we likely only need a single common round of medial voicing, followed by a chainshift of sorts of *-B- *-NB- to *-Đ- *-B-. Unlike Permic, new /-NP-/ or /-NB-/ clusters are established early-ish; though in loanwords from Iranian the only example seems to be kincs ‘treasure’ < pre-Hu *kenčɜ ← *gandz-. [4] Several others have correspondences elsewhere in Uralic, but I suspect these cases to be mostly loans / Wanderwörter rather than proper native inheritance. (They probably deserve to be more carefully looked at at some point, though.)

This big picture, I think, also raises some questions about the supposed retention of voiceless stops in a few languages.

I am not talking about any kind of a spin on the alternate reconstruction by Steinitz — who outright posited an original stop versus spirant contrast *-t- : *-ð-, instead of a gemination contrast *-tt- : *-t- (and, among the dentals, shunting *d₁ = traditional *ð then off as an absurd “retroflex spirant *δ̣”). This remains conclusively debunked by loanwords from Indo-European, whose voiceless stops turn up with traditional *-t- etc. (Indo-Iranian *ćata ‘100’ → *śëta > Hungarian száz, Erzya сядо /śado/, etc.), instead of Steinitz’ *-t- = traditional *-tt-. A weaker version of this could be perhaps still entertained: medial *-tt- : *-d- etc., but I don’t really see any particular benefit to it. In my opinion the situation found in Samic, Finnish–Karelian, Nganasan and perhaps Selkup can be still considered archaic, with all stop consonants voiceless by default, voiced (> lenited to non-stops in Finnish, Karelian and the immediately adjacent Sami varieties) at most under consonant gradation.

But the other four cases of Uralic languages without any voiced stops seem more dubious. To reiterate: (most of?) Mansi, (most of?) Khanty, Forest Nenets, Northern Selkup. These are all bundled together in western Siberia; the two latter have close relatives that do show medial voicing (i.e. Tundra Nenets and Southern Selkup); and even the former two are usually considered somewhat closely affiliated with Hungarian. Unlike Finnic and Samic, they also all show general shortening of geminates. In most Uralic languages this has been associated with earlier medial voicing, i.e. *-tt- : *-t- > *-tt- : *-d- > *-t- : *-d-, with the length contrast transphonologized as a voicing contrast, as is more common worldwide.
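
The ordering in that chain can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the ASCII notation (p t k, doubled for geminates), the five-vowel inventory, and the made-up input strings, none of which are real reconstructions. It only shows that voicing has to precede degemination if the old length contrast is to survive as a voicing contrast.

```python
import re

VOWELS = "aeiou"                      # toy inventory, assumption for the sketch
VOICE = str.maketrans("ptk", "bdg")   # p > b, t > d, k > g

def voice_singletons(word: str) -> str:
    """Voice a single stop between vowels; geminates are left untouched."""
    pattern = rf"(?<=[{VOWELS}])([ptk])(?=[{VOWELS}])"
    return re.sub(pattern, lambda m: m.group(1).translate(VOICE), word)

def degeminate(word: str) -> str:
    """Shorten geminate stops: tt > t, pp > p, kk > k."""
    return re.sub(r"([ptk])\1", r"\1", word)

def voicing_then_shortening(word: str) -> str:
    # *-tt- : *-t-  >  *-tt- : *-d-  >  *-t- : *-d-
    return degeminate(voice_singletons(word))

def shortening_then_voicing(word: str) -> str:
    # The opposite order merges the two series.
    return voice_singletons(degeminate(word))

for form in ("atta", "ata"):
    print(form, voicing_then_shortening(form), shortening_then_voicing(form))
```

With voicing first, the hypothetical inputs `atta` and `ata` stay distinct (`ata` vs. `ada`); with shortening first, both end up as `ada`.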

The languages have also gone through some non-general medial lenition: *-k- > *-ɣ- in Ob-Ugric (including even clusters such as *sk), and in Samoyedic *-k- is lost at least in *ə-stems (though not in all cases in *A-stems; established examples of retention include *pirkä < *pid₁kä ‘high’, *kåjkə < *kod₂ka ‘spirit’). In Far Eastern Khanty also *-p- > /-w-/. There is also some limited direct evidence of stop devoicing: like Nganasan, Kamassian and Mator, Selkup also fortites *w and *j — but all the way to voiceless *k, *ḱ.

So I suspect that voicelessness of all stop consonants, as could be proposed for Proto-Uralic, is not actually directly continued in these languages. This looks more like an areal feature, either an innovation wave that crossed a few language boundaries on its way, or substrate influence. Direct influence from Forest Nenets or some extinct related variety seems possible for Northern Selkup, while in the case of Ob-Ugric, this is maybe more likely to have been taken up from the original pre-Uralic substrate languages of the region.

This would also mean that degemination and medial voicing could be reconstructed as common Ugric features, if desired; with voicing developing further into spirantization in Hungarian, but eventually mostly reverted in Ob-Ugric. If so, this would further undermine the notion of Ob-Ugric as a genetic subgroup within Uralic. Previous surveys by Honti and Viitso have not found any common innovations in the languages’ consonant systems other than the nearly trivial degemination, and several trivial shared retentions such as the maintenance of *w- as still /w-/. The evidence of Hungarian-Mansi isoglosses (e.g. *wi > *wü- > *ü-) and even Hungarian-Khanty ones (e.g. *d₂ > *j, further shared by Samoyedic) should also be weighed here: perhaps it is rather some of these that are old common inheritance after all, as has been suggested by various people at various times.

[1] Note that /f/ versus /v/, a contrast fairly widely established in western dialects, does not count as a voicing distinction: the latter is the approximant/semivowel [ʋ]. This is even treated as further equal to /u/ in some generative models of Finnish phonology. I write this as /v/ in broad transcription both for simplicity & following the traditional Uralistic transcription (which itself follows Finnish standard orthography), much like I also generally use /a ä/ instead of the IPA-compliant [ɑ æ].
[2] This also has, I think, implications for the reconstruction of the history of consonant gradation, since *z > /ď/ does not seem plausible. Either we have to date the emergence of consonant gradation between voiceless and voiced grades already into pre-Proto-Samoyedic (= effectively Proto-Uralic), with further ramifications; or, if we want to consider this pattern an innovation specific for Nganasan that never occurred in its close relatives (note in particular that while medial stops are generally lenited in Enets and Tundra Nenets, the same does not apply to /s/), then it is instead the loss of palatalization in *ś that must be also dated as post-Proto-Samoyedic. We would not need to assume an outright palatalized stop or affricate, though: a conceivable route to the modern situation would be *[ś] > [ś] ~ [ź] > [s] ~ [j]  > [s] ~ [ď]. Note also that while palatalized *kʲ > *ć in early common Samoyedic merges with /s/ in northern Smy, in southern Smy these have distinct reflexes /š/ and /s/, suggesting *ś > *s rather soon after PSmy, at the latest.
[3] Traditionally the labial spirant stage is given as [β], but to my knowledge, there is no evidence whatsoever anywhere in Finnic for a distinction between this and regular /v/ < *w; only for retained /b/ in Livonian and Ludian–Veps. Setälä conceived of the latter as a re-fortition from [β], but to me a marginal archaism that never went through a spirant stage seems more likely. It’s conceivable that the shift from *w to labiodental /v/ was not yet completed by the time of [b] > [β], and so this may have been immediately a merger, with [β] > [v ~ ʋ] only following later. The fact that several Finnish dialects are reported to have [w] for /v/ next to rounded vowels (e.g. in SE Tavastia [wuos] for vuosi ‘year’, [sywä] for syvä ‘deep’) may even support reconstructing [w] still for Proto-Finnic in some positions at least.
[4] Judging by the voiceless k and cs, this looks like one of those early loans where Proto-Iranian *c *dz (later > /s z/) were substituted by *č in Uralic, instead of anything directly related to the unexpected appearance of /dž/ in Persian گنج ganj.

Three observations on Bactrian

As a part of my ongoing quest to get a better handle on the Indo-Iranian languages (mostly, yes, but not only due to their important early contact influence on the Uralic languages), some time ago I caught wind of Saloumeh Gholami’s PhD thesis Selected Features of Bactrian Grammar (2010) and have given it a thorough-ish read. Bactrian has been and probably continues to be one of the more poorly documented Iranian languages, and Gholami provides what seems like a good summary of the newer ongoing research.

Already at this point there are a few interesting observations to be made. And I hope you will not be too disappointed to find out that my thoughts so far mostly involve the historical phonology of Bactrian — the syntax and morphology no doubt have interesting phenomena going on too, but I probably won’t be able to say anything intelligible about those before knowing much better how they work also in the other Iranian languages from the same period and/or area (Sogdian, Xwareshmian, Middle Persian, Pashto etc.)

Gholami’s overview of the phonology of Bactrian is introductory in nature but still very historically grounded: she gives a pre-Bactrian etymology for almost every example word mentioned. These are not sourced, so it is hard to tell how far back they are supposed to go (all the way to Proto-Iranian?), but I get the impression that they’re based on earlier groundwork on Bactrian by Nicholas Sims-Williams, whom she mostly refers to for basics.

The thesis also does not contain any kind of a word index, so I’ve had to comb the initial chapters by hand for examples, getting a bit over 400 of them together. Further vocabulary would appear in the grammatical chapters with their extensive interlinear glosses, but generally without proto-forms. If we regardless suppose her given pre-Bactrian reconstructions to be reliable, they seem to allow for the following observations.

One: there seems to be a rule of non-open vowel shortening.

Middle Iranian *ē (in Bactrian from Proto-Iranian *ai, *aya, *iya, *ā-i) is in Bactrian spelled varyingly as ‹η› (likely /eː/) or ‹ι› (likely either /i/ or /iː/). Gholami suggests that *ē develops to ‹ι› before a nasal, on the basis of the following data: *waina- > ‹οιν-› ‘to see’, *kainā- > ‹κινο› ‘revenge’, *abi-dayanā > ‹αβδδινο› ‘custom’, *abi-dayana-ka > ‹αβδδιγγο› ‘way, manner’, *xrayanā > ‹αρχινο› ‘purchase’. Raising of long vowels before nasals is common across Iranian, sure enough. However, Bactrian shows no signs of the parallel developments *ōN > **ūN (*gauni-čiya- > ‹γωνζο› ‘basket’, *čiyāt-gauna > ‹σαγωνδο› ‘as, like’) or *āN > **ōN (*bāmušn > ‹βαμοϸνο› ‘queen’, *gawāna > ‹γαοανο› ‘fault’, *nāma > ‹ναμο› ‘name’, *fra-māna > ‹φρομανο› ‘command’, *fšupāna > ‹χοβανο› ‘shepherd’…)

An assumption of pre-nasal raising also does not exhaust the cases with *ē > ‹ι›: this also occurs in *ziyakā > ‹ζιγο› ‘damage’, *waignā > ‹οιγνο› ‘famine’ (unless phonetically with [-ŋn-]?), *-iyaθwa > ‹-ιλφο› ‘a suffix’ (thanks Gholami, very illustrative glossing).

I would instead suggest the following rules:

  1. *ē gives ‹ι› before an original unstressed *ā. This handles ‘damage’ and ‘famine’, but also ‘revenge’, ‘custom’ and ‘purchase’. This is likely primarily also shortening *ē > *e, with raising *e > /i/ only following secondarily.
    • This does not seem to apply to /ē/ from i-umlaut of *ā: *dāraya- > ‹ληρ-› ‘to have’, *wādaya- > ‹οηλ-› ‘to lead’, *wādžaya > ‹οηζο› ‘ability, power’, *wi-čāraya- > ‹οισηρ-› ‘to purchase’. These could suggest either that implicit intermediate unstressed *ē (*dārē- > *dērē-, *wādē- > *wēdē- etc.) did not trigger shortening; or, alternately, maybe i-umlaut of *ā initially led to a distinct low front vowel *ǣ, which was only raised to ‹η› after the shortening/raising of *ē from *ai, *aya, *iya. The latter might be preferable in light of one case with *au > *ō > ‹ο› (rather than ‹ω›) before *aya > *ē: *tauxmaya > ‹τοχαμηιο› ‘relationship’ (here *ē is not lost; thanks to further suffixation?). As a vowel, ‹ο› probably mostly stands for /u/, as is suggested by its use also for /w/ (‹οηζο› = /wēdz/, etc.) and the general typology of vowel systems across Iranian: Old and Middle Iranian languages mostly do not have short /o/. [1]
  2. *ē gives ‹ι› also before word-final consonant clusters. (NB: ubiquitous final ‹-ο› is thought to be only a Greek-derived orthographic device.) This handles ‘way’, as well as the ‹-ιλφο› suffix, and maybe also *fšuyantīčī > ‹φινζο› ‘lady’ (though here we instead have *-uya-, which I suppose could have contracted to *ī rather than *ē already to begin with).
    • This is again applicable also to the development of *ō: *aitat-gaunaka > ‹δαγογγο› ‘such, in this way’, *bawanta > ‹βονδο› ‘completely’.

These rules only seem to leave the verb root ‘to see’ unaccounted for. However, a more general version of rule 1 might cover some inflected forms (*wēn-ēd > ‹οινηδο› ‘see.2PL’), and actually also an allomorph with retained *ē exists (*wēn-an > ‹οηνανο› ‘see.subjunctive-1PS’). Gholami thinks these are chronologically separated versions before and after the sound change from ‹η› to ‹ι› (early /wēn-/ > late /win-/?), but if there is a chronological difference, maybe this rather involves levelling-away of the /wēn-/ allomorph.

Rule 1 then suggests that before the onset of root stress and the reduction of all suffix and prefix syllables, Bactrian went through a stage of mobile stress attracted rightwards by long vowels, as I believe occurs in several other Indo-Iranian languages (though don’t ask me about the exact details on this).

Two, a few notes on vowels in prefixes. These are mostly reduced heavily, and are spelled varyingly with ‹α› or ‹ο›, which Gholami interprets as [ə]. E.g. *fra-gāwa > ‹φρογαοο› /frəɣāw/ ‘profit’, *ni-kanta- > ‹νακανδο› /nəkand-/ ‘to dig’, *uz-bara- > ‹αζβαρο› /əzvar-/ ‘to bring forth’. There is also epenthetic /ə/ before some consonant clusters: *spāsV > ‹σπασο ~ ασπασο› /spās ~ əspās/ ‘service’. Despite some cases of variation like this, schwa seems to be still an underlying phoneme, however: consider *xšayanta- > ‹αχανδ-› /əxānd-/ ‘to control’, with first *xš- > *əxš-, followed by *š > ∅ (if not rather > *hx > /xː/, spelled simply as ‹χ›?); and *upa-stāna > ‹αβαστανο› /əvastān/ or /əvəstān/ ‘support’. There doesn’t seem to be much evidence against considering [ə] an unstressed allophone of /a/, though. (Gholami takes no stance on questions about the phoneme inventory of Bactrian and operates only with orthographic vs. surface phonetic levels of analysis.)

There are also some cases where *ni- is still spelled as ‹νι-›. Gholami suggests that these would be retentions. I think they might be however secondary umlaut developments: in the data given, they occur mostly preceding a palatal root vowel ‹ι› or ‹η›, as in *ni-štaya- > ‹νιττι-› /nihti-/ ‘to send (a message)’; or preceding a palatal sibilant (possibly itself originally conditioned by *i through RUKI), as in *ni-šadman > ‹νιϸαλμο› /nišalm/ ‘seat’. There are also examples of ‹ι› continuing earlier prefixal *a in a similar context: *waz-antiyaka > *wəzindēg (with umlaut in the root: *a-i > *i) > ‹οιζινδδιγο› /wizindiɣ/ ‘current’. Gholami attributes this last example to a supposed development of *a to ‹ι› before /s z/, which would also be seen in *dasta > ‹λιστο› /list/ ‘hand’. There are however plenty of counterexamples, say *aspa > ‹ασπο› ‘horse’, *ā-xasa- > ‹αχασ-› ‘to quarrel’, *basta- > ‹βαστο› ‘to bind’, *dasa > ‹λασο› ’10’; *azam > ‹αζο› ‘I’, *azdā > ‹αζδο› ‘knowledge’, *gazna > ‹γαζνο› ‘treasury’, *waza- > ‹οαζ-› ‘to use’. I don’t know what is up with ‘hand’; theoretically, some kind of suffixation to *dasta-ya- would work. [2]

Lastly, one case with the development of *fra- ‘pre-‘ suggests that vowel reduction actually has been fairly early, resulting in this prefix first in *fr̥-, which then in unstressed position mostly unpacks again to *frə-. Consider *fra-stāya- > ‹φοϸτιι-› ‘to send’: this exemplifies the sound change *rs > /š/ (compare e.g. *kr̥sta- > *kirsta- > ‹κιϸτο› /kišt/ ‘to detain’), and therefore requires *fr̥stēy- > *fštīy- > /fəštīy-/.

Three, the development of *š shows double treatment. Gholami notes that in some cases, *š is retained as ‹ϸ› /š/; in others, it develops to ‹υ› /h/, which can be further lost (or perhaps only unwritten in various consonant clusters, I wonder?). This does not appear to be a simple case of dialect mixture or whatever, since both outcomes can sometimes occur in the same word: *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’.

Examining the data, to me the distribution does not appear to be entirely unpredictable, though. *š > *h seems to be the main development for *š originating by RUKI:

  • *is, *us > *iš, *uš > *ih, *uh
    • *awa-gta > ‹ωγοτο› /ōɣu(h)t/ ‘to conceal’
    • *d-manyu > ‹λρουμινο› /lruhmin/ ‘enemy’ [3]
    • *fra-ta-ka > ‹φρητογο› /frē(h)təɣ/ ‘messenger’
    • *kasta- > ‹κισατο› /kisə(h)t/ ‘youngest’
    • *ni-gaa- > ‹ναγαυ-› /nəɣāh-/ ‘to hear’
    • *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’
    • *ni-štaya- > ‹νιττι-› /nihti-/ ‘to send (a message)’
    • *snā > ‹ασνωυο› /əsnōh/ ‘daughter-in-law’
    • *wi-šmāra- > ‹οαυμαρ› /wəhmār/ ‘to account’
    • *wrta-ka > ‹ροτιγο› /ru(h)tiɣ/ ‘rope’
  • *rs > *rš > *r(h)
    • *ā-pr̥št- > ‹βαρτ-› /var(h)t-/ ‘to be necessary’
    • *gr̥šta- > ‹γιρτο› /ɣir(h)t/ ‘to complain’ (past stem)
    • *hr̥šta- > ‹υιρτο› /hir(h)t/ ‘to leave’ (past stem)
    • *wi-xwata- > ‹οοχορτο› /wəxur(h)t/ ‘to quarrel’
  • *ḱs, *k⁽ʷ⁾s > *ćš, *kš > *š, *xš > *h, *x(h)
    • (PII *ćš >) *pašman > ‹παμανο› /pa(h)man/ ‘wool’
    • *āθriya > ‹χαρο› /x(h)ār/ ‘ruler’
    • *ayant- > ‹χανδ-› /x(h)ānd-/ ‘to control’
    • *apā- > ‹χαβρωσο› /x(h)avrō(t)s/ ‘night-and-day’
    • *nauθra > ‹(α)χνωρο› /(ə)xnōr/ ‘satisfaction’
    • *wašti > ‹χοατο› /xʷa(h)t/ ’60’ [4]
    • (PII *ćš >) *xšwašti > ‹χοατο› ’60’
    • *waa > ‹οαχο› /wax(h)/ ‘interest’

In one case I’m not sure if RUKI or *ćt > *št is involved: *paršti-čī- > ‹παρσο› /parts/ ‘backwards’.

Meanwhile, retention of *š seems to be entirely regular in the position *a_V, *ā_V. In these positions *š would be maybe the most likely to continue PII *sč < *sk(e), though *ćš is also an option, and some could be innovative Iranian vocabulary from somewhere else entirely:

  • *dāšinV > ‹λαϸνο› /lāšn/ ‘gift’
  • *fra-xāšaya- > ‹φριχηϸ-› /frixēš-/ ‘to seduce’
  • *paga-šaka- > ‹παχϸιιο› /paxšiy/ ‘in-law’
  • *uz-gaša- > ‹αζγαϸ-› /əzɣaš-/ ‘to dissent’
  • *xāša-ka > ‹χαϸιγο› /xāšiɣ/ ‘clothing’

A few clear cases of retained /š/ from RUKI also appear:

  • *kr̥šāka > ‹κιϸαγο› /kəšāɣ/ ‘plough-ox’ (<< PIE *kʷels- ‘to plough’)
  • *ni-šādman > ‹νιϸαλμο› /nišalm/ ‘seat’ (<< PIE *sed- ‘to sit’)

In most cases of retention I am not sure about the pre-Iranian origin of *š (but RUKI is conceivable in many of them):

  • *a-xwašn- > ‹αχοαϸνο› /axwašn/ ‘unpleasantness’ (any relation to Ir. *xwad- ‘to make pleasant’ < PIE *sweh₂d-?)
  • *bāmušn- > ‹βαμοϸνο› /vāmušn/ ‘queen’ (any relation to Persian بانو /bānu/ ‘lady’?)
  • *daxštana > ‹λαχϸατανιγο› /laxšətaniɣ/ ‘crematory’ (from pseudo-PIE *dʰegʷʰ-sth₂no-?)
  • *hāwišta-ka > ‹υαϸκο› /hāšk/ ‘pupil’
  • *pitr̥-šti- > ‹πιδοριϸτο› /piðurišt/ ‘ancestral estate’ (from pseudo-PIE *ph₂tēr-steh₂-?)
  • *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’ (maybe *š-s > *š-š, if from PIE *steh₂-?)
  • *škara- > ‹αϸκαρ-› /əškar-/ ‘to follow’
  • *wi-xwarša- > ‹οοχωϸ› /wəxōš/ ‘quarrel’
  • *xšāya- > ‹ϸιι-› ? /šīy-/ ‘to be able’ (< PII *kšaH-? but cf. /x(h)/ in the derivative ‘ruler’)
  • *xšidža-ka- > ‹ϸιζγο› /šidzɣ/ ‘good’

I could suggest at least that before a vowel, *rš >/š/ (‘plough-ox’, ‘quarrel’), while before a consonant, *rš > /r(h)/ (‘to be necessary’, ‘to complain’, ‘to leave’, ‘to quarrel’ and ‘backwards’).
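
As a rough check, the suggested split can be written as a tiny conditioned rule. This is only a sketch under my own assumptions: the vowel list, the treatment of syllabic *r̥ as equivalent to *r, and the function name are all made up for illustration, and the inputs are simply Gholami's pre-Bactrian forms as quoted above.

```python
import re
import unicodedata

def nfc(text: str) -> str:
    """Normalize so that š, ā etc. compare equal however they were typed."""
    return unicodedata.normalize("NFC", text)

VOWELS = nfc("aāiīuūeēoō")  # assumed vowel inventory for the sketch
# r, optionally syllabic (U+0325 combining ring below), then š, then one segment
RSH = re.compile(nfc("r\u0325?š(.)"))

def predicted_rsh_reflex(proto: str):
    """Return 'š' if *rš stands before a vowel, 'r(h)' if before a consonant,
    and None if the form has no *rš followed by another segment."""
    form = nfc(proto).replace("-", "").lstrip("*")
    match = RSH.search(form)
    if match is None:
        return None
    return "š" if match.group(1) in VOWELS else "r(h)"

for proto in ("*kr̥šāka", "*wi-xwarša-", "*gr̥šta-", "*hr̥šta-", "*paršti-čī-"):
    print(proto, "->", predicted_rsh_reflex(proto))
```

The prevocalic forms (‘plough-ox’, ‘quarrel’) come out with š and the preconsonantal ones with r(h), which is all the rule claims; it says nothing about *š in other environments.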

The cases with *št from PII *ćt seem to be rather evenly split. *š > *h appears in:

  • *aštā > ‹αταο› /a(h)tā/ ‘8’ (<< PIE *oḱtōw)
  • *ham-gašta- > ‹αγγιτι› /angi(h)ti/ ‘to receive’ (past stem) (< East Iranian *gādz- ‘to receive’, of unknown earlier origin per Cheung)
  • *ni-pixšta- > ‹νιβιχτο› /nəvix(h)t/ ‘to write’ (past stem) (<< PIE *peyḱ- ‘to paint, decorate’)

while retention appears in:

  • *pašti- > ‹παϸτο› /pašt/ ‘agreement’  (<< PIE *peh₂ḱ-; cf. pact)
  • *rašta- > ‹ραϸτο› /rašt/ ‘true, loyal’ (<< PIE *h₃reǵ-; cf. right)

Same goes for cases with *fš, though examples are rather rare:

  • *š > *h in *pati-fšarV > ‹πιδοφαρο› /piðəf(h)ar/ ‘honour’; *fšuyantīčī > ‹φινζο› /f(h)indz/ ‘lady’
  • retention in *kafši > ‹καφϸο› /kafš/ ‘shoe’
  • and even: *fš > /x/ in *fšupāna > ‹χοβανο› /xuvān/ or /xəvān/ (or /xʷvān/?) ‘shepherd’.

Is it perhaps relevant that ‘shepherd’ comes from PII *pću-, while the others are more likely to be from *ps with Iranian “second RUKI” to *fš? Maybe additionally *fš- > /f-/ root-initially versus retained medially.

It’s also worth pondering that *š > *h fits somewhat poorly into the phonological big picture of Bactrian. Usually *š >> /h/ correspondences go through *x (thus in e.g. Finnic or Spanish; Pashto remains at the /x/ stage), but Bactrian retains Proto-Iranian *x just fine. Two other possibilities come to mind, but they both would require Bactrian to have split off from the other Iranian languages relatively early:

  1. Perhaps *kʰ > /x/ and *kC > /xC/ are fairly late in at least some parts of Iranian: they are, after all, not reflected in some of the languages, such as Balochi and Wakhi. *š > *h could then have passed through a transient *x state already earlier.
  2. Perhaps the path was here rather *š > *s > *h, the second change being common (but not Proto-) Iranian. But this leaves many cases unexplained: original *-st- for example does not develop into **-ht-, but *-št- still does in many cases (‹νιττι-›, ‹φρητογο›, ‹ωγοτο›, etc.)

Likely having more clarity on this issue would require examining also the cognates elsewhere in Iranian, and not necessarily taking Gholami’s pre-Bactrian reconstructions as a given. But this remains difficult as long as there is no general Iranian Etymological Dictionary to consult.

[1] Gholami suggests /o/ for cases with *a-u > ‹ο›, such as *madu > ‹μολο› ‘wine’. Other eastern Iranian languages with this assimilation, though, end up with *u, e.g. Ossetian муд. I-umlaut of *a-i also gives ‹ι›, not ‹ε›, e.g. *kanyā > ‹κινο› ‘canal’.
[2] Sometimes it is proposed that ‘hand’ in Iranian would be native only in Persian, and borrowed from there to most of the other varieties, since this has PIE *ǵʰ- and is expected to give /d-/ only in Persian, but /z-/ elsewhere (and Avestan indeed has that). In this case the widespread Middle Iranian fronting of short *a to *æ, which appears to be absent from Bactrian, might result in *destV > *ðistV > /list/ in Bactrian. However I think that dissimilation before syllable-final *s is perhaps more likely: PIr *dzast- > *dast- (this proposal I’ve seen from Martin Kümmel). — There is however the fact that ‘hand’ contains original PIE *s, while my counterexamples like ‘horse’, ’10’ and ‘I’ mostly have secondary *s *z from PIr. *c *dz < PII *ć *dź⁽ʰ⁾ < PIE *ḱ *ǵ⁽ʰ⁾. This could be perhaps leveraged, if wanted, but I don’t see what phonetic sense this would make, and so I don’t feel like doing a full check-up on the matter.
[3] The (rather funky!) consonant cluster /lr-/ presumably by folk etymology from *drauga > ‹λρωγο› /lrōɣ/ ‘false(hood), wrong’.
[4] In principle pre-epenthesis *swašti > *šwašti could also work, with *š > *h then feeding into common Iranian *hw > *xw?

Were there Proto-Samic *š-stems? Some issues of Samic-Finnic chronology

Despite ongoing disputes about the subgrouping of the Uralic family, it is clearly the case that the Finnic and Samic languages have been at least neighbors for several millennia now, exchanging linguistic features and material back and forth. With care, this allows teasing out substantial facts about the relative chronology of the history of the two families. (Germanic can be also added to the bundle, though the evidence from here is much more unidirectional.)

The sibilant system shows several good examples. While Finnic /s/ and Samic /s/ correspond to each other consistently all the way from Proto-Uralic to the present day, the “shibilants” have a more complex history. In old inherited vocabulary the main correspondences for these are Finnic *s ~ Samic *č (from original *ś ~ *ć, be it Proto-Finno-Samic or all the way from Proto-Uralic) and Finnic *h ~ Samic *s (from original *š). The latter correspondence can also appear in old (perhaps mostly either parallel or Finnic-mediated) loans from Germanic, whose *s was substituted as *š at least on the Finnic side; no way to tell if also in Samic.

(This probably indicates that pre-Finnic *s was, following the merger of *s and *ś, realized as laminal [s̻], while *š was (sub?)apical [ʂ]. Germanic *s was likely apical [s̺], and therefore matched better with pre-Finnic *š. I am not sure how far back the modern Finnish realization of /s/ as apical [s̺] dates, but at least the Northern Karelian shift of *s to an apical postalveolar [s̱] š most likely starts from this same value.)

The correspondence Finnic *s ~ Samic *š appears in a small number of native-looking cases, where they seem to represent original preconsonantal *ś (PF *laskë- ~ PS *lōštē- < *laśkə- ~ *laśk-ta- ‘to let out, pour, etc.’; PF *kisko- ~ PS *këškē- < *kiśka(w)- ‘to tear, pull’; PF *vaski ~ PS *veaškē < *wäśkä ~ *waśka ‘copper’). It is however more common in loanwords between the two. E.g. Finnic *s before *i and *ü seems to be fairly regularly substituted as *š in Samic; the YSS data has 5 examples of this out of its 11 examples altogether of PS *š-. [1] All late loanwords from Samic into Finnish also show *š → /s/, for the obvious reason that Finnish has had no other sibilants for most of its independent existence. (Even the modern loanword phoneme š or sh is still limited to educated speakers. Probably a rather large proportion of Finns counts as “educated” by typical contact linguistic standards by now, though…)

Lastly, also the fourth theoretically possible correspondence between plain sibilants is attested: Finnic *h < *š ~ Samic *š. (I will not be treating the various affricates in this post.) This might be the group that has the most value for establishing chronology, since it is bounded both from above and below: prevocalic *š only occurs in loanwords in Proto-Samic, but any such loanwords from Finnic must then pre-date the pan-Finnic change *š > *h.

Some of the data in this group suggests that it stretches beyond the breakup of Proto-Samic. One example is the word for ‘coal, ember’; in Finnic *šiili > *hiili (Fi. hiili etc.), which then appears as pseudo-PS *šilë in Southern, Ume and Pite Sami (SS sjïjle etc.); as pseudo-PS *hilë in Lule and Northern Sami (NS hilla); and pseudo-PS *ilë in Eastern Sami (Inari illâ etc.). I’ve sometimes seen also the explanation that these kinds of cases would not be parallel loanwords, but rather several layers of re-loaning, with each new loanword then flushing out the previous one. This however seems unlikely to me, especially when dealing with a non-cultural term like ‘coal’ that has no reason to be repeatedly loaned from Finnic, and when the distribution of the different variants is perfectly complementary. [2]

Meanwhile, *š > *h is usually taken to be late Proto-Finnic, i.e. at least Proto-Core Finnic (probably later than at least the splitting of South Estonian though). Does this mean that Proto-Samic is therefore younger than even Core Finnic? And how does this measure up with how e.g. Jaakko Häkkinen (Jatkuvuusperustelut ja saamelaisen kielen leviäminen, osa 2: see table on p. 19) comes out with the opposite result: Proto-Samic would have broken up earlier than Proto-Finnic?

One option would be to sigh and concede that apparently words like ‘coal’ are multiple layers after all. But I would hold out for a different explanation: we can probably shift the dating of *š > *h ahead quite a bit beyond its various termini post quem. E.g. the introduction of *h → *h in Germanic loanwords into Finnic does not have to be enabled by the development of a native /h/ in Finnic; it can represent also the taking-up of a new loanword phoneme, which besides probably already existed as an allophone in the clusters *kt [ht] and/or *sl *sr *sn [hl hr hn]. In fact, since Proto-Finnic also had all four of *st *kl *kr *kn, then the introduction of [h] in both *kt and the *sR group would have already been sufficient to phonemicize it: it could be no longer identified uniquely as either /s/ or as /k/. — Again, I plan on writing a full article on this topic in the future.

This finally brings me to the topic I mention in the title. The Samic languages have borrowed *š from early Finnic also in several consonant-stem nominals. However, while these have consistently /-š/ in Western Sami, they seem to have dual representation in Eastern Sami: sometimes they surface with /-s/, sometimes with /-š/. At first sight this sounds like it might be related to the fact that some of these cases are loaned from PF *-is and not *-eš — but no, that contrast appears to be completely orthogonal.

Let’s roll out the data:

(1) Eastern Sami /-š/ ← Finnic *-eš

  • F *imeš (> Fi. ihme ‘wonder’) → S *imëš > e.g. North imaš, Inari iimâš
  • F *kadëš (> Fi. kade ‘jealous’) → S *kāðëš > e.g. North gáđaš, Inari kaađâš
  • F *laudëš (> Fi. laude ‘seat in sauna’) → S *lāvtëš > Skolt laaudâš
  • F *murëš (> Fi. murhe ‘sorrow’) → S *morëš > e.g. North moraš, Inari muurâš
  • F *säigeš (> Fi. säie ‘thread, fiber’) → S *šeajkëš > Kildin šieigaš
  • F *säigeš also? → S *sājkëš > e.g. Skolt saaiǥâš
    (This looks like a contamination of the previous word × the verb *sājkē- ‘to wear out’ reflected in most of Samic; which is probably not a loan, but older inheritance from original *säjkä, as no vowel-stem forms survive in Finnic. North sáiggas ‘worn’ is then simply a native derivative from the verb, as also per the semantics.)
  • F *tarbëš (> Fi. tarve ‘need’) → S *tārpëš > e.g. North dárbbaš, Inari taarbâš
    (In this one case, with an *s-stem quite widely alongside: *tārpēs > Southern daerbies, also Lule; *tārpës > e.g. Skolt taarbâs, also Pite, Lule. Lule Sami seems to have all three variants: dárpaj, dárpes, dárpas, and even a vowel-stem dárpa. There is Finnish dialectal tarvis as well, so the diversity clearly goes back to parallel loaning in some fashion.)

(2) Eastern Sami /-š/ ← Finnic *-is

  • F *kallis (> Fi. kallis ‘expensive’) → S. *kāllëš > e.g. North dial. gállaš, Skolt kaallâš
    (in Inari *ēs-stem kaalles, apparently with a nativized adjective ending)
  • F *ruumis (> Fi. ruumis ‘corpse’) → S. *rumëš > e.g. North rumaš, Inari ruumâš
    (parallel *romës in Skolt roomâs, [3] comparable with the Fi. dialectal variant rumis from Southern Ostrobothnia; and with a vowel stem in Southern Sami: *romē > räbmie.)
  • F *rugis (> Fi. ruis ‘rye’) → S. *rukëš > e.g. North rugaš, Inari ruuvaš
  • F *valmis (> Fi. valmis ‘ready’) → S. *vālmëš > e.g. North válmmaš, Inari vaalmâš

(3) Eastern Sami /-s/ ← Finnic *-eš

  • F *kantëlëš (> Fi. kantele ‘a traditional string instrument’) → S *kāntëlës > Inari kaddâlâs
  • F *kiireš (> Fi. kiire ‘hurry’) → S *kirës > e.g. Inari kiirâs
  • F *kärmeš (> Fi. käärme ‘snake’) → S *kearmëš ~ *kearmës > e.g. North gearpmaš, Ter “kermʾs
  • F *pereš (> Fi. perhe ‘family’) → S *pearëš ~ *pierës > e.g. North bearaš, Kildin пӣрас
    (vowel-stem *pearë in Skolt piâr)
  • F *terveš (> Fi. terve ‘healthy’) → S *tearvëš ~ *tiervëš > e.g. North dearvvaš, Inari tiervâs
  • F *voidëš (> Fi. voide ‘lotion, ointment’) → S *vōjtës > Inari vuoidâs
    (From Sammallahti’s reverse dictionary of Inari Sami. Álgu does not have this lexeme, so I have no idea if there are equivalents elsewhere in Samic. This could be also an independent derivative within Samic from the base verb: PS *vōjtë- ‘to grease, anoint’, interestingly an *ë-stem one instead of *ē-stem, as could be expected.)

(4) Eastern Sami /-s/ ← Finnic *-is

  • F *nakris (> Fi. nauris ‘swede (type of turnip)’) → S *nāvrëš ~ *nāvrës > e.g. North návrraš, Inari naavrâs
  • F *saalis (> Fi. saalis ‘catch’) → S *sālëš ~ *sālës > e.g. North sálaš, Inari saalâs

A few initial comments:

  1. I’ve only included cases with -s when Western Sami, or failing that Finnic, actually points to *-š. Of course F *-š ~ S *-s can be also found in older shared vocabulary, as in ‘boat’: *venəš > F. *veneš > Fi. vene; > S. *vënës > e.g. North vanas, Inari voonâs; > Mordvinic *venəš > e.g. Erzya венч /venč/. ‘Hurry’ could be theoretically also of this type; per the vowel correspondence *a ~ *ā, kaddâlâs clearly cannot.
  2. This entire word group seems to be centered on Northern and Inari Sami. Reflexes are practically absent from Southern Sami (only gïermesj ‘snake’), very rare also in Ume and Pite Sami. This would fit well together with late separate loaning from early Finnish specifically + occasional diffusion into other Sami varieties.
  3. Some of these words are originally from Germanic, and could be in theory partly borrowed directly from there into Samic, but I haven’t found any examples where Proto(-Western)-Samic *-ëš appears in a loanword without Finnic equivalents. Also, many enough cases are native Finnic, either wholly (e.g. kantele perhe säie, nauris saalis) or at least the *S-derivative is (terve); or come from Baltic (käärme). The only case where parallel loaning is clearly involved is the ‘need’ group: probably *tārpëš via Finnic, versus *tārpës directly from Scandinavian *þarbiz.

Here is a quick distribution chart, as you may wish to consult for point 2: [4]

*imëš:      - - - L N I - - -
*kāðëš:     - - - - N I S - -
*lāvtëš:    - - - - - - S - -
*morëš:     - - - L N I S - -
*šeajkëš:   - - - - - - - K -
*sājkëš:    - - - - - - S K -
*tārpëš:    - - - L N I - - -
*kāllëš:    - - - - N I S K T
*rumëš:     - - - L N I S - -
*rukëš:     - - - L N I - - -
*vālmëš:    - - - L N I S K T

*kāntëlës:  - - - - - I - - -
*kirës:     - - - - - I S K T
*kearmëš/s: S - P L N - - - T
*pErëš/s:   - - - L N - - K T
*tErvëš/s:  - - - - N I S K T
*nāvrëš/s:  - U P L N I S K -
*sālëš/s:   - - - L N I - - -

          S U P L  N  I  S  K T
totals:   1 1 2 10 13 13 10 8 6
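
The totals row is easy to re-derive mechanically. Below is a minimal sketch that recounts the chart; the rows are copied in chart order without their labels, and the expansion of the column letters (South, Ume, Pite, Lule, North, Inari, Skolt, Kildin, Ter) is my own reading of the abbreviations.

```python
# Column order as in the chart: S U P L N I S K T (expansion is an assumption).
COLUMNS = ["South", "Ume", "Pite", "Lule", "North", "Inari", "Skolt", "Kildin", "Ter"]

# One string per chart row, in the same order as the chart; "-" means no reflex.
ROWS = [
    "- - - L N I - - -",
    "- - - - N I S - -",
    "- - - - - - S - -",
    "- - - L N I S - -",
    "- - - - - - - K -",
    "- - - - - - S K -",
    "- - - L N I - - -",
    "- - - - N I S K T",
    "- - - L N I S - -",
    "- - - L N I - - -",
    "- - - L N I S K T",
    "- - - - - I - - -",
    "- - - - - I S K T",
    "S - P L N - - - T",
    "- - - L N - - K T",
    "- - - - N I S K T",
    "- U P L N I S K -",
    "- - - L N I - - -",
]

def column_totals(rows):
    """Count, per variety, how many rows have an attested reflex."""
    totals = [0] * len(COLUMNS)
    for row in rows:
        cells = row.split()
        if len(cells) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got: {row!r}")
        for index, cell in enumerate(cells):
            if cell != "-":
                totals[index] += 1
    return dict(zip(COLUMNS, totals))

print(column_totals(ROWS))
```

The recount gives 1, 1, 2, 10, 13, 13, 10, 8, 6 in column order, matching the totals row, with the weight clearly on Lule, North, Inari and Skolt.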

OK then, caveats done with, what is actually going on in here?

Mikko Korhonen in Johdatus lapin kielen historiaan mentions only in passing (p. 200) that the *-ëš-group “appears in correspondence to the loan original’s h” (“š esiintyy itämerensuomalaisissa lainoissa originaalin h:ta vastaamassa”). The same is stated in stronger terms by Mikko Heikkilä in Bidrag till Fennoskandiens språkliga förhistoria i tid och rum (p. 107), where he claims that late Proto-Finnic *h would have been adopted in Samic as *h syllable-initially and *š syllable-finally. This seems phonetically implausible to me however, given that (1) Scandinavian /h/ is regularly borrowed into Samic as /h/ ~ ∅, never as **š, (2) Finnic coda *h from *k is never borrowed as Samic **š, and (3) there definitely is also a layer of loanwords where Finnic onset *š gives Samic *š.

Heikkilä seems to suggest that late substitution as *š could have involved loaning from an intermediate stage of the *š >> *h shift, which he gives as [ç]. A palatal fricative could indeed be plausibly borrowed as Proto-Samic *š, especially if this was a palatal sibilant [ɕ] (as suggested by its origin from *ś, and its later development to /j/ in Western Sami, when before a consonant). This intermediate reconstruction is however based on a common misunderstanding. Sound changes of the type *š > *h do not involve a trek through every single intervening POA you can find on an IPA chart! [5] These are rooted in the tendency of retroflex consonants in particular to acquire a velar coarticulation, which can then take over as the primary POA; and also for spirants such as [x] to lenite to [h]. Palatal [ç] would be bypassed entirely in this process.

So I see no other explanation than that the cases of *-ëš ~ *-eh must have been borrowed before the Finnic sound change *š > *h (before the loss of the sibilant feature, to be exact). And the distribution suggests that Proto-Samic would have been by this point already quite thoroughly broken up: after all, these words seem to have been borrowed independently mainly into the predecessors of L N I S. Perhaps Proto-North-Lule and Proto-Inari-Skolt at the deepest, in case such entities could be assumed (usually classifications of the Sami languages go with Pite-Lule and Skolt-Kola groupings instead, but I am not entirely sold on this).

In other words, I answer my headline question in the negative after all: no, there did not exist any *š-stems yet in Proto-Samic, not even in any possible early subgroups like Proto-Western, Proto-Eastern or Proto-Non-Southern; they have only come about later through contacts with early Finnic.

I have not come up with any real explanation for the dual treatment of *-š in Eastern Sami. For this, I can only offer a few hypotheses (that all point in different directions):

  • maybe early on there was a sound change *-š > *-s in Eastern Sami, and cases with retained *-š are newer loans, perhaps partly from Northern Sami (since they seem to be fairly rare in Kola Sami)? This probably could not be equated with the general Samic shift of original *š to *s, since there are plenty of good examples of the retention of prevocalic PS *š- in Eastern Sami, and none of unexpected *s- (that I know of). [6]
  • maybe Finnic *š was for a while again borrowed in Eastern Sami as *s, due to being increasingly non-palatal [ʂ], while cases with *-š are older loans from an [ʃ] stage?
  • maybe a lost Finnic variety has been involved where word-final *-š > *-s? A late analogical development of *-h-stems to -s-stems is known from Southern Ostrobothnia… which is however nowhere near the attested Eastern Sami languages.

Going by the vowel substitutions also diverging in *pearëš ~ *pierës, *tearvëš ~ *tiervës and *rumëš ~ *romës, the last two explanations sound somewhat better than the first.

This problem very likely needs to be further tied in with *-eš ~ *-is variation appearing even within Finnic, again largely with a West-East divide, such as Western Fi. tarve ~ Eastern Fi. tarvis; Fi. käärme ~ Karelian keärmis; Fi. säie ~ Karelian säijis; Fi. laine < *laineh ~ Olonets-Ludian-Veps lainis ‘wave’. But it is not clear to me whether this is good enough to run with my third hypothesis, since there seems to be very little correlation in the occurrence of alternation in Eastern Sami versus in Finnic: there is no e.g. **seajkës in Samic, and more importantly, no **kantelis, **peris or **nakrëš, **ruumëš, **saalëš in Finnic.

[1] YSS has now been added to my fairly slowly growing Bibliography. If anyone’s curious, the five words with *š ← *s(i, ü) are: PS *šëlëtē ← PF *siledä ‘smooth’; Western *šëljō ~ Northern+Eastern *šiljō ← Fi. silja ‘courtyard’ (clearly rather one of the post-PS loanwords); PS *šëlmē ‘eye of an ax’ ← PF *silmä ‘eye’; PS *šëltē ← *silta < PF *cilta ‘bridge’; PS *šëntë- ← PF *süntü- ‘to become, be born’. I also suspect that PS *šōjē ‘rowan’ may derive from PF *sooja ‘protection’, as in Finnish (as also e.g. Germanic) mythology / folk belief the rowan tree has been considered to grant protection to the homestead. It’s not quite clear why we would have *š- and not *s- here, though. An independent loan from the same Indo-Iranian source (*sćāyā- ‘protection’) would also work.
[2] A slightly better explanation along almost the same lines might be “etymological alienization”, where the existence of Finnish hiili would have prompted a reshaping of e.g. expected Northern Sami ˣšilla into hilla, possibly fairly late then. This does not seem to be feasible in the case of Eastern Sami, though: in particular Inari and Skolt Sami have only come into intensive contact with Finnish fairly late, but the lack of ˣh- indicates relatively early loaning. (IIUC /h/ → ∅ has remained the default case in contacts between Karelian and Kola Sami, however.)
[3] Álgu gives for this a comparison with North ruomas ‘wolf’, which looks like a rather recent (taboo? epithet?) borrowing from Skolt.
[4] S U P L N I S K T for Southern, Ume, Pite, Lule, Northern, Inari, Skolt, Kildin and Ter Sami respectively. Yes, that’s “S” appearing twice, but you can figure this out.
[5] One impressive example of this approach is the development path “ʃ > ʂ > ç > x > χ > ħ > h” given in Kallio’s “Kantasuomen konsonanttihistoriaa”.
[6] Amusingly but probably unrelatedly: in “An essay on Saami ethnolinguistic prehistory” Ante Aikio mentions five examples of the “opposite” correspondence, with *s- in Western Sami ~ *š- in Eastern Sami.

