Some new work on the Agricultural Substrate

Back in 2009, a very interesting paper was put out by Jaakko Häkkinen, then an early-stage PhD student [1]: “Kantauralin ajoitus ja paikannus: perustelut puntarissa” (‘The dating and localization of Proto-Uralic: the arguments weighed’). While no longer especially up to date (I will probably follow up on this claim in another post soon-ish, once one major paper in the works has come out in a future issue of Diachronica), this still remains a notable work, and it has turned out to be an impetus for quite a lot of discussion over the 2010s and since on our basic assumptions about the early history of the Uralic languages. One of Häkkinen’s suggestions is to attribute some of the shared Finnic–Mordvinic vocabulary to a common southwestern substrate language. He outlines this on the basis of just six words that can be suspected to be of substratal origin on account of their semantics: three deciduous trees with a southern distribution (the word families of Finnish tammi ‘oak’, vaahtera ‘maple’, pähkinä ‘nut’ < *‘hazelnut’ [2]), two species of high importance to agricultural societies (Fi. vehnä ‘wheat’, lehmä ‘cow’), and one innovative numeral (Fi. kymmen(en) ‘10’). All of these also show novel phonotactic features: the word-medial consonant clusters *-mm-, *-kšt-, *-šk-, *-šn-, *-šm-, which according to him are not attested in the Uralic comparative data reaching into the Ugric or Samoyedic languages. Häkkinen also mentions some more narrowly distributed substrate loan candidates with similar phonotactic features (e.g. with geminate nasals: Fi. konna ‘toad’, nummi ‘heath’; Northern Sami lidnu ‘eagle owl’, dápmot ‘trout’) that had already been identified in still earlier studies probing the possibly substratal vocabulary of Finnic or Samic in particular. But as far as I can tell, the idea of a common substrate vocabulary layer extending further east to Mordvinic, and partly even to Mari and Permic, was a key innovation.

Increasing phonotactic complexity towards the (south)western end of Uralic is quite apparent as soon as you pay attention to the topic. Already in one of my earliest posts on Freelance Reconstruction, in 2013, I outlined the branch-level distribution of the clusters *šk, *kš, *kšk and *kšt across the Uralic comparative material. A heavy emphasis on Finnic, Mordvinic and Mari, but not on Samic in the northwest, is immediately evident. So there probably should be quite a lot of material attributable to this “Agricultural Substrate” if we went looking for it in detail. Around 2014 I started collecting some additional data on this, taking particular semantic fields as my starting point. Before this reached sufficient completion, though, a few other publications had already begun paying more attention to the same vocabulary stratum. I first saw Ante Aikio’s take, in a preprint version of his article “The Finnic ‘secondary e-stems’ and Proto-Uralic vocalism”. This singles out the consonant *š by itself as a marker of vocabulary of possibly substratal origin (with 25 examples given, about 10 of them not otherwise phonotactically suspect), and also proposes 9 other cases on the basis of general phonological irregularity. As he had already worked extensively on the Samic substrate in Northern Finnic and the pre-Uralic substrate of Samic, perhaps some of this was discovered independently, though… Aikio refers to Häkkinen’s paper only in passing, not as a main inspiration.

Before Aikio’s paper officially came out in 2016 [3], yet another version was outlined by Mikhail Zhivlov in a small conference paper, “Неиндоевропейский субстрат в финно-волжских языках” (‘A non-Indo-European substrate in the Finno-Volgaic languages’). This identifies 20 items, likewise on the grounds of phonotactic novelties, the general presence of *š and some phonological irregularities, and it overlaps substantially with Aikio’s list. Taken together, these already covered about as much as I had assembled, and I haven’t done much more on my draft since. Not much else seems to have happened on this topic in the late 2010s either.


Last fall, however, Carlos Quiles, an archeology/genetics/linguistics blogger at Indo-European.eu, seems to have put together a somewhat more substantial review of this, and also of some other data relevant for Uralic linguistic archeology, in a series of about ten blog posts. This is nominally aimed more at locating the Proto-Uralic homeland, though it is easy to notice that Quiles so far relies mostly on secondary sources, and he seems to miss a decent amount of relevant basic data in the chapters working towards this goal. E.g. already the section on fishing technology is missing at least *sopśə ‘net needle’ and *tulkV ‘dragnet’, perhaps because these are traditionally identified as “Proto-Finno-Ugric” (found only as far east as Khanty) and are thus absent from earlier sources attempting to apply linguistic archeology to Proto-Uralic specifically. I also wonder about some geographic claims, like Udmurt supposedly being spoken within the range of the Siberian pine. This is perhaps true today if we count migrant dialects further east and/or planted Siberian pines, but to my knowledge the tree is certainly not native to Udmurtia (nor even to most of the Komi Republic).

A full review of this whole topic would be a more involved undertaking than I want to go into on the blog, though, and in any case I am not highly impressed by the overall precision of linguistic archeology as a method. It works just fine for ruling out places like the Circum-Baltic, the Arctic coast or the Caucasus as the Proto-Uralic homeland, but finer details like the long-standing debate on Volga-Kama versus Western Siberian homelands don’t seem easily resolvable. At least two reasons conspire to make further progress difficult. One, if a language family starts off as (a part of) an only slowly expanding or even in situ diversifying dialect continuum, we might have trouble distinguishing “common Family” vocabulary from true proto-Family vocabulary. If newly incoming vocabulary avoids hitting all the earliest isoglosses within the family, or is etymologically nativized across them, it may end up gaining a wide distribution and an appearance indistinguishable from native vocabulary. Cases like the common Algonquian calque ‘firewater’ for ‘whisky’, which can be identified as much too recent on cultural grounds, are just the tip of the iceberg here. Others could include cases like Proto-Finnic *lohi ~ Proto-Samic *lōsë ‘salmon’, which happen to fall within the outlines of Uralic comparative phonology just fine and would point to a common proto-form *lošə. Both are probably instead more recent loans from Baltic, either independently or into Samic thru Finnic; and this would hold even if both lineages really did go back to this form. From language pairs like North Estonian ~ South Estonian (last common ancestor ca. 500 BCE), or indeed dialect pairs like Western Finnish ~ Eastern Finnish (LCA ca. 500 CE), with heavily parallel and mutually reinforcing trajectories of historical development up to today, we could probably find examples of this type by the thousands.
(I call this phenomenon “convergent parallel loaning” and hope to one day treat it in more detail than in my one presentation on it so far, in Finnish, from 2016. Cf. also Häkkinen’s spin on this under the name “invisible convergence”.)

I also consider it probable that our efforts at Uralic reconstruction so far in many respects stop at the Common Uralic stage, maybe especially in vocalism, not quite yet reaching Proto-Uralic proper. This is evident when attempting to reconstruct the proto-forms of several core vocabulary items, e.g. ‘heart’. West Uralic (Samic, Finnic, Mordvinic) suggests *ćüdäm(ə); Udmurt /śulem/ suggests *śedämV; Komi /śëlëm/ suggests *śädämV; Ugric suggests *śiďVmV or even *śijVmV; Samoyedic *säjä suggests *śäďä or *śäjä. We have no especially good way to explain most of this kind of “proto-variation”, or to decide which of these variants might be the most original (of course, at least the vowel difference between Udmurt and Komi is likely to be recent). The suggestion first made by Zhivlov, that traditional PU *ś comes from an earlier *ć that was preserved in Samic but replaced in areal vocabulary by a new *ć in Permic and the three Ugric branches, is at least probably right, though. “*ś” is then basically a Common Nonwestern Uralic (maybe even just Nonsamic Uralic?) reconstruction, not the proper Proto-Uralic one. (On structural grounds the same proposal has also been made earlier by at least Janhunen and Abondolo.)

Two, linguistic archeology cannot even in principle pinpoint an origin outside of a family’s current or historical range. Under the basic assumptions behind linguistic archeology, any terminology for e.g. natural realia exclusive to an “external homeland” would have to be either lost or repurposed in all descendants. This would hold even if one of the daughter lineages ended up re-entering the original territory. (Northern Sami speakers moving to Helsinki are not going to magically recover the lost but presumably once extant Proto-Samic words for things like ‘maple’ or ‘eel’.) Suppose for the sake of argument that Uralic first expanded in a northward fan from someplace around the southern end of the Urals, near Orenburg or Magnitogorsk: southeast of the current range of Permic and Mari, well south(west) of the current range of Mansi. What kind of vocabulary evidence would we even expect this to leave, as distinct from an originally more northern homeland?


But I believe that’s enough said for now on attempts to locate Proto-Uralic (again, watch for the upcoming issues of Diachronica for news on this). Going back to the Agricultural Substrate, Quiles identifies four semantic areas which would show prominent influence from this:

  1. tree names and related botanic terms;
  2. apiculture;
  3. agriculture;
  4. metallurgy.

In terminology related to animal husbandry and textile working he also assembles a few possible examples, though these are set against a more substantial number of loanwords from Indo-European.

I agree with most of these assessments as well. The one exception is apiculture: the words actually comprising this layer (*mekšə ‘bee’, *metə ‘honey’, *śišta ‘wax’; unreconstructible #käras ‘honeycomb’ [4]) all have good Indo-European / pre-Indo-Iranian etymologies, unlike the vast majority of the others, and the instances of *š in them can be well derived by RUKI. Even if *š might often be a marker of the Agricultural Substrate, this does not imply that all cases have to be so, and in particular it does not provide a reason to abandon well-established loanword etymologies from actually attested language families. By a similar argument, I am likewise unconvinced by attempts to reinterpret words like *šiŋərə ‘mouse’ (with regular reflexes in all three of Hungarian, Mansi and Khanty) as having anything to do with the Agricultural Substrate. The key motivation for setting up this hypothesis in the first place has, after all, been the highly limited distribution of words of certain semantic categories or with certain phonetic features. If we start including occasional etymologies that reach Ugric or Samoyedic as well, we can no longer maintain the original explanation for why other words of this layer do not do the same (i.e. that the Agricultural Substrate was never in contact with these branches of Uralic). This would indeed come close to abandoning any reason for treating this layer as non-native in Uralic in the first place!

An additional point I notice here is that, of the possibly substratal cases of *š, quite a few also occur in RUKI environments. The cluster *kš is particularly prominent: *makša ~ *mäkšä ‘rotten wood’, *päkšnä ‘linden’, *wakštVra ‘maple’, maybe *päkškV ‘hazelnut’ and *tekškä ‘ear of corn’ (surfacing as *šk ~ *kš vacillation). There is also a phonologically similar, though clearly non-IE, *š after *ŋ in *jaŋša- ‘to grind’, maybe also behind *riŋəšə ‘threshing ground’. Examples of *ks or *ŋs, meanwhile, do not seem to occur. I suspect this points to the Agricultural Substrate actually coming to Uralic second-hand: it was first adopted into an extinct para-Balto-Slavic and/or para-Indo-Iranian language that, as expected per general Indo-European dialectology, regularly retracted *s to *š at least after velars, including in words that it had earlier adopted from the Agricultural Substrate proper. This hypothesis also gives us some more wiggle room in identifying the substrate in the archeological record: even archeological cultures that were probably Indo-European-speaking could be considered as the source.

Speaking of the ultimate identity of the substrate, Quiles has an interesting new suggestion on this too: he seems to have found parallels for a number of the words involved in the West Caucasian language family, and attempts to sketch ways it could have been in contact with Uralic. This I think would be worth exploring further. Some more data to this effect might also be found in Bernát Munkácsi’s 1901 monograph Árja és kaukázusi elemek a finn-magyar nyelvekben (‘Aryan and Caucasian elements in the Finno-Hungarian languages’). While Uralic–Indo-European loanword studies have long been an extensive and productive field, on the topic of Uralic–Caucasian comparison of almost any flavor this remains just about the most recent even halfway serious overview. Directionality, however, is not obvious to me. As Quiles notes, the WC ~ Uralic parallels center on technology and metalworking terminology. It seems to me that, besides pure accidental resemblance, they could just as well be explained as a set of recent Wanderwörter, or as parallel loanwords from a lost common source. There is thus as yet hardly any evidence for a West Caucasian substrate language specifically.

I would by now also have more detailed comments on numerous individual etymologies that one researcher or another has proposed to belong to the Agricultural Substrate. This task is best left for another time, however, in many cases maybe also for another context entirely, and I might return to the topic only after getting more of my forthcoming etymological etc. observations into print individually. Substrate languages are a fascinating topic, but they really are not very feasible to tackle head-on: they emerge only from the dark corners of linguistic reconstruction, generally identifiable more by what is absent than by what is present.

[1] While Häkkinen continues to be active in our field and has a lot to say especially on the topic of the relative and absolute chronology of the Uralic languages (recently e.g. coauthoring an article on Southern Sami with Minerva Piha in the latest Sananjalka), his PhD thesis unfortunately still remains unfinished.
[2] Part of the Finnish / Swedish grouping jalopuut, jalot lehtipuut / ädellövträd ‘noble (broadleaf) trees’. Other generally agreed members include the elm, ash, linden, beech and hornbeam. This might be convenient to calque into English too. Delimiting it in a context wider than just the Nordics has some difficulties though… would we only accept species whose distribution overlaps with the taiga zone at least within gardens, ruling out the likes of plane trees; and would we follow the main practical motivation of the term and rule out softwood broadleaf trees like the poplar?
[3] Even though it nominally appears in the 2015 issue of Suomalais-Ugrilaisen Seuran Aikakauskirja. I wonder how often these kinds of delays, between when a periodical is dated and when it actually comes out, are due to printing queues and how often to actual editing issues.
[4] Mordvinic *käŕas, Mari *käräš, Udmurt /karas/; none of these can be native as such. The Mordvinic and Udm. words show a ⁽*⁾front vowel in the first syllable plus a ⁽*⁾back vowel in the second (PU unstressed *-ä- > Udm. /e/), and such disharmonic vowel combinations always result from either recent derivation or recent borrowing. The Proto-Mari vowel *ä, then, is non-native entirely. The same probably holds for most cases of pre-Permic *ä that end up retracted to /a/.

Posted in Commentary, Methodology

A Century Late on Proto-Finnic sibilants

There are broadly two commonly seen ways of thinking about progress in science. The first is the “naive” Science Marches On narrative, where we have an ever-increasing aggregation of solid Results; the archetype is mathematics, where results indeed stay around once they have been established, but a good part of the natural sciences today follow this as their main narrative as well (not without reason, I feel). The second is the Kuhnian succession-of-paradigms narrative, where most of the time scientists can go around aggregating results, but every once in a while some basic assumption is declared to have been wrong, quite a lot of material ends up discarded, and work is started over. Hence even the results we continue to accept should not be thought of as unchanging truths, but rather as temporary, even provisional. The archetypes for this seem to come from the humanities, where theories of how to understand even the main forces of history or literature or psychology still seem to be in quite a bit of flux, and views are often split between battling schools.

In historical linguistics, as really in most even vaguely empirical sciences, we clearly have aspects of both around. Etymology and reconstruction generally turn up ever more results as time passes, though some individual results occasionally turn out to have been built on sand. We are lucky to have avoided drastic paradigm shifts though: there clearly do not exist any examples of things like language families that were first set up in detail and later abandoned entirely. [1]

These two attitudes also have similarities, not just differences. Above all, both are forward-looking: they hold that science is something that continues to be done and will have something new to say ten, a hundred, probably a thousand years from now (no matter whether built on top of or beside the things it says today). Yet another alternative exists as well, though: the “golden age” narrative, according to which knowledge is not created (anymore?); it is or has already been out there, and what we can accomplish amounts to either preserving or rediscovering it. “Nothing new under the sun” and its restatements in various forms (probably this sentiment is itself ages-old too).

In fields like Uralistics, with a “long-and-thin” history, occasionally this also rings true. To quote here my colleague Niklas Metsäranta in the foreword of his recent PhD thesis (English translation mine):

“The best aspects of etymological research are doubtlessly those fleeting moments, when, while reading dictionaries, the stars align and one notices or at least thinks of having noticed a new connection between words, that no one has noticed before. Occasionally the initial buzz turns to disappointment though, when upon more careful browsing on etymological references one realizes to not have found anything new, but to only have brushed up an old dusty comparison advanced already by E.N. Setälä or Yrjö Wichmann.” [2]

Metsäranta’s work in Mari and Permic etymology indeed has a lot of preliminaries and predecessors in the late 19th and early 20th century. Most progress in Uralic etymology in the second half of the 20th century has not come from extending the corpus of comparisons, but rather from trimming it down, trying to find which parts of it are actually reliable and which might have other, better explanations, e.g. as Indo-European loanwords. This issue has been particularly obvious during the work that led to my recent paper on a sound change *i > *i̮ in Permic, which consists almost entirely of the rehabilitation of old etymological comparisons, most of them later rejected for one reason or another (but generally without any detailed critique). Only time will tell if this idea will lead to any all-new etymologies, too. Probably yes, though, if the numerous 21st-century works that again also seek entirely novel Uralic etymologies are anything to go by; after all, I already cite one applicable new etymology from a preprint by Aikio.


The early pioneers of Uralic studies of course did not just work on etymology. The development of general Uralic historical phonology shows a similar broad outline: a “brainstorming” phase pre-WW1 eventually turning into a “consolidation” phase post-WW2. Here, though, the situation really seems much more precarious in the details. There are several major etymological dictionaries out there by now, all household names to the historical Uralicist (FUV, SKES, KESK, DEWOS, TESz, UEW, YSS, SSA…). Most early etymologies worth consideration have been caught by at least one of them, even if not necessarily judged favorably. [3] By contrast, studies of historical phonology have remained more data-driven / less literature-driven. Small details can often be re-derived as needed as long as their underlying etymologies remain known, sidelining credit from their first discoverers; or they may also end up forgotten entirely.

Getting finally to the topic of my post’s title: about five years ago I sketched an observation about a distinct reflex of Proto-Finnic *c in Karelian. This is quite noteworthy in that *c is the first new phoneme added to Setälä’s 1890s reconstruction of Proto-Finnic that seems likely to stick, first properly consolidated as recently as 2007 by Kallio. Here we would then have evidence that it has been retained not only in the previously marginal South Estonian (whose proper importance to Finnic reconstruction was not realized until at least the 1970s) but also in the long-researched Karelian. There is quite a bit of noise in the Karelian data, though, e.g. due to secondary affective affrication and some evident dialect mixing in the complex reflexes of *s. I wouldn’t blame earlier generations for not catching this idea.

But caught it has been. Earlier this year I noticed yet another old journal relevant to Uralic studies to be available online by now: Le Monde Oriental, published in Uppsala from 1906 to 1947, still turning up regularly in bibliographies thanks to several contributions from K. B. Wiklund. The early issues are by now in the public domain and can be found at least in part in the archive.org collections. I usually follow up finds like this by taking a brief look over the contents of the back issues in general. Vol. 6 from 1912 turned out to contain an article by one N. Moosberg (not a name previously familiar to me at all), “Om utvecklingen af samfinskt s i den ryskkarelska dialekten in Vuonninen” (‘On the development of Common Finnic s in the Russian Karelian dialect in Vuonninen’). This contains pretty much exactly my observation, just over a century earlier: while North Karelian (in his article: just the village of Vuonninen in the parish of Vuokkiniemi) reflects *s as /š/ by default, it also maintains instances of /s/ that cannot be explained by any regular secondary conditioning factors. In particular, this holds for the assibilated reflex of *t before *i, where we today reconstruct *c per the South Estonian evidence. Moosberg, too, concludes that the result of this assibilation must have been a consonant distinct from plain *s. I don’t know what to make, however, of his suggestion of a “probably more spirantic sound” (“troligen mera spirantiskt ljud”): should this be read as suggesting something like a nonsibilant *θ?

Moosberg’s primary data also behaves more cleanly than what I was able to scrape together. In particular, he finds *c > /s/ just fine also in kaksi ‘2’, kuusi ‘6’, kyⁿsi ‘nail’, uusi ‘new’, varsi ‘shaft’. Several preterite stems like kokosi ‘collected’, läksi ‘left’, löysi ‘found’, makasi ‘lay (down)’, tuⁿsi ‘felt’ are also adduced; some of these are even confirmed by the KKS data, although I had not looked into this topic at all. My own preliminary suggestion that (*uc, *rc >) *us₂, *rs₂ > *us₁, *rs₁ (> , ) could of course still hold for some other varieties upon closer investigation, but I now trust it less.

The other typical position where we (at Helsinki, at least) now reconstruct PF *c is the cluster *cr. Moosberg notes a reflex /s/ in these as well, but he follows E. N. Setälä’s influential reconstruction with *str and is unable to treat this as the exact same sound change, instead assuming a distinct cluster change *str > *s₂r. Yet these cases, too, had an affricate reconstruction advanced for them early on. This I believe was first proposed by Frans Äimä in a 1921 article in Virittäjä, as I found out not long after my previous blog post, in the spring of 2018. Äimä in fact refers to an outright palatalized pronunciation with [źr] or [śr] from the dialects of Rugajärvi, Jyvöälahti and partly Tver. This has not been recorded in the macrophonemic transcription of Karjalan Kielen Sanakirja, but luckily, scans of the original field records are already partly available too, and they do show this unexpected palatalization: Rj. aźrain, keźrä, Pistojärvi aśroan, Tver ḱeźŕä, ildaḱeźro (the latter still there also in 1958) and even Vuokkiniemi keśrä (1956). Äimä also builds here on a suggestion made slightly earlier by Ojansuu (Karjala-aunuksen äännehistoria, 1918 [4]) to reconstruct *st > *ts > *ćć just for Karelian, but goes a step further and proposes a very modern-looking reconstruction *tsr already for Proto-Finnic. Unfortunately, it appears that no one until now has brought their proposal(s) together with Moosberg’s. The nascent discussion on what exactly to reconstruct behind the correspondence NKrl sr ~ SKrl, Ludian–Veps zr ~ Western Finnish hr ~ EFi and southern Finnic *tr simply seems to have been dropped post-WW2, with overviews defaulting to Setälä’s *str almost up to the present day. Even the current reconstruction with *cr is still not very prominent, having been proposed, again by Petri Kallio, merely in a lengthy footnote (#9) of his 2012 article “The Prehistoric Germanic Loanword Strata in Finnic”.
A bit more visibility seems to be warranted here, and I would propose introducing the name “Moosberg’s law” for the North Karelian retention of /s/ from *s₂ < *c.

These finds taken together do not amount to merely rediscovering lost earlier wisdom, but the flavor is certainly there, and it’s hard not to wonder what other small but potentially crucial notes on Uralic historical phonology might already be out there, theoretically available to the reader but not signposted by any modern back-references. [5] With this in mind I have, in fact, considered starting work on a Uralic analogue of N. E. Collinge’s 1985 monograph The Laws of Indo-European, or maybe first some more limited analogue similar to e.g. Nathan W. Hill’s 2011 paper “An Inventory of Tibetan Sound Laws”.

[1] The closest is maybe the defunct Ural-Altaic hypothesis, with its successors in the Altaic wars on the one hand (restricting the family by the exclusion of Uralic and perhaps other parts) and the Nostratic hypothesis on the other (widening it by the inclusion of e.g. Indo-European, Kartvelian and Yukaghir). All early defenses of Ural-Altaic are, however, obviously sketchy and often admit as much. There are no systematic reconstructions of grammar or phonology or lexicon, only take-it-or-leave-it collections of parallels, many of them by now reinterpretable as areal or typological rather than genealogical, and hence not strictly speaking abandoned as such.
[2] “Etymologisen tutkimustyön parhaimpia puolia ovat epäilemättä ne ohikiitävät hetket, kun sanakirjoja lukiessaan tähdet asettuvat linjaan, ja sitä huomaa löytäneensä tai ainakin luulee löytäneensä yhteyden sanojen väliltä, jota kukaan muu ei ole ennen huomannut. Välillä ensihuuma muuttuu pettymykseksi, kun tarkemmin etymologisia sanakirjoja selailtuaan tajuaa, ettei todellisuudessa olekaan löytänyt mitään uutta, vaan on tomuttanut esiin vain jonkin vanhan pölyisen jo E. N. Setälän tai Yrjö Wichmannin esittämän rinnastuksen.”
– Seconded on the buzz as well, which you might get a glimpse of from my previous post.
[3] The biggest remaining gaps are probably among words not found in the four “key languages” to have been covered by dedicated etymological dictionaries already in the 20th century (i.e. Hungarian, Finnish, Komi and Khanty). Newer etymological or etymologically-minded comparative dictionaries exist also for Estonian, Mordvinic, Mari and Selkup at least, but these do not pay much attention to early literature.
[4] Earlier in the shorter overview “Karjalan äänneoppi” (1905; p. 30) Ojansuu still follows Setälä in positing one-step *str > *sr.
[5] A search in my digital literature collection indeed turns up zero references to this article of Moosberg’s, only a handful of mentions of his other work on Ume Sami.

Posted in Commentary, Reconstruction

How to (not) report a lack of etymology: Samic *keaðkē

I have been having a simmering discussion with commenter “M.” under the post on what’s important for what in historical Uralistics. One general point there that I keep pushing hard back against is the idea of “etymology unknown” as anything like a fallback explanation or default hypothesis. This is not a hypothesis at all; it is the absence of one. At worst it might end up being elevated to a curiosity stopper, an excuse to not keep looking.

At the same time, I still want to stress that this doesn’t mean that anything at all, any kind of nonsense thrown out, makes an acceptable etymology. I’m already on record in favor of more attention being paid to “anti-etymologies”. “Etymology unknown” sometimes really is what should be reported. But I think that by itself this is essentially always too little detail, and it should be combined with an account of what, exactly, we have ruled out. Basically no language on Earth is at a point of etymological research so widely practiced and thoroughly scoured that we would have grounds to assume that “etymology unknown” means all possibilities have actually been exhausted. Words reported as “etymology unknown” in some sources have new good etymologies coming out for them all the time, sometimes even from older literature that was neglected by the compilers of the reference work in question. They will keep coming too, if my own backlog of unpublished etymologies is anything to go on.

So what should it look like when a word’s etymology really remains firmly unknown, not just underresearched? For an example, let us consider Samic *keaðkē ‘stone’.

Step one: check the semantic equivalents in all known relatives and main contact languages. In the meaning ‘stone’, we can find clear non-cognates in all reasonable directions:

  • Most Uralic languages reflect Proto-Uralic *kiwə. There is some phonological overlap here (initial *k, front vocalism) but the correspondence *ðk ~ *w seems unbridgeable without massive speculation. *ea ~ *i doesn’t have any good precedents either. It’s not literally impossible that these could some day be solved, especially as long as no traces of *kiwə are otherwise found in Samic, but for the time being this is a non-match.
  • Samoyedic instead reflects *pəj, an even worse phonological fit. *ðk ~ *j would actually be regular (< PU *ďk?), but this observation conflicts with the proposal to treat the Samoyedic word as cognate with Finnic *pii-kivi ‘flintstone’, whether it is reconstructed back to a separate PU root *pijə or treated as a semantically and phonologically divergent reflex of PU *piŋə ‘tooth’ (> Finnic *pii ‘tine’), e.g. by back-formation from the same or a similar compound, plus irregular lenition of *ŋ. [1]
  • Per Nikolayeva, Yukaghir has *kïj ‘stone’ (plausibly ~ *kiwə [2]), Kolyma Yukaghir also /pē/ ‘rock, big stone’ (plausibly ← Samoyedic), Tundra Yukaghir also /jeďi/ < ? *jenći ‘stone’ (no idea about the etymology of this), all still nowhere near *keaðkē and now also way off geographically and genealogically, hence a priori weaker than anything found in languages securely known to be related to Samic.
  • Germanic reflects *stainaz; Baltic reflects *ákmō and Slavic *kamy, both going back to PIE *h₂akmon- whence also e.g. Sanskrit áśman. No chance here for a loan from any known non-Uralic language of Northern Europe, no evidence for an ancient Indo-Uralic archaism either.

In known loanword sources a bit further off, we could try looking more into Indo-Iranian, where words for ‘stone’ seem to diverge quite a bit. A quick trawl thru Wiktionary nets at least Persian and Balochi /sang/, Pashto /kāɳaj/, Kurdish /bird ~ berd/, Wakhi /wurt/, Ossetic /dur/, Hindi, Kashmiri etc. /pattʰar/… but again none of this initial haul really gets us any closer to Samic.

Step two: check for morphological analyses. For words that don’t look like basic word roots, this probably should be step one. There is something that can be done here too though: *-kē < *-kA is a widespread Uralic nominal suffix, and we probably shouldn’t stress too much if this in particular fails to correspond in an otherwise decent cognate. Still, a shorter #keað- (suggesting pre-Samic #keð-) does just as poorly among the non-cognates above. We also don’t have anything within Samic that would particularly point to such a division. The most phonologically similar words reconstructible for Proto-Samic are *(s)keaðē- ‘temple (of head)’ and *kiðë ‘spring’, both semantically miles off from ‘stone’. In more narrowly distributed words from Northern Sami I can find geađđat ‘amicable’, geađđi ‘dimness’ (+ Lule skädot ‘to dim (of eyes)’, Skolt ǩieđâš [3]) which don’t help either. Relaxing phonological similarity even further allows reaching a different substance term *čëðë ‘coal’ (< PU *śüďə), but even allowing for irregular *č > *k would not suffice to set up any morphological relationship. Unless we are also wrong about the development of PU *ü(-ə) to PS *ë(-ë), and this somehow first merged with PU *e-ə rather than the phonetically expected *i(-ə)? If so, then we might consider *śüďə > *ćeðə > *ćeð-kä > *čeaðkē > ? *keaðkē. But I feel a semantic shift ‘coal’ > ‘stone’ remains nonsensical despite a vaguely shared semantic field. A connection between these meanings probably should rather start from something more generic like ‘nugget, pellet, grain’. Even ‘small stone’ perhaps, but that would be a poor match with ‘stone’ just in a supposedly derived Samic reflex vs. ‘coal’ all across Uralic.

Step three: check for phonological matches and see if their semantic difference can be bridged. We have done some of this already in the previous step. Looking more widely for PU roots, even of the very rough shape *k + front vowel + *d/ď, again fails to turn up anything good though. Besides ‘spring’ (with cognates in Mordvinic) our options are *keďə ‘skin’, *käďwä ‘female; ermine?’, *küdV ‘brother-in-law’ (unless the proposed Ob-Ugric cognates of Finnic *kütü are just divergent reflexes of *käləw ‘sister-in-law’), all again no-go. Germanic could be scanned as well, though for the time being I have no good resources for doing this thoroughly (anyone want to link me to a digital dictionary of Old Norse?). Balto-Slavic and Indo-Iranian we can probably leave aside, as there are no examples of Samic *ð or PU *d/*ď that originate from these.

Step four: check for semantic near-matches. This is somewhat harder to do rigorously. In recent times the CLICS database offers one handy tool at least: charts of typical colexification relationships between concepts in the world’s languages. Their concept map for STONE provides us with the rough gist that the options are limited. So far the only attested colexifications are with ‘mountain’, ‘egg’, ‘hill’ (mostly in Pama-Nyungan) and ‘seed’ (mostly in Austronesian; Finnish kivi as ‘pit of fruit’ might count too). Only the first, as observable already in e.g. English rock, has substantial amounts of evidence backing it.

However, it turns out that we are now in luck! PU *muna > PS *monē ‘egg’ is right out, and no PU or PS word for ‘seed’ is known at all. The proposed PU words for ‘mountain’ or ‘hill’ number a handful, and the best-attested cases like *wärä (> Samic *vārē) or *mäkə are also way off. But one less firmly attested example is *kaďV — continued in Hungarian hegy and a Samoyedic word family that might reconstruct as *koəjə (if we take Nganasan †koaja as recorded by Castrén as representative and not a later derivative from something shorter). This turns out to match well indeed with the morphological analysis *keað-kē that I have already hypothesized above, and the two root consonants match regularly. The vowel development *a > *ea is not the usual one, but can be tentatively explained: this turns up in Samic also in other cases before palatalized consonants, especially syllable-final ones, including *kaća > *keačē ‘point, end’, *kaććV- > *keaččë- ‘to look’, *laśkV- → *leaškō- ‘to pour (out)’, *waćara > *veačērē ‘hammer’ (cf. Finnic *kaca, *kacco-, *laskë-, *vasara); and perhaps the common Uralic Wanderwort *waśkV > *veaškē ‘copper’, back-vocalic also in Finnic *vaski, Mari *wåž, Hungarian vas, Khanty *wăɣ (but then front-vocalic also in Mordvinic *viśkə, Permic *-veś, Samoyedic *wäsa). The conditioning of this probably could use more research though.

Regardless it seems we can, after all, propose an etymology: PU *kaďə ‘(rocky?) mountain’ > early pre-Samic *kaď-ka ‘rock (object)’ > late pre-Samic *keďkä > *keðkä > Proto-Samic *keaðkē ‘stone (substance)’. A very nice result I feel, for explaining such a basic vocabulary item that has so far gone unetymologized! [4]
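As a way of making the relative chronology explicit, the derivation above can be written out as ordered rewrite rules. This is a minimal sketch under my own assumptions: the regex rule formulations and the `derive` helper are hypothetical illustrations that cover only this one word, not a model of Samic historical phonology, and the extra intermediate stages (*keďka, *keaðkä) are merely artifacts of splitting the steps. What it does show is the ordering point from footnote [4]: applying the depalatalization *ď > *ð before the fronting would remove the very consonant that conditions *a > *e, leaving *kaðka.

```python
import re

# Hypothetical, word-specific encoding of the steps stated above; NOT a
# general model of Samic sound change. Each rule is (label, pattern,
# replacement); list order = assumed relative chronology.
RULES = [
    ("fronting *a > *e before a syllable-final *ď", r"a(?=ď[kptsm])", "e"),
    ("second-syllable *a > *ä after first-syllable *e", r"(e[^aeä]+)a$", r"\1ä"),
    ("depalatalization *ď > *ð", r"ď", "ð"),
    ("first-syllable *e > *ea", r"^(\w)e", r"\1ea"),
    ("word-final *ä > *ē", r"ä$", "ē"),
]


def derive(form, rules):
    """Run each rule over the form once, in order; return every distinct stage."""
    stages = [form]
    for _label, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        if form != stages[-1]:
            stages.append(form)
    return stages


print(" > ".join(derive("kaďka", RULES)))
# kaďka > keďka > keďkä > keðkä > keaðkä > keaðkē

# Reversed chronology: depalatalization first removes the conditioning *ď,
# so the fronting never applies.
reordered = [RULES[2]] + RULES[:2] + RULES[3:]
print(derive("kaďka", reordered)[-1])
# kaðka
```

Swapping the list order is all it takes to test an alternative chronology, which is the main reason to keep the rules as data rather than hard-coding them.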

At this point I must emphasize that this result was not pre-decided. This etymology does not come from my above-alluded stash of unpublished discoveries. Right up to looking up the CLICS concept map, I was laboring under the assumption that *keaðkē indeed is a word of unknown etymology; certainly that’s the only thing I’ve seen reported for it, and certainly it also fits my typological expectations of substrate vocabulary (which is, in the absence of features like consistently recurring phonetic irregularities, generally fairly unknowable speculation in the case of any one particular word). And yet it turns out … if we just diligently explore the options, instead of worshipping our ignorance and writing words off as “unknown-therefore-unknowable”, a lot of the time we can make progress on their etymology. Wir müssen wissen, wir werden wissen.

There probably are indeed words where the four steps above are still insufficient for putting together an etymology; then again, it would be possible to sketch out a few further steps as well. And I think I have demonstrated regardless not just an apparent etymology for *keaðkē after all, but also how and why the first few directions that we could think of for seeking its etymology do indeed fail.

[1] A hypothesis that would work decently here is that first *iŋ > *iń, which is not contradicted by any data (is nonprovably regular) and is within Uralic even paralleled by Permic *piń; followed by regular *ń > *j in most reflexes. Only Selkup really conflicts with this. — The reconstruction of *ə seems unclear too (actually given by Janhunen as *ə¹ = *ə/*å). Only the correspondence Nganasan /hᵘalə/ < †fala ~ Nenets *pæ points to this, while we have a seemingly preserved /i/ in Kamassian /pʰi/ and Mator hilä, and a close vowel also in Enets /pū/ < †puj ‹пуи›. Maybe some of these could even reflect a heavily contracted *pijwə < PS *pińwə < pre-PS *pińkiwə < PU *piŋə-kiwə (with loss of *k from a secondary cluster *ńk, but intervocalic *w preserved in a no longer posttonic position)?
[2] Considering the main etymology I discover here, another possibility could be to derive this thru some flavor of Samoyedic #kVj ‘(rocky?) mountain’.
[3] Related to Germanic *skadwaz ‘shadow’ somehow…? The front vowel seems like a poor match, though.
[4] One further phonologically interesting feature in this is that the Samic-specific fronting *a > *e seems to take place earlier than the common West Uralic depalatalization *ď > *d (or > *ð). I’m not concerned though. This already seems to be proven to be an areally spread change by the fact that Mari also shows *ď > /ð/, while differing from West Uralic in showing *d > ∅. Actually, in principle nothing rules out either that palatalization before *ď was more widespread, since we lack Finnic and Mordvinic reflexes, but I don’t see much benefit in this assumption over the previous one.

Posted in Etymology, Methodology

nyolszáz, kilenszáz

Recently, when tracking a variety of citations back into early literature, I was directed to Zsigmond Simonyi, 1901: “Az Ábel-féle szójegyzék” (Nyelvtudományi Közlemények 31: 225–227), an article reporting the corpus of a small Hungarian–Italian phrasebook from 1438. What caught me by surprise were the words for ’80’ and ’90’. These are written as gnalsase ~ gnalzase, chilansase ~ chilanzase. These are clearly not quite the modern Hungarian words nyolcvan, kilencven — they look like they instead contain száz ‘100’ as the last member. The article does not give much commentary in general, but this is indeed noted. Simonyi thinks they are simply mistranslations and stand for the Hungarian words for ‘800’, ‘900’. However, the phrasebook elsewhere renders Hungarian /ts/ as z or ç; nyolc ‘8’ is gnauz, harminc ’30’ is armiz ~ armiç. kilenc ‘9’ is indeed written as chilens, but I don’t think this represents a general failure of the Italian author to distinguish /ts/, perhaps just one after /n/. So why then gnalsase?

It can be noted that etymologically nyolc and kilenc do contain old morpheme boundaries: they’re constructed on the general Uralic pattern of ‘8’ and ‘9’ as 10−2 and 10−1, and their shared final -c represents a contracted reflex of tíz ’10’. I think the same principle might be at work here in a different way. That is, the words indeed do not have an affricate, and would be nyolszáz, kilenszáz if projected to modern Hungarian. They are also not to be read as 8·100 or 9·100, but rather as subtractive constructions 100−20 and 100−10: “two (decads) before 100” and “one (decad) before 100”. Perhaps this idea is known already in Hungarology, but of course tracking references forward in time is much more difficult than backward in time. (Google has nothing for nyolszáz, kilenszáz, but then these are modernized spellings by me. I do not have Honti’s 1993 monograph on numerals in Uralic on hand to consult.)
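For comparison, the arithmetic of the two patterns can be set out side by side, assuming the segmentation of the phrasebook forms as nyol-száz, kilen-száz proposed above (a hypothesis, not an established analysis):

```latex
\begin{aligned}
\textit{nyolc}\ (8) &= 10 - 2, &\qquad \textit{kilenc}\ (9) &= 10 - 1,\\
\textit{nyolszáz}\ (80) &= 100 - 2 \cdot 10, &\qquad \textit{kilenszáz}\ (90) &= 100 - 1 \cdot 10.
\end{aligned}
```

The subtracted element stays the same (‘two’, ‘one’); only the base shifts from ‘10’ to ‘100’ and the subtracted unit from ones to decads.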

Also the word for ‘100’ is itself given as tissase, seemingly standing for ‘ten hundred’ (tízszáz). However the word for ‘1000’ (mod. ezer) is still given separately as esere, so I don’t think this represents a translation error either. My guess would be that the phrasebook’s Hungarian informant spoke a dialect where this archaic-seeming model of ’80’ and ’90’ was pleonastically extended to ‘100’ as well.

[added 2021-07-04] Novel words for ’80’ and ’90’ would also not feel terribly out of place, since Hungarian shows a wide variety of strategies for forming decads anyway. ’20’ is a separate word húsz (in 1438 usso ~ us), which has cognates in Ob-Ugric, Permic and Mordvinic; ’30’ is harminc, with a suffix -inc that has been compared with Permic /-mɨs/ in ‘8’ and ‘9’ (though I would think the -c is again from ’10’ with similar contraction later, and this means that the nasal could also have some different origin entirely [1]); ’40’, ’50’, ’60’ have a suffix -van ~ -ven (negyven, ötven, hatvan; in 1438 negiuem ~ neguieun, ethuem ~ octauen and otovan ~ otouem) that is normally compared at least with the decad endings in Komi (/-mɨn/) and Mansi (/-mən/).


Worth mentioning while I’m at it: the original point that led me to Simonyi’s article is that this phrasebook is apparently one of the last sources (maybe the last source?) that still displays retained word-final vowels in Hungarian, as we already see in sase, esere for modern száz, ezer. The former could in principle be an orthographic device to indicate voiced /z/, the latter however seems patently genuine: it can be contrasted with hamor and not anything like **hamoro for mod. hamar ‘soon’. This seems to be another sign that the Hungarian informant spoke a nonstandard dialect. To my knowledge, 1400s Hungarian codices otherwise no longer contain any trace of the word-final short vowels as they appear in the earliest Hungarian texts from 1055 (the Tihany abbey charter) and the 1190s (Halotti beszéd…). Also, while these two early sources seem to reflect a word-final u for several consonant stems of modern Hungarian, in the phrasebook this is more typically now an o; a front-harmonic equivalent e is also well attested. One word recorded at both stages of development is the adjective ‘big’: HB nogu > 1438 nogio > mod. nagy. This of course is just the same change as the lowering of Old Hungarian u to modern o (when from Proto-Uralic *u and probably standing for a short reduced /ʊ/) as also found inside word stems. Some cases of seemingly unlowered u still appear too though, e.g. burso ~ borth ‘pepper’ > mod. bors; harum ‘3’ > mod. három. Probably the reflex of Old Hungarian *ʊ was at this point still a high-mid-ish vowel [o̭] that was partly heard as /u/ by the Italian author of the phrasebook (when adjacent to labials? this would kind of parallel the modern Northern Mansi spelling of unstressed [ə] as у before labials, as in e.g. хӯрум /χūrəm/ ‘3’).

There is also evidence of consonant-stem nominals already, such as aram ‘gold’ (> mod. arany), assem ‘woman’ (> mod. asszony) (m for word-final /ń/ appears to be regular in the phrasebook for some reason), bor ‘wine’, fos ‘penis’ (> mod. fasz); nevet accusative of ‘name’ (but leginto acc. of ‘young man’ > mod. legényt, napotu acc. of ‘day’ > mod. napot). A possibility that these bring to mind is that word-final vowels may have been already regularly lost in words of some shapes such as *CVRV or *CVCVNV, while most retained cases in the phrasebook seem to follow obstruents; or a consonant cluster in embre > mod. ember ‘person’, olno > mod. ón ‘tin’. The available corpus of data is lamentably small though and I would also not rule out that some words like bor (coming via Turkic from Middle Persian bōr) simply were always consonant stems in Hungarian.

[1] Even *harm-van-c > *harmanc with a cluster simplification *mv > m could be worth considering, but this would leave -i- very mysterious.

Posted in Etymology

Details of some vulpine words in Uralic

A recent open access paper by half a dozen Leiden Indo-Europeanists: Palmér, Jakob, Thorsø, van Sluis, Swanenvleugel & Kroonen, “Proto-Indo-European ‘fox’ and the reconstruction of an athematic ḱ-stem” presents a very thorough analysis of various core IE words for medium-sized carnivores (h/t Languagehat). The main conclusion is that these constitute two etyma rather than just one: *h₂lop-eḱ- ‘fox’ ≠ *wl̥p-i- ‘wildcat’ (surely not **ulp-i-?), even though some reflexes of the latter do end up with the meaning ‘fox’, namely Latin vulpēs and Albanian dhelpër. The latter has been included here thru a dissimilation *v > dh / _V(C)p (another tally to the already lengthy list of Weird-Ass Albanian Sound Changes™, but the other mentioned examples dhampir ‘vampire’ and dialectal dhespër ‘evening’ do look watertight to me).

The paper includes also a lengthy digression on loanword reflexes of the former etymon in Uralic. Despite the unusually-large-for-linguistics author team however, none of the writers seem to be Uralic specialists. They have had some good help on this at least; Petri Kallio has been thanked for consultation and Sampsa Holopainen’s 2019 thesis treatment of these loanwords is also referred to repeatedly. I would still add a few details to the account of the Uralic data though, as they seem to illustrate several novel or less-known phenomena in phonology and morphology.

1. Finnic

Palmér et al. start their discussion of Finnic by asserting a back-harmonic Proto-Finnic **rpoi behind North Finnic *repoi. I do not, however, see any grounds for this. Second-syllable *o was neutral with respect to vowel harmony in PF; key data for this phonological interpretation comes from two corners of the southern part of the Finnic language area, where we still find even an explicitly disharmonic vowel combination ä–o in languages that otherwise follow vowel harmony. The first is Votic, showing e.g. tšäko ‘cuckoo’ (< *käkoi), pääsko ‘swallow’ (< *pääskoi), sälko ~ śalko ‘foal’. Note that these also cannot be explained as later loanwords, since their cognates in North Finnic do end up re-asserting harmony (Fi. käkö, Ing. käkö(i), Krl. Lud. kägöi; Ing. pääsköi, Livvi piäsköi ~ piätšköi; Fi. sälkö, Krl. proper šälkö ~ šäľgö). Secondly this vowel combination has been retained also in South Estonian. Besides pääsokõnõ ‘swallow’ (no reflexes of *käkoi, *sälko), cf. at least näio ‘maiden’ from PF *näito(i) (> core Finnic *neito(i) > Fi. Vt. neito, Ing. neitoi, Krl. Lud. ńeidoi, Veps ńeidō) and räbo ‘junk’ ~ Est. räbu; also Fi. räp-eä, Krl. räp-äkkä, Veps räb-ed ‘brittle’ (different derivatives but affirming original *ä). Vt. repo and also SE rebo ‘fox’, neglected in the paper, can be therefore taken to directly continue PF disharmonic *repoi.

The “clipped” derivation *rebäs → *repoi is certainly unproblematic: this is very typical for *oi-diminutives in Finnic, already found among the oldest examples such as *jänis ‘hare’ → *jänoi > NF *jänöi ‘bunny’, *kaunis ‘beautiful’ → *Kaunoi ‘name of a cow’, and perhaps (the semantics seem off) *talas ‘platform, shed’ → *taloi ‘house’. In later, more localized examples we find all sorts of stem-final or even root material dropping off, like Ingrian hanoi ‘goose’ ← han[hi], Ludian ohtoi ‘thistle’ ← oht[ikaz], South Ostrobothnian Fi. Torstoo ‘name of a cow born on Thursday’ ← torst[ai] ‘Thursday’. [1] There is also some minor evidence of stem-final *-(a)s : *-aha- being reanalyzed as a suffix eventually at least, since we find it sometimes secondarily attached to native stems, e.g. Fi. lippa ‘overhang, visor, etc.’ → lipas : lippaa- ‘chest’. This leaves some space for an analysis similar to Hungarian (cf. below).

Other stem variants present two other problems, which to me appear to largely cancel each other out however. For one, while scarcely attested PF *rebäs could indeed regularly continue earlier *rebäś < *repäć(ə), to me this would not seem to predict an inflectional stem **repäh(e)-: there is no positive evidence that the early lenition *-s- > *-h- between unstressed syllables applied to secondary *s from palatalized *ś < *ć. I believe an explicit counterexample is at least the North Finnic conditional mood marker -isi-, which I would derive from pre-PF *-j-śə- < *-j-ćə- (*-j- from the imperfect stem); not from a suffix *-ŋćə- with an original nasal (the Samic potential mood marker *-ńće̮- I would consider to get its nasal from the PU potential mood marker *-nə-). For two, the authors note that forms like Estonian rebane could continue a diminutive *repäh-inen, but that Veps rebāńe does not quite support this, pointing instead to PF *repäinen. This is not a problem though if the PF paradigm of *rebäs originally did not have forms with *-h-! I would instead consider an earlier West Uralic *repäć(ə) first giving *repäś : *repäśə-, evolving by late Proto-Finnic into a paradigm *rebäs : *repäise-, with *-i- by palatal unpacking. The latter would then have been readily interpretable as the oblique stem of a diminutive *repäinen, motivated also by the fact that by far most bisyllabic nouns ending in *-s had either an oblique stem in *-hE- (if from pre-PF simple *s) or *-ksE- (if with the PU noun-deriving suffix *-ksə).

A similar reshuffling of an unalternating *s-stem into two different paradigms seems to have taken place also in the other example where we can clearly reconstruct a noun ending in pre-PF *-Ać(ə). This is the word for ‘male pig’: Fi. oras ~ orainen ~ oraisa, Krl. orattšu, Veps oraž(a-) ~ oratš(u-). These have their origin in West Uralic *worać(ə) ← Indo-Iranian *warādźa- (cf. Holopainen 2019: 313–314); whence also Moksha /urəś/, dim. /urəź-i/ (with voicing alternation pointing to a pre-Mo. consonant stem *oraś : *oraśə-). From this it seems to me that “reconstructing forwards” would yield PF *oras : *oraise̮-; the first form of these then later gaining an analogous inflected stem *oraha-, the second an analogous nom.sg. *orainen. This last-mentioned form would have been further folk-etymologically interpretable as a derivative of ora ‘awl’, leading to the creation of two further variants ora-isa, ora-ttšu.

Tangentially, I think this mechanism also explains the two different shapes of the word for ‘crow’ in Finnic: Fi. Ing. Krl. varis (: varikse-), Lud. Veps variž, Livonian vaŗīkš ending in *-is, versus Est. and SW Fi. vares (: Est. varese-, Fi. varekse-), Vt. varõz (: varõ(h)sõ-), SE varõs (: varõ(s)sõ-) ending in *-e̮s. While words for ‘crow’ display a wide variety of different suffixes across Uralic altogether — e.g. Erzya /varaka/, Hungarian varjú, Southern Khanty /wărŋaj/ (< pseudo-PU ? *wara-kka, *warV-ja, *warV-N-woj) — evidence for a suffix with *-ć- can be found in both Samic (*vōre̮ć) and Mordvinic (? *varśəŋ > Er. /varćej ~ varśej ~ varkśij/, Mk. /varśi ~ varći/). It would seem to be possible to reconstruct already a common West Uralic *warə-ć(ə). From this I would expect to see in PF a paradigm *vare̮s : *varise̮-, again with clean depalatalization syllable-finally vs. palatal cheshirization medially. The stem was then maybe reworked to *varikse̮- already early on; there seems to be no evidence for a reanalysis as a diminutive **varinen (maybe avoided due to the crow being a relatively large bird).
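The mechanism proposed in the last three paragraphs (plain depalatalization of a syllable-final sibilant versus palatal unpacking of a medial one) can be sketched as a toy function. The assumptions are all mine and hypothetical: the input is a pre-Finnic consonant stem that already shows *ć > *ś (and, for ‘crow’, *w- > *v-); the oblique stem is formed by adding *ə; a stem containing *ä, *ö or *y takes front harmony. Consonant gradation (hence *repäs rather than the attested *rebäs) and the later analogical stems (*oraha-, *varikse̮-) are deliberately not modeled, and the function name `reflexes` is just an illustrative label.

```python
import re

BACK_E = "e\u032e"          # e̮, written with a combining breve below
VOWELS = "əaäeiouöy"        # segments that count as a following vowel


def reflexes(stem):
    """Hypothetical sketch: (nom.sg., oblique stem) in Proto-Finnic for a
    pre-Finnic consonant stem ending in *ś. Gradation is not modeled."""
    front = any(v in stem for v in "äöy")
    harmonic_e = "e" if front else BACK_E

    # Nominative: the sibilant is syllable-final, so it simply depalatalizes.
    nom = re.sub(r"ś$", "s", stem)

    # Oblique: add the stem vowel *ə; the medial *ś now stands before a vowel.
    obl = stem + "ə"
    # A preceding *ə merges with the unpacked palatal element into *i.
    obl = re.sub(r"əś(?=[" + VOWELS + "])", "is", obl)
    # Otherwise the palatal element surfaces as *i before plain *s.
    obl = re.sub(r"ś(?=[" + VOWELS + "])", "is", obl)

    # Remaining reduced *ə surfaces as a harmonic mid vowel.
    return nom.replace("ə", harmonic_e), obl.replace("ə", harmonic_e)


for stem in ("repäś", "oraś", "varəś"):
    print(stem, "->", *reflexes(stem))
# repäś -> repäs repäise
# oraś -> oras oraise̮
# varəś -> vare̮s varise̮
```

The point of writing it this way is that one rule set yields all three unequal paradigms (*repäs : *repäise-, *oras : *oraise̮-, *vare̮s : *varise̮-), which is what makes the later analogical reshufflings into separate paradigms plausible.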

2. Samic

The Samic reflexes do not receive a separate discussion in the article. The main question raised is if a suggested Proto-Samic *reapēš should be considered a recent loanword, and if so, where from.

At least the suggestion of *š being a substitute for North Karelian š in a lost **reväš seems anachronistic to me. PS dates to ca. 2500 BP, while the shift of *s > š in NKrl. dates to ca. 1000 BP at the earliest, if taking place right around the split-up of Old Karelian. The distribution of this variant of the word in Samic (South thru North, with no Eastern Samic reflexes) does not match with a Karelian origin either, whether old or more recent. Examples that Palmér et al. bring up of the type PS *še̮lmē ‘eye of an axe’ ~ PF *silmä ‘eye’ (~ inherited PS *če̮lmē ‘eye’) or PS *še̮ltē ~ silta < PF *cilta ‘bridge’ (← Baltic), where PS *š seems to continue a Finnic *s, probably represent mostly an allophonic palatalized realization of Proto-Finnic *s as [sʲ] when adjacent to *i. To me the simplest loan source would therefore seem to be the Finnic inflected stem *repäise- (whether or not it already had *repäinen as its nominative). The suffix *-ise- indeed later regains phonemic palatalization in quite many Finnic varieties, already so in Karelian and Eastern Finnish. This interpretation also accounts for the retention of *-p-, as in a hypothetical late loan from an unattested NKrl. **reväš we’d probably expect reflexes like Lule Sami **rievij rather than the attested riebij.

3. Permic

Following Holopainen, Palmér et al. consider Permic *rući̮ an independent back-vocalic loan. — As a preface before continuing: I write the vowels here as they would be in the classic reconstruction of Itkonen and Lytkin; the paper instead follows Zhivlov’s most recent sketch of Proto-Permic reconstruction in reconstructing *roću̇, which, suffice to say, I am not especially convinced of. I do not wish to get bogged down in details of PP vowel reconstruction schemes here though, as I agree with the point that Komi /u/ would regularly reflect a PU non-close back vowel *a/*e̮/*o and not front *e. This would by itself be a sufficient reason not to derive PP *rući̮ from the preform *repäć(ə) indicated by Finnic, Mordvinic and Mari.

The authors however also advance the claim that medial *-ć- should have been voiced and that therefore a preform with a geminate is required, along the lines of *rApaćća. I believe this is an overreach. An underappreciated fact of Permic historical phonology is that word-medial lenition only fully applies post-tonically! The best-known examples of the development later in a word are the possessive suffixes: cf. Komi-Permyak 2PS /-ɨt/, 3PS /-ɨs/ << PU *-(n)tə, *-(n)sa, and the ordinal suffix: KP /-ət/, Udmurt /-et/ << PU *-mtə, which remain voiceless (with secondary voicing of *t in Zyrian Komi /-ɨd/, /-əd/). The possessive suffixes do end up as /-ɨd/, /-ɨz/ in Ud., possibly originating e.g. as positional variants after secondary stress; but in any case note that despite voicing, we do not find this feeding into further lenition *-d- > *-ð- > ∅ as is the fate of root-medial *-t-. Some derivational suffixes show this same development too, most clearly the adjectival suffix /-ɨt/ << PU *-ətA, as in examples like Ud. /peľmɨt/ << PU *piďm-ətä >> Fi. pimeä ‘dark’; perhaps also the adjectival suffix Ud. /-eś/, K. /-e̮ś/ (from PU *-ća?). There also appear to be examples among the few trisyllabic word roots that can be reconstructed for PU, such as K. /rɨnɨš/ < PP *ri̮ŋi̮š ‘threshing ground’ < PU *riŋəšə, PP *ľaŋes ‘birch bark vessel’ < PU *ďäŋäsə. [2]

Lack of voicing of the affricate in *rući̮ is therefore no problem even if going back to something like *rApaća, borrowed already roughly from Proto-Indo-Iranian. We do need to date it as younger than the deaffrication *ć > *ś that is represented in oldest II loans like late common Uralic *ćarwə > *śarwə >> PP *śur ‘horn’, though. This “new” *ć that survives into modern Permic probably also should be able to continue not just a PII *ć but also a slightly later Proto-Iranian depalatalized *c. Permic has never had a native dental affricate, and even some early Russian loans into Komi end up substituting ц as /ć/ (IIRC including in nonpalatalized positions, but I don’t have a list of these readily around).

4. Hungarian

In Hungarian, ravasz ‘cunning’ (OHu. ‘fox’) and róka ‘fox’ represent additional clearly independent loanwords. Following Holopainen, who in turn follows early less assertive suggestions by Sköld and Joki, we can easily agree that at least the former is likely to come from later Alanic, instead of from any kind of ad hoc backing development from *repäć(ə).

I would indeed also rule out an early loan with PU *s. The example of fészek ‘nest’ < PU *pesä is not really itself well-explained enough to make a precedent for retention of *s as sz /s/. The only real suggestion that has been advanced for this is a somewhat ad hoc blocking of *s > *h by a word-initial fricative f-, which is itself not clear without knowing exactly how early *p- > f- is, nor does it strike me as clear whether voiced -v- or slightly earlier *-β- could be assumed to have had the same effect as voiceless *f-. There is one seemingly exact parallel to this dissimilation, fasz ‘penis’ ← PII *pásas (a loan etymology re-defended by Holopainen, 185–186); but this has an apparent Samic cognate *pōče̮, pointing to PU *ć and not *s, which IMO leaves the loan etymology uncertain as well. For this word I would actually not even entirely rule out the suggestion of Rédei, who in one of his last papers [3] suggested relatively recent loaning from an archaic but unattested Old High German reflex *fas. This is certainly at a disadvantage though, since only a derived reflex, in OHG fasal ‘offspring’ (> modern German Fasel), seems to be actually attested in Germanic, and with not much trace of the meaning ‘penis’.

(I have also wondered if all this is maybe barking up the wrong phoneme and fészek should not be segmented as fész-ek, but rather fé-szek; where the second component could then perhaps represent a reduced reflex of szék ‘chair, seat’, cf. in Indo-European nest << *ni-sd-os ≈ ‘down-seat’. However this is not quite matched by the oblique stem fészke-, demonstrating that also the nominative singular continues earlier *fészk < *fēskĭ. Dialect forms such as fécek with an affricate might be an additional problem, though really equally also for any proposal that sz < PU *s.)

Back to foxes though: for modern róka it is indeed easy to analyze -ka as a diminutive suffix added to an earlier *raw-. At first glance this would seem to represent the same kind of “clipped” derivation as Finnic *rep-oi. While this is not the typical application of -ka in Hungarian, there are still examples, say Jóska ← József, this usage perhaps motivated by the homographic and “homophonological” (even if not exactly homophonic) Slavic diminutive -ka. But I do like Palmér et al.’s proposal via reanalysis: ravasz would have been analyzable as containing the rareish suffix -asz and would have allowed *raw-ka to be formed by suffix alternation instead. [4] If I’m not mistaken, most examples of -asz and also the front variant -esz are nouns though — and the phonologically closest match is maybe tavasz ‘spring’ — so dating this change specifically after the shift ‘fox’ > ‘cunning’ in ravasz does not strike me as necessary at all. For that matter, this might also be too late for root-medial *aw > ó to be operative even analogically anymore, since the sense ‘fox’ is still attested for ravasz as late as 1403, ‘cunning’ only from about there on out.

Postscript: ‘Wildcat’ in Uralic (?)

After finding the Indo-European ‘fox’ borrowed, thru Indo-Iranian, directly or indirectly into half a dozen Uralic branches (including also relatively straightforward reflexes in Mordvinic and Mari that I don’t comment on specifically here), it is interesting to note that probably also *wl̥pi- ‘wildcat’ seems to have made the leap. These are the Samic and Finnic words for ‘lynx’: PS *e̮lpe̮s (narrowly distributed: North albbas, Lule albas) ~ PF *ilbes (pan-Finnic: Es. SE. Fi. Krl. ilves, Vt. ilvez, Lud. Veps ilbez, Liv. īlbõks). This time, retained *l and front-vocalism seem to point towards Baltic (Lith. vilpišys) and not Indo-Iranian. Only the loss of *w- would readily create a problem.

To my knowledge the comparison of these with Indo-European remains unpublished, but I’ve heard it from a couple of colleagues (for the time being please do not cite me on this). Its first public presentation might have been by Mikko Heikkilä at the 2017 conference Contextualizing historical lexicology — narrowly missed by Kroonen, who was scheduled to participate but IIRC had to cancel entirely. I’m not sure if his proposed routing thru an additional Uralic substrate in the northwest is at all necessary though. If the word was originally loaned as *wülpəs/š- or the like, the Samic word would reflect this entirely natively (*wü- > *ü- feeding into *ü > *i > *e̮ is known also in *wülä- > *e̮lē- ‘up, above’ — possibly the only word in Samic that retains a trace of PU *i/*ü contrast). In light of the apparently rare suffix -iš- in Lithuanian, final *-s maybe more likely continues earlier *-(k?)š, which again regularly gives Samic *-s, but would be expected to give **-h in Finnic. (One of Palmér et al.’s two other examples of this suffix is takišys ‘weir’, whose preform has also been borrowed into Finnic as *tokəš > *toge̮h > Fi. Ing. toe (: tokee-), Vt. tõgõ, Es. tõke ~ tõge, Liv. to’ggõd; or, since apparently there’s no good IE or even Balto-Slavic etymology, is it perhaps a loan from (pre-)Finnic into Baltic instead? [5])

A Samic loan already into Proto-Finnic would be unexpected though. All known words of Samic origin in Estonian have made it there late thru the mediation of Finnish, and I don’t think any are known in Livonian at all. The same is the case for “Language X” of some supposed non-Samic and non-Finnic hydronyms however, which (by the current evidence) seems to have arrived in Karelia / inner Finland / Sápmi via a northwestern route, not thru the Baltic. It’s also not clear to me why Finnic would not simply have borrowed the word itself from Baltic straight away, when it’s commonly thought that even Baltic loans in Samic were mostly mediated by early Finnic.

Word-initial *wi- does in general survive in Finnic, e.g. *viici ‘five’, *viimä(-) ‘end’, *viska- ‘to throw’, which at first seems to weigh against direct derivation from IE. However I wonder if there could simply have been a conditional loss here. Proto-Uralic is known to have lacked word roots of the shape *PV(C)PV with two bilabials *m, *p in consecutive onsets; and also *w…m seems to lack any good examples for it. (Note that PF *viimä is a derivative with a contracted long vowel < *wiŋə-mä; ditto for e.g. *vaima ‘heart’ < *wajŋ(ə)-ma.) Perhaps by early Finnic, this constraint was then further extended to *w…p. This sequence still occurs natively in PU *woppə- ‘to observe’, but then *wo- simplifies to *o- in Finnic anyway, indeed also Samic, Mordvinic and in most cases Mari. Thus it does not seem out of the question to me that we have simply an early Baltic *wilpi(k)ši- borrowed into pre-Finnic as *ilpəksə. It would be also possible to then treat the Northern and Lule Sami words as (earlyish?) loans from Finnic rather than archaisms dating already to pre-Proto-Samic times. But this remains a hypothesis that could still use parallels, especially since there are also some Baltic loans into Finnic that do retain *w…p or similar pairings, e.g. *virpi ‘branch, rod’.

[1] For loads more examples, see e.g. Rapola, Martti (1920), “Kantasuomalaiset pääpainottomain tavujen i-loppuiset diftongit suomen murteissa”.
[2] I would suspect that this general point has been made before, but offhand I can only find partial statements, e.g. the classic Uotila, T. E. (1933), Zur Geschichte des Konsonantismus in den permischen Sprachen only really discusses the case of PU *t (p. 92 on).
[3] Rédei, Károly (2005), “Szófejtések 351–358”, Nyelvtudományi Közlemények 102.
[4] Another point that I also could believe to have been made before already, but I’ve not gone digging into the Hungarian literature.
[5] There is one nominally compatible PU root that could be considered as a source for this: *čoka ‘shallow, dry’; weirs are best built in relatively shallow rivers. This is reflected only in Samic and Selkup though, the latter reflex also being *če̮kə- ‘to dry’ rather than the expected *čwe̮kə(-), which leaves almost every part of this comparison not especially compelling.

Posted in Commentary, Reconstruction

Koibal Addenda

In recent years, Tamás Janurik has been releasing online numerous papers, small surveys and reference materials on the Uralic languages, particularly Samoyedic and Hungarian (all mainly thru his academia.edu page). Last week the roster was joined by what seem like two particularly notable works: Kamassz szótár and Kojbál szótár, two “doculectal-comparative” dictionaries that aim to arrange together and morphologically analyze all currently available lexical material on these extinct Samoyedic languages. Despite titles and introductions in Hungarian, the bulk of both dictionaries actually uses German as the main metalanguage. Conveniently (if not for anglomonoglots), basic glosses are also provided in no less than three languages: German, Hungarian and Russian.

The haul is respectable: 1456+114 word groups for Kamassian and 570 for Koibal (with Russian loanwords in Kamassian listed separately from the “native Siberian” word stock [1]). A comparison that easily springs to mind is with the etymological lexicon of Helimski’s Die matorische Sprache, documenting 1134 word groups across all varieties of Mator, and at least the Koibal dictionary might reach similar status as a standard lexical source. For Kamassian there still remain unpublished archive materials though, some already from the main field researchers Castrén, Donner and Künnap. Given their close relationship, in principle it might be also a good idea to eventually arrange all Kamass–Koibal material in a single etymological database or the like.

So far I’ve been poring over the Koibal data and its etymological remarks. Going back to the original sources of Spasskiy and Pallas (and also cataloguing their later appearances especially in the works of Klaproth), Janurik turns out to identify a good couple dozen more Koibal cognates for Kamassian and other Samoyedic languages than are listed in earlier reference works. No more than four of these lack Kamassian equivalents altogether, though: from Spasskiy корламъ ‘to ask’ (PS *kå-), пысва ‘rotten’ (PS *poså- ‘to rot’), тугуламъ ‘to gnaw’ (PS *t¹okɜ-); from Pallas chailàn ‘gull’ (PS ? *kələjə). This could, though, be partly because Janurik does not seem to propose any entirely new Proto-Samoyedic roots, and limits himself to adducing new Kamass and Koibal reflexes for previously known ones. This still leaves a good deal of unetymologized vocabulary awaiting further research. All these are now at least well identified and collected together. Janurik employs an admirably detailed scheme of marking each word group with an etymological code: P1–P5 for words that seem native to some extent within Samoyedic, L1–L3 for post-Proto-Samoyedic loanwords, XX for entirely isolated words. The distinction between his layers P1 (Proto-Uralic) and P2 (Proto-Samoyedic) is not quite up to speed on 21st-century research, but this is a minor detail here. Similarly I wonder about at least the naming of his group P3 (Proto-South Samoyedic), when it is Janurik himself who has presented one of the clearest arguments against assuming such a subgroup. [2] But it is certainly of some value to distinguish Kamass–Koibal words with and without northern Samoyedic cognates, as the latter e.g. might be more likely to turn out to be areal loanwords rather than actual common inheritance.


The newly identified cognates so far already provide food for thought anyway. For a simple example, the aforementioned chailàn ‘gull’ seems to be slightly off compared to the earlier PS reconstruction, suggesting rather something like *kəjələ. A slightly better match in root structure could be actually UEW’s *kaja(-ka) ‘gull’; or, since PU *a > PS *å > Kamass–Koibal a is a minority development (normally *å > o, u) and incompatible with the potential Nenets and Selkup cognates that certainly require *ə, maybe the best solution would be independent formation after all from a mimetic root √kaj-.

A second bird name that leaves me thinking is Km. šēgə ~ Kb. сега ‘cuckoo’. This could be derived entirely regularly, together with cognates in Selkup, from PS *käkV. Clearly this is another old mimetic term, at least predating the assibilation of PS *k to *š; but how old exactly? Several comparable words for ‘cuckoo’ turn up again also further west in Uralic, including Khanty *käɣii, Udmurt /kikɨ/, Komi /kɤk/ and Finnic *käki (the first three reported but considered improbable in SSA). The medial consonants and vowel correspondences do not entirely behave though. At best Khanty and Finnic would point to *käkə, Samoyedic and Permic to *käkkä; or maybe Samoyedic and Khanty to *käkä. This all might not be fatal in a bird name; some of this could be reshaping to retain a more iconic shape for the word (whereas e.g. from *käkə we would otherwise expect *kä in Samoyedic). But then we could ask as well if this is not due to the words being independently formed; or borrowed even: the Finnic words have been often considered to be loaned from Baltic (cf. Lithuanian gegužė with a dialect variant gegė), though this remains uncertain too for similar reasons. — Really the entire distinction between “reshaping” and “independent formation” seems somewhat vacuous when dealing with words of this sort that have had an iconic motivation available all along. Quite likely Proto-Uralic did have a name for the cuckoo that was something like #kVkV, but whether this has actually survived in an expected regular shape anywhere is guesswork. [3]

Next up, the case I find the most interesting are the Kamassian and Koibal words for ‘son-in-law’. I’ve already noticed earlier that the former would go well with a hypothesis I have on the reconstruction of this word in Proto-Uralic, and Janurik’s newly adduced Koibal cognate seems to support the idea further. Actually even the Kamassian cognate has not appeared in etymological references earlier as far as I can tell. This is not a major surprise, since the form is malmi, quite far from either SW’s Proto-Samoyedic reconstruction *wiŋə or UEW’s Proto-Uralic reconstruction *wäŋe.

The first key to this puzzle is provided by Kamassian alma ‘dream’. Nominally this comes very close to Ugric forms for the same (e.g. Hungarian álom : álmo-), and UEW goes as far as to support a wild proposal of a loanword from Khanty. Janhunen in SW however suggests a different solution. Within Samoyedic a clearly different root can be reconstructed for ‘dream’: *äŋwå, and the Kamassian word could be derived from this via assimilation–then–dissimilation, *ŋw > *ŋm > lm. Such a sound change series would already provide more grounds for comparing malmi with PS *wiŋɜ (note also that *w- > *b- > m- before a word-internal nasal is a known regular sound law). The Koibal cognate identified by Janurik comes in at exactly this point: we find here the form манмемъ (most likely an 1PS possessed form ‘my son-in-law’), suggesting that also this instance of Km. /lm/ has indeed evolved from *ŋm. I am not certain whether this should be taken as still containing /ŋm/ however (thus Janurik) or, as it can be read prima facie, /nm/. This latter could be still archaic with respect to Kamassian of course, i.e. in more detail we would have *ŋm > /nm/ > /lm/. (The other possible routing I guess is *ŋm > *ɫm > /lm/, slightly more awkward since there seems to be no reason to assume a distinct velarized *ɫ at any point in the history of Kamassian.)

Where would this word-internal *m < *w come from then? I suspect it has actually been there all along. For one, we already have various forms like Finnish vävy and Mator mijüh (миюгмэ) pointing to some kind of an original labial element near the stem vowel, which has already led to newer reconstructions along the lines of PU *wäŋəw(ə) rather than bare *wäŋə. [4] For two, the Samic reflexes of this word show a long-standing minor problem: they indicate Proto-Samic *vivë, with a seemingly Finnic-like development *ŋ >> *v. I would suggest that this issue is due to incorrect segment alignment: that Samic *v does not continue the original 2nd-syllable *ŋ, but instead the 3rd-syllable *w, and original *ŋ has been instead lost to a vocalization process of some sort. If correct, this would show direct evidence for a reconstruction *wäŋəwə (i.e. ruling out anything like **wäŋü with a labial vowel in PU already), making the PU shape of the word actually a relatively good fit at least for the consonant skeleton of Kamassian and Koibal. I could even suggest reconstructing for PU a morphophonologically alternating paradigm, with a vowel stem *wäŋwə- (> Samic, Km–Kb) : consonant stem *wäŋəw- (> Finnic, Nenets, Nganasan etc.); though this is motivated also by some other considerations that would take us fairly well afield from the current topic.

There is definitely still room for skepticism about this however, and in particular the vowel correspondences continue to be quite irregular: in the first syllable, none of PU *ä, PS *i and Kamass–Koibal a regularly corresponds to each other, while in the 2nd syllable, Km. -i ~ Kb. -e most typically continues PS / PU *-ä, not *-ə.

So far I have not started any systematic investigation of the entirely unetymologized Kamassian and/or Koibal vocabulary remaining. However, for closing, one simple observation on this front: kuro- ‘to be angry’ (in both Km. and Kb.) probably continues PU *kurə ‘anger’.

[1] i.e. native Samoyedic words, Turkic and Mongolic loanwords, and all vocabulary of unknown origin.
[2] Janurik, Tamás (2012), “Volt-e a déli-szamojéd (PSS) alapnyelv?”, Per Urales ad Orientem. Iter polyphonicum multilingue: 145–162.
[3] A further complication still is the potential Mator cognate / reflex: géihe in Pallas, кига in Müller, per Helimski suggesting PS *-jk- rather than plain *-k-. However the precedent of PS *äjmä ‘needle’ > Kamassian ńīmi ~ Koibal неме would maybe then seem to predict ˣšīgə for ‘cuckoo’, and we are right back in not knowing which way irregular correspondences in iconic or onomatopoetic vocabulary should be interpreted.
[4] This final *-w(ə) is strictly speaking not segmentable, but it is probably originally the same formant as also in two other in-law terms: PU *käləw(ə) ? ‘sister-in-law’ and *nataw(ə) ? ‘brother-in-law’.

Posted in Commentary, Etymology, News

Analogy Is Not Phonology

While my blogging here has been firmly within historical linguistics, every once in a while I do go poking around self-styled formal linguistics blogs too. [1] This tends to be a frustrating exercise though. By now, supposedly deep problems discussed around such parts tend to strike me as, frankly, dumb questions that only exist due to particular “theoretical commitments”, and which could be trivially resolved or avoided within better-grounded frameworks of understanding language. People stuck in generativist bubbles in particular, however, seem to be often unaware that any other types of approaches would exist at all.

As I’m rather more informed about the ground-level facts of phonology than e.g. syntax, this is going to be the more profitable area for me to comment on in any real detail, though generative syntax has also struck me as having foundational flaws roughly analogous to the foundational flaws of generative phonology. (I presume open-minded syntacticians should be even able to figure out these, ahem, analogies themselves, without me having to do all their work for them.)

At any rate, a good majority of questions attracting protracted debate in phonological theory that I have seen are immediately solved under the traditional non-generativist approach: “phonological processes” or “deep structures” do not exist as such. They are only grammatographical shorthand; rules of thumb, not rules of Grammar. [2] Where non-allophonic “phonological alternations” actually exist is within the lexicon, not within phonology.

A standard counterargument to this seems to be the fairly simple observation that loads of obviously non-allophonic alternations are, in fact, still productive to this or that extent in loads of languages. Checkmate, lexicalists?

No, of course not. This simply shows a particularly pernicious systematic failure of generative linguistics — a lack of understanding of language change, particularly that language change, including linguistic creativity, does not take place solely inside a box of “Grammar”, but also within the lexicon. Phonological alternations are easy to approach in this fashion, as they are generally not actually productive in the sense of immediate, universal applicability (as they say in Generativistland, they can be opaque). Moreover quite typically they are “productive in spurts”, creating new forms one by one, now and then in the speech of particular speakers, not everywhere constantly. And the range of applicability for any one process is very finite really: while everyone creates novel noun phrases practically daily, I would wager that most people do not create any entirely novel strong verb forms over their entire life. [3] In historical and historically-informed linguistics, our default assumption is to attribute these kinds of changes to the process of lexical analogy, and understanding it is vital to understanding patterns that arise and exist in language.

What we can actually observe is that any arbitrarily deep alternations can indeed inspire the coinage of new instances of the same, and therefore they can remain “productive”. If desired, I can readily coin all sorts of cases like long ∶ length ∷ oblong ∶ oblength or sing ∶ sung ∷ wing ∶ wung. But then nothing stops me from creating folk-etymological examples either, say choose ∶ choice ∷ snooze ∶ snoice. These also fade organically into snowcloneish blends, e.g. thanks + ants → thants and hello + horses → hellorses; spelling pronunciations, e.g. tentacles ∶ /ˈtɛntəkəlz/ ∷ Pericles ∶ /ˈpɛɹɪkəlz/; or (mis)etymological nativization, e.g. English wrong ∶ Swedish vrång ∷ En. to wring ∶ Sw. vringa. Crucially, what needs to be noted is that this is an extralinguistic cognitive skill that should not have any bearing on the development of purely linguistic theory. Already etymological nativization refuses to respect the confines of a single language, and I think most theories of mental grammar would likewise not attempt to account for spelling pronunciations. We can also easily advance loads of more or less formal analogies in areas that have nothing to do with language, from mathematics (2 ∶ 20 ∷ 5 ∶ 50; square ∶ cube ∷ triangle ∶ tetrahedron) to the natural world (nitrogen ∶ ammonia ∷ oxygen ∶ water; the Congo ∶ leopards ∷ the Amazon ∶ jaguars) and human society (evolution ∶ Darwin ∷ relativity ∶ Einstein; punks ∶ pop punk ∷ ravers ∶ happy hardcore). This, I think, demonstrates beyond reasonable doubt that analogy in fact is a general skill that humans possess, and hence there’s no point in trying to reduce its applications in language into some kind of specifically linguistic primitives.

(Note BTW that while all my examples above are phrased as classic proportional analogies, this also should not be assumed to be the only possible or even the main mechanism of analogy.)
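For readers who want the proportional pattern made mechanical, here is a minimal Python sketch, assuming the crudest possible model: the change from a to b is whatever follows their longest common prefix, and that change is copied onto the end of c. The function name `solve_proportion` is hypothetical, and this is a toy for string proportions, not a claim about how speakers actually compute analogies.

```python
from typing import Optional


def solve_proportion(a: str, b: str, c: str) -> Optional[str]:
    """Solve the proportion a : b :: c : ? as a toy suffix rewrite.

    The change from a to b is taken to be whatever follows their longest
    common prefix (x -> y); that same change is then applied to the end of c.
    Returns None when c does not end in x, i.e. there is no basis for the analogy.
    """
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    x, y = a[i:], b[i:]  # x: material replaced in a; y: its replacement in b
    if not c.endswith(x):
        return None
    return c[: len(c) - len(x)] + y


print(solve_proportion("sing", "sung", "wing"))        # wung
print(solve_proportion("long", "length", "oblong"))    # oblength
print(solve_proportion("2", "20", "5"))                # 50
print(solve_proportion("choose", "choice", "snooze"))  # None
print(solve_proportion("tʃuːz", "tʃɔɪs", "snuːz"))     # snɔɪs
```

Run on spelling, the folk-etymological snooze ∶ snoice fails, since snooze does not end in -ose; on a rough phonemic transcription it goes through. The same function also solves 2 ∶ 20 ∷ 5 ∶ 50, which fits the point that nothing in the procedure is specific to language.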

Once we accept the existence of analogy as an explanation for some cases of morphophonological productivity, this provides also a direct path into rich gains in parsimony. My linguistic examples above have been chosen to be on the “clever” side, i.e. building on only marginal precedents, partly to be sure that they’re indeed novel (at minimum to me!), partly to make it seem more convincing that they should not be modelled by inserting additional epicycles into English (morpho)phonology. But the mechanism of analogy works perfectly well also on any pedestrian phonological alternations out there. What is, say, the plural of oblength? It’s clearly oblengths — but then we could model this conclusion as having been drawn purely on the analogy of lengths, or also tenths, shibboleths, Beths, etc., without needing to assume any distinct, exclusively linguistic machinery behind this. The putative outcome oblengthes, just like also morphologically clearly different options like oblengthim or oblengtha, can be predicted to be unlikely already due to the lack of bases of analogy that could lead to them. [4] That all sorts of other coinages also follow the same pattern could be likewise explained already by the extremely strong precedent for the English plural marker to be -s. In principle even the regular phonologically conditioned allomorphy between -s and -es could then turn out to be simply emergent within the English lexicon, if we enrich it with sufficiently many plural forms stored as lexemes. This approach allows cutting out a hefty amount of costly theoretical complexity assigned to phonology in theories that fail to recognize that analogy exists.
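A toy illustration of how the choice between -s and -es could fall out of stored forms alone: the sketch below coins a plural by picking the stored singular with the longest shared ending and reusing its singular-to-plural change. The six-item lexicon, the similarity measure (shared final letters in spelling) and the function names are all assumptions for illustration; serious analogical models use far richer notions of similarity.

```python
# Hypothetical toy lexicon of stored singular -> plural pairs (illustrative only).
LEXICON = {
    "length": "lengths",
    "tenth": "tenths",
    "shibboleth": "shibboleths",
    "horse": "horses",
    "dog": "dogs",
    "bus": "buses",
}


def shared_ending(a: str, b: str) -> int:
    """Number of characters that a and b share at their ends."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def pluralize_by_analogy(word: str, lexicon: dict = LEXICON) -> tuple:
    """Coin a plural for `word` by copying the most similar stored pair.

    The model is the stored singular with the longest shared ending (ties go
    to the earliest entry); its singular -> plural change is then reused.
    Returns (coined plural, model singular).
    """
    model = max(lexicon, key=lambda s: shared_ending(word, s))
    plural = lexicon[model]
    i = 0
    while i < min(len(model), len(plural)) and model[i] == plural[i]:
        i += 1
    x, y = model[i:], plural[i:]  # the change the model undergoes
    stem = word[: len(word) - len(x)] if word.endswith(x) else word
    return stem + y, model


print(pluralize_by_analogy("oblength"))  # ('oblengths', 'length')
print(pluralize_by_analogy("glorse"))    # ('glorses', 'horse')
print(pluralize_by_analogy("plus"))      # ('pluses', 'bus')
```

Nothing in the code mentions a plural rule or a conditioning environment; the -es in pluses comes only from bus ∶ buses being the closest stored precedent.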

Spending one further moment within philosophy of science, there is certainly also an apparent countercost of presuming the existence of some words like lengths as separate from length (or sung from sing, etc.). However, given that lexicons already indisputably exist, and contain many, many thousands of items anyway (and that, given the phenomenon of suppletion, these indisputably can be syntactically specified as particular inflected forms, etc.), just a few hundred more to “seed” it with generators of morphophonology should be unambiguously considered the superior solution. Extra stuff is free.


It would be indeed possible to go further still and to propose that e.g. even the realization of oblengths as specifically /ɒblɛŋθs/ with /-s/ (and not /-z/) will be inferred by analogy from other English plural forms. It’s hard to rule out that this could be the case for some people. [5] But I do grant that this at least is not an approach that could be fully generalized. Analogy generally allows for multiple solutions, some of them perhaps much less probable but still possible (e.g. if we take a cube as a prism with a square base, not as a polyhedron entirely made of squares, then the triangle analogue will be a triangular prism, not a tetrahedron; and maybe it should be heorses /hɛəɹsɪz/ rather than hellorses). Allophony by contrast is, by all appearances, subconscious enough that speakers find it difficult to create or perceive forms departing from it, and it clearly calls for a different kind of cognitive machinery.

[1] That’s the {self-styled formal linguistics} blogs; what they call themselves is, apparently, just “linguistic blogs”, with the common if vaguely cultish stance that only their branch of work actually constitutes Real Linguistics.
[2] As far as I can tell, a lot of trouble indeed comes already from the failure to fully distinguish descriptive grammar from mental grammar. Much of the early history of morphology and syntax quite transparently consists of attempts to formulate rigorous definitions for concepts of traditional Greco-Latinate grammatography like “subject” or “word”, but with little attention paid to whether this should even be done: a priori there is no reason to expect mental grammar to have any building blocks at all in common with traditional descriptive grammar (much like how, say, biochemistry is not under the obligation to follow any views of Aristotelian natural philosophy). Modern theory of phonological processes indeed also looks as if it largely amounts to applying the same mistake ultimately to Pāṇini’s descriptive (morpho)phonology of Sanskrit, although the road from there to Chomsky & Halle is not clear to me.
[3] i.e. “novel to English (or German, etc.) as a whole”. E.g. (a soup has been) wung might have been a new creation for me just two days ago (‘prepared without a prior plan or recipe’, if you must know), but even before checking I was certain that others had stumbled on this same territory before. — Oh yes, no question about it: it’s even on Wiktionary already, with attestations going back to 1881.
[4] But, of course, not impossible. As e.g. advanced linguistics students faced with the wug test will readily demonstrate, sufficiently large numbers of contrarian smartasses will eventually end up creating any form imaginable, no matter how “ungrammatical”. Almost nothing in language is actually impossible. This is perhaps the most clearly so when a phenomenon is “impossible” (rather, unacceptable) in one language variety but business as usual in another.
[5] Definitely not for me though. As an L2 speaker whose native language has no voiced fricatives, I ended up adopting the English plural marker(s) as just /-(i)s/ back in the day, and though I can by now make conscious effort to use [z] instead, I will be still quite content to speak of [windous], [siːliŋs], [hɑusis], [tʃʰiːzis], [nɑiʋs], [dɔgs], etc…

Posted in Commentary, Methodology

Examples of reductive primary splits

On a whim I have started reading the Oxford Handbook of Historical Phonology. At about two and a half chapters in I have finally reached some discussion of practical questions in some detail, and the first claim to have struck me as empirically interesting is that “primary split can also reduce an inventory”.

For those not up to speed with or just not recalling the lingo (this is after all one of those terrible user-unfriendly terminological conventions along the lines of “type 1 / type 2 error”), I remind that a “primary split” is a conditional sound change that creates a sound (or rather, phoneme) already present in a system, contrasted with a “secondary split”, a conditional sound change that creates a sound not previously present. (I would advocate for using the more descriptive terms “split with merger” and “split without merger”.) [1] They are distinct from a simple unconditional merger, or for that matter, from an unconditional non-merger. [2] Any particular change can fall under any one of these depending on the language.

My first thoughts were of cases such as the fate of the labiovelar stops in Greek. Depending on their environment, these are reflected as any of the three “basic” stops (e.g. *kʷ > /p/, /t/, /k/; similarly for Proto-Greek *kʷʰ and *gʷ), and hence they ultimately disappear from the phoneme inventory. This kind of a situation does not seem to really show that primary split could eliminate a segment from a language’s inventory, though. Although any one of the changes could be stated conditionally, in reality one of the three changes must be the most recent chronologically — and at this point this change is then no longer a conditional sound change, but simply an unconditional merger. (I believe this status belongs to *kʷ > /p/. [3]) A similar sleight-of-hand could be really pulled whenever a sound eventually develops into multiple different reflexes: phonological inventories only offer a finite number of relevant environments, and even if there in fact is a default reflex, it can be also stated in terms of a set of particular environments. E.g. the development of PIE labiovelars in Indo-Iranian or Slavic could be stated roughly as “palatals before front vowels, velars before consonants and non-front vowels”, although only the former development is conditional, the latter instead unconditional; and indeed even feeding into, rather than independent of, palatalization before front vowels. (I.e. *Kʷ >> *Č / _E is, properly speaking, not a sound change but a sound correspondence, consisting of (at least) two sound changes: *Kʷ > *K followed by *K > *Č / _E.)
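The point that *Kʷ >> *Č / _E is a correspondence made of two ordered changes can be checked with a small rule-ordering sketch. The notation is an assumption: k stands for any velar *K, kʷ for any labiovelar, č for the palatalized reflex, and e/i for the front vowels *E; the forms are bare toy strings, not reconstructions.

```python
import re

# Toy notation (assumption): "k" = any *K, "kʷ" = any *Kʷ,
# "č" = palatalized reflex, e/i = front vowels *E.
DELABIALIZATION = (r"kʷ", "k")         # *Kʷ > *K (unconditional)
PALATALIZATION = (r"k(?=[ei])", "č")   # *K > *Č / _E (conditional)


def apply_in_order(form: str, rules) -> str:
    """Apply (pattern, replacement) sound changes one after another."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


for form in ["kʷe", "kʷa", "ke"]:
    feeding = apply_in_order(form, [DELABIALIZATION, PALATALIZATION])
    reversed_order = apply_in_order(form, [PALATALIZATION, DELABIALIZATION])
    print(form, feeding, reversed_order)
# kʷe če ke
# kʷa ka ka
# ke če če
```

In the attested order, delabialization feeds palatalization, so kʷe ends up as če; with the order reversed, kʷe escapes palatalization and surfaces as ke. The correspondence therefore only holds under the feeding order.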

But it turns out the claim is actually something simpler. Proposed in a 2012 article “Primary split revisited” by Robert Blust, the idea is instead: if the segments involved are subject to positional constraints, afterwards it may be now possible to analyze either one of them as being now an allophone of something else. (He also passingly considers exactly the same example of labiovelars in Greek, with citation to a 2000 textbook by Sihler, but without noticing the chronological flaw.) So the actual sounds involved do not disappear from a language’s phonology; they merely now end up in a complementary distribution, and the number of phonemes can be argued to have fallen. Certainly this should be possible.

Curiously, Blust presents this analysis as only a theoretical exercise, and ends up unable to propose any actual examples of the phenomenon. Google also tells me that Blust’s term “reductive primary split” still finds no additional hits out there. I take it upon myself to therefore offer a few examples.

1. Loss of *ŋ in Proto-Finnic

Proto-Uralic had an inventory of four nasals, *m *n *ń *ŋ. The Finnic branch has however reduced this inventory to just two, *m *n. [4] The fate of the palatalized nasal *ń has been simple, merging into *n (I believe with some vowel-coloring effects word-medially; but this is tangential to the point). The fate of the velar nasal *ŋ is more diverse. The most typical intervocalic reflexes are zero (with lengthening of the preceding vowel) and *v, presumably thru earlier *w; in consonant clusters, *Cŋ > *Cv, *ŋC > *uC, both presumably again thru *w. I would additionally posit an even earlier intermediate stage *ɰ behind both the zero and *w reflexes.

One exception to all this is found: the cluster *ŋk, surviving phonetically intact into Proto-Finnic and indeed into the modern Finnic languages. Phonologically speaking, however, it would seem that there has been a change here as well. *[ŋ] cannot be reconstructed for Proto-Finnic in any other environment, and hence we now have reason to interpret [ŋk] as /nk/ (or if we really wanted, /mk/, or even /Nk/ with a neutralized placeless coda nasal). Thus the splits-with-merger *ŋ > ∅, *ŋ > *w and/or *ŋ > *ɰ have been reductive: even though they leave some instances of [ŋ] unscathed, */ŋ/ as a contrastive phoneme is still lost. All of this has been already noted at least as early as by Posti (1953).

This reductive primary split also in fact functions somewhat differently from Blust’s toy example. He suggests an example of a language contrasting /t/ and /s/ only before /i/, showing elsewhere only [t]; if, then, [ti] shifts to [si], the result will be the loss of this contrast — thus yielding /ti/ rather than /si/. In Finnic, it is however not the contrast *ŋ | *w that ends up lost; and what allows the final phonological reanalysis is not the earlier distribution of either of these consonants, [5] but rather the limited distribution of the “third wheel” consonant *n, which earlier did not occur in the position before *k.
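Both kinds of reductive split rest on the same test: after the change, do the two sounds still share any environment? A minimal sketch, assuming that the immediately following segment is the only relevant environment (a real distributional analysis would look at more than that). The Blust case uses his toy language; the Finnic-style case uses a handful of modern Finnish words in rough phonetic spelling (kenkä, hanki, kansa, vene, nenä) as stand-ins, not Proto-Finnic reconstructions. The function names are hypothetical.

```python
def following_contexts(words, segment):
    """Set of segments that directly follow `segment`; '#' marks the word end."""
    contexts = set()
    for word in words:
        padded = word + "#"
        for i, ch in enumerate(word):
            if ch == segment:
                contexts.add(padded[i + 1])
    return contexts


def complementary(words, seg_a, seg_b):
    """True if both segments occur and never share a following context."""
    a = following_contexts(words, seg_a)
    b = following_contexts(words, seg_b)
    return bool(a) and bool(b) and a.isdisjoint(b)


# Blust's toy language: /t/ and /s/ contrast only before /i/.
before = ["ti", "si", "ta", "tu"]
after = [w.replace("ti", "si") for w in before]  # [ti] > [si]
print(complementary(before, "t", "s"))  # False: both occur before i
print(complementary(after, "t", "s"))   # True: [s] only before i, [t] elsewhere

# Finnic-style case: [ŋ] occurs only before k, where [n] does not occur.
finnish = ["keŋkä", "haŋki", "kansa", "vene", "nenä"]
print(complementary(finnish, "ŋ", "n"))  # True
```

In the Finnish-style list the check succeeds not because of anything about [ŋ] versus [w], but because [n] never appears before k: the "third wheel" situation described above.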

2. Loss of *ɣ in Proto-Ugric

A reductive primary split that does function similarly to Blust’s example might be also found in Uralic. The current conventional reconstruction of Proto-Uralic includes a rare consonant *x, occurring only intervocalically (and when followed by 2nd-syllable *ə, though this proves to be inessential to the point). Its reflexes across Uralic point towards a velar obstruent of some sort, though it does seem to have been distinct from *k, the other, well-established velar obstruent. We also find that the reflexes of *x and intervocalic *k indeed coincide to a large extent across Uralic. In some cases, the reflexes are inconveniently either zero (thus Permic, Mari, Samoyedic) or merged with something else still (thus Mordvinic). Here we cannot clearly rule out the option that it is *x that is first unconditionally lost or merged, followed by *k along the same trajectory only later. A merger to a distinct velar reflex *ɣ can be however found in the two Ob-Ugric language groups, Mansi and Khanty. The third Ugric language, Hungarian, has been proposed to also have passed thru a similar stage. If we suppose *[ɣ] was indeed the original sound value of “*x”, we would seem to have here exactly Blust’s situation: the contrast *ɣ | *k originally occurred only intervocalically, and therefore the result of an intervocalic lenition of *k to [ɣ] will be counterintuitively phonemically */k/, not */ɣ/.
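The Ob-Ugric scenario can be sketched the same way, assuming, as the paragraph does, that “*x” was phonetically *[ɣ]. The forms below are hypothetical toy strings, not Uralic reconstructions, and the position classes (initial, intervocalic, other) are a deliberate simplification; the names `lenite` and `positions` are likewise illustrative.

```python
import re

VOWELS = "aeiouəä"
# Toy rule (assumption): intervocalic lenition *k > [ɣ] / V_V
LENITION = (rf"(?<=[{VOWELS}])k(?=[{VOWELS}])", "ɣ")


def lenite(word: str) -> str:
    pattern, replacement = LENITION
    return re.sub(pattern, replacement, word)


def positions(words, segment):
    """Classify each occurrence of `segment` as initial, intervocalic or other."""
    found = set()
    for word in words:
        for i, ch in enumerate(word):
            if ch != segment:
                continue
            if i == 0:
                found.add("initial")
            elif i < len(word) - 1 and word[i - 1] in VOWELS and word[i + 1] in VOWELS:
                found.add("intervocalic")
            else:
                found.add("other")
    return found


# Hypothetical toy forms: "ɣ" stands for PU "*x", assumed to be *[ɣ].
before = ["kata", "saka", "saɣa", "kiɣə"]
after = [lenite(w) for w in before]
print(after)                                            # ['kata', 'saɣa', 'saɣa', 'kiɣə']
print(positions(before, "k") & positions(before, "ɣ"))  # {'intervocalic'}
print(positions(after, "k") & positions(after, "ɣ"))    # set()
```

Before lenition, [k] and [ɣ] both occur between vowels and so contrast; afterwards [k] is confined to word-initial position and [ɣ] to intervocalic position, while toy forms like saka and saɣa fall together. That is the merger half of the split, and the resulting complementary distribution is what allows [ɣ] to be analyzed as an allophone of /k/.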

This situation would have been quite temporary, though: in all three Ugric groups, degemination of *kk probably introduced quite soon a new medial *k, leaving *ɣ again (?) a contrastive segment of limited distribution. At least the apparently parallel degemination of *pp, *tt however cannot be reconstructed for Proto-Ugric: it must be preceded by the lenition of medial *p and *t in Hungarian, which yield modern v, z, presumably thru intermediate *b, *d > *β, *ð. In Ob-Ugric by contrast *p, *t remain without lenition. Thus probably also *kk still remained in Proto-Ugric; and in any case degemination of *kk must postdate at least the lenition of single *k.

Of course a worse problem still is that the analysis depends on not particularly certain details of PU reconstruction. If *x was not *[ɣ] but something else, like *[x] or *[q], it would have been possible that it is intervocalic *k that first lenites to *[ɣ], after which *x simply unconditionally merges with this new allophone of *k.

But we can perhaps try again:

3. Loss of *ɣ⁽ʷ⁾ in Khanty varieties

The saga of *ɣ continues further in Khanty, with some rather similar development as in the previous case. From here on, the contrast with *k seems to be generally maintained (though we do find both of them giving /χ/ as a conditional reflex in Western Khanty). Instead it is the contrast *w | *ɣ that trends towards neutralization. One example could be found in Eastern Khanty, where intervocalic *w develops to *ɣʷ; and *ɣ splits, at least in the Surgut dialect group, to [ɣ] ~ [ɣʷ], the latter following most (but not all) Proto-Khanty labial vowels. We have some reason to consider the latter change older than the former: it is shared also with Western Khanty (with further *ɣʷ > /w/) and it could be reconstructed as an allophonic change already for Proto-Khanty. If so, *w > *ɣʷ in Proto-Eastern Khanty would be a reductive primary split: its result will be that *[ɣ] ~ *[ɣʷ] are now in a complementary distribution with word-initial *[w], and therefore they can be considered allophones of a single phoneme.

This situation, however, is not reflected as such in either of the two main branches of Eastern Khanty. In Surgut Khanty, mergers such as *ü > /i/ have now left /ɣ/ distinct from /w/ (= [w] ~ [ɣʷ]); in Far Eastern (Vakh-Vasjugan) Khanty, medial *p has been lenited to a new [w], while my proposed intermediate *[ɣʷ] has lost its labialization, likewise leaving /ɣ/ a clearly distinct phoneme from /w/. It would also be possible to suppose that *[ɣʷ] occurred in Proto-Eastern Khanty only as a medial allophone of /w/, and that the later [ɣʷ] as a reflex of *ɣ is an innovation of Surgut Khanty in particular, connected with Western Khanty at most areally. Something like this is indeed suggested by Proto-Khanty roots of the shape *PÜɣ- (with a bilabial initial and a front rounded vowel). These give /PIɣʷ-/ as expected in Surgut Khanty, but in several varieties of Western Khanty (Southern Khanty, and the transitional South/North dialects of Nizjam and Šerkaly), the cheshirization *ɣʷ > *w either fails to take place or is reverted, giving instead /PIɣ-/. Similarly, Proto-Khanty *-ăɣ- (probably with *ă being labial [ɒ̆]) gives Surgut /-ăɣʷ-/ but Western *-oχ-. Here too we don’t seem to have much evidence of a common Proto-Khanty development to *ɣʷ, and we should probably assume a separate labialization in Surgut (though something like *ɣʷ > *ʁʷ > *χʷ > /χ/ is at least theoretically conceivable).

The traditional reconstruction of Proto-Khanty (see e.g. Honti 1999: 75–77) actually goes even further and does not recognize a distinct medial *w at all. In such a system, *ɣ would appear to have been an allophone of /w/ already at this point. This, though, implies positing a conditional merger *w > *[ɣ] already between Proto-Ugric and Proto-Khanty, which would then itself be a reductive primary split. But I do find it preferable to assume that Proto-Uralic and Proto-Ugric *w was simply maintained as distinct all along in Western Khanty, especially since it seems possible to identify minimal pairs; one is Southern /sŏw/ < *sŏw ‘pole’ vs. /sŏχ/ < *sŏɣ ‘skin’.

———

I could probably think of several further examples of reductive primary splits in various languages; these have simply been the first three to come to mind. I can easily agree with Blust that this theoretical possibility has perhaps so far gone unrecognized due to an overreliance on a few canonical examples, mostly from Indo-European, in discussions of the typology of sound change.

Literature

Honti, László. 1997. Az ugor hangtörténethez. Az ugor alapnyelv kérdéséhez: 31–39. Budapest.
Honti, László. 1999. Az obi-ugor konszonantizmus története. Szeged.
Posti, Lauri. 1953. From Pre-Finnic to Late Proto-Finnic. Finnisch-Ugrische Forschungen 31: 1–91.

[1] The “primary” / “secondary” terminology moreover seems to me to be somewhat backwards. “Primary” splits appear to be unnecessary to assume as a separate phenomenon on the phonetic level at all, since it seems to me they can always be modelled as a series of two sound changes: a “phonetic secondary split”, followed by an unconditional merger of the newly created allophone.
[2] I have not seen this fourth option identified often, but it seems appropriate for any sufficiently advanced “phonetic drift” that takes a segment so far afield that it can no longer be identified with its original phonological value. E.g. although no conditioning or merger has necessarily been involved, it would not seem at all appropriate to characterize West and Northwest European [ʁ] or [ʕ] as either a trill or a coronal, despite its origin from earlier /r/ (meaning, of course, those varieties where a guttural fricative is the typical realization; where we still find intermediate [ʀ], we could at least argue that this is the target realization and that [ʁ] realizations are merely speech errors).
[3] The bilabials /p pʰ b/ are the result before a consonant, as well as before the “noncoloring” vowel /a/ and the “weakly labial” vowels /o ɔ/, i.e. environments where there is not much motivation for a conditional development. The dentals /t tʰ d/ are instead triggered by following front vowels /i e ɛ/ (assimilation), the velars /k kʰ g/ by a following or preceding close labial /u/ (dissimilation).
[4] The /ŋ/ encountered in modern Finnish is a later development, primarily from *ŋk by consonant gradation, later reinforced by loanwords. Its native origin still leaves the interesting trace that singleton ˣ/-ŋ-/ remains foreign; only the geminate /-ŋː-/ is found intervocalically. Similarly, /nʲ/ in modern Eastern Finnish, Karelian, Veps etc. arises by secondary palatalization, most widely thru apocope of *i.
[5] *ŋ does have a more-limited-than-average distribution in PU, being barred from the word-initial position. However, nothing in the analysis would change if we assumed that there did exist a word-initial *ŋ- that likewise changed to *w > *v in PF.

Posted in Commentary, Methodology

Some Recent Vogulology

(By current standards this perhaps should be “Mansilogy” or “Mansi Studies”, but “Vogulology” just has a good sound to my ear.)

1. Word-final vowels

This summer has seen the publication of the Festschrift Ёмас сымыӈ нэ̄кве во̄ртур э̄тпост самын патум [1], dedicated to Ulla-Maija Forsberg (née Kulonen), our professor (i.e. of Finno-Ugric Studies at the University of Helsinki [2]). This includes my paper “Notes on Proto-Mansi word-final vocalism”, where I mostly focus on the somewhat elusive category of Proto-Mansi *ə-stems. These can be consistently and directly distinguished from plain consonant stems only in 18th-century Mansi records from assorted southerly dialects, but I argue that their former existence leaves indirect evidence in a fairly large number of places.

  • They condition / phonemicize the rise of the new vowel length split in Central Mansi (as first recognized by Mikhail Zhivlov): originally long in open syllables, short in closed syllables, thus *CVCə > /CVːC/, but *CVC > /CVC/.
  • Coda spirantization of *k in Central Mansi takes place already before apocope: *CV(C)kə > /CV(C)k/, but *CV(C)k > /CV(C)x/, and *CVkCə > /CVx[ə]C/, but *CVkC > /CVːk[ə]C/ (probably *CVk[ə]C already to begin with).
  • Nasal cluster simplification also already takes place before apocope: in Southern and Central Mansi *CVNTə > /CVNT/, but *CVNT > /CVT/, affecting all nasal+obstruent clusters (in Southern further *CVŋkə > *CVŋk > /CVŋ/); in Northern Mansi only *CVNF > /CVF/, affecting only the nasal+fricative clusters *nč > *nš, *ńć > *ńś, and (though I ended up leaving this out of the paper) *ŋq > *ŋχ.
  • Conditional retentions: Southern Mansi *CEĆə > /CEĆiː/ (i.e. *ə > /iː/ following palatal vowel + palatal consonant); possibly Northern Mansi *CU(C)Cə > /CU(C)Ci/ (i.e. *ə > /i/ following a close vowel = /u/ or /i/).
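The first three lines of evidence all rest on relative chronology: each change must apply while the final *ə is still present, so that it can still tell *CVCə and *CVC apart. A toy Python sketch can illustrate this. The forms (patə, pakə, pantə and their ə-less counterparts) are hypothetical placeholders, not Mansi words, and the rules are drastically simplified string rewrites (length is just marked with ː; only *k and nasal + stop clusters are handled), so this is not a model of actual Mansi phonology. Each change is run once before and once after apocope.

```python
import re
from collections.abc import Callable

Rule = Callable[[str], str]


def lengthen(w: str) -> str:
    # Vowel in an open syllable (followed by one consonant, then a vowel) becomes long.
    return re.sub(r"([aeiou])(?=[^aeiouəː][aeiouə])", r"\1ː", w)


def spirantize(w: str) -> str:
    # *k > x in the coda: before a consonant or word-finally.
    return re.sub(r"k(?![aeiouə])", "x", w)


def simplify_nasal(w: str) -> str:
    # Word-final nasal + stop > stop (*-NT > -T), the Southern/Central pattern.
    return re.sub(r"[mnŋ](?=[ptk]$)", "", w)


def apocope(w: str) -> str:
    # Loss of word-final *ə.
    return re.sub(r"ə$", "", w)


def derive(w: str, rules: list[Rule]) -> str:
    # Apply the rules in the given chronological order.
    for rule in rules:
        w = rule(w)
    return w


# Hypothetical schematic pairs: *CVCə vs. *CVC for each change.
PAIRS: list[tuple[Rule, list[str]]] = [
    (lengthen, ["patə", "pat"]),
    (spirantize, ["pakə", "pak"]),
    (simplify_nasal, ["pantə", "pant"]),
]

results = {
    rule.__name__: (
        [derive(w, [rule, apocope]) for w in pair],  # change before apocope
        [derive(w, [apocope, rule]) for w in pair],  # change after apocope
    )
    for rule, pair in PAIRS
}

for name, (before, after) in results.items():
    print(f"{name:15} before apocope: {before}  after apocope: {after}")
```

With each change ordered before apocope, the two inputs of every pair stay distinct; with apocope first, every pair merges. The contrasts found in Central Mansi therefore point to the changes having applied while *ə was still there.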

There are some complications to the first three lines of evidence, since they only affect / happen before coda consonants. They therefore create new morphological alternations in inflected stems, such as nom.sg. *pōt ‘pot’ : nom.pl. *pōt-ət ‘pots’ [3] >> Western Mansi /put/ : /puːtət/. Later on, these alternations have often been levelled out in favor of one “grade” or the other in individual dialects. This is probably the reason for occasional apparent irregularities such as *kōnt ‘backpack’ > Pelym /kunt/ (rather than expected ˣ/kut/), although I have not combed through the data for them in detail. This would really also require a discussion of the same changes in verb inflection and word derivation, where they can also arise depending on consonant-initial versus vowel-initial suffixes. At least //NF// simplification in Northern Mansi is already well described in standard references (e.g. Keresztes’ 1998 handbook description mentions the examples ľuuńś-i ‘weeps’, suns-i ‘looks’, χaaŋχ-i ‘climbs’ : ľuuś-səm ‘I wept’, sus-səm ‘I looked’, χaaχ-səm ‘I climbed’). Eichinger’s new grammar of Western Mansi (see below) recognizes all three: vowel length alternation, //NC// simplification and x ~ k alternation, the last interestingly in a form inverted from the historical derivation: stem-final //x// → k before a vowel-initial suffix. For the rest, I would need to look up a variety of sources to see how much of this they recognize.

There are also some implications whose discussion I have left for later work altogether. E.g. the loss of *-ə appears to leave Central Mansi /x/ in fact marginally phonemic in all varieties. However, it has been treated as only a free variant in some (chiefly Hungarian) works. The most notable offender is the UEW, where e.g. Pelym /kulx/ ([kuləx]) ‘raven’ is given as “kulk”; /ńoxʷs/ ‘sable’ is given as “ńoks”; /püxń/ ([püxəń]) ‘navel’ is given as “pükəń” (note also the inconsistent treatment of schwa). Thus, there is a lesson here against applying overly strict methodology to the segmental phonological analysis of poorly documented language varieties. The limited corpus of Central Mansi varieties may not have allowed finding minimal pairs, but this should not be taken as grounds to ignore the distinction entirely. This problem has come up before in phonological analyses of Ob-Ugric varieties as well; other such cases include e.g. the status of labialized velars /kʷ xʷ ŋʷ/ all across Mansi, discussed already by Kálmán (1976) [4], the short vowels /e ɶ ɤ u/ in Eastern Mansi, and the open rounded vowels /ɔ œ/ in Far Eastern Khanty.

2. Archival Mansi

Julia Normanskaja has in the last few years published reports and analyses of several archival materials of Mansi in the journal Ural-Altaic Studies (now added to my sidebar). The earliest came out in volume 19 (4/2015), covering a 1905 dictionary of the Pelym dialect as well as new 2013 field records on the Middle Ob and Jukonda dialects — the latter perhaps the last records of Eastern Mansi, collected from two recently found elderly speakers. Instead of integrating these with the established framework of Mansi historical phonology though, she has opted to compare them only with the Sosva-based Northern Mansi written standard, ending up with a very reduced seven-vowel reconstruction of Proto-Mansi or maybe rather Proto-Non-Southern Mansi (“core Mansi” as I have called it) that unfortunately doesn’t seem to be very functional for anything else. A follow-up article in volume 26 (3/2017) explores the Pelym material a bit more, but it does not turn out to show any previously unknown features at least in its phonology. Presumably it would have more value for the lexical documentation of Mansi, perhaps even for etymological research.

I have found two further works from this year more interesting. In volume 36 (1/2020) she treats an unpublished 18th-century Mansi dictionary that does not appear to fit within the current classification scheme of Mansi: it shows some innovations typical for Northern Mansi (*kʷä- > ко-, *aɣ > оу, *q > х) but fails to show some others (*ä, *š retained as е, ш instead of being simplified to a, s). Even a transitional development can be found in по́улъколъ ‘bathhouse’: *äɣ > оу here is distinct from the reflexes in all later dialects (S päwl-, W E päɣl-, N puwl- ‘to bathe’). Specifically non-Northern innovations do not seem to be found, though, and I at least would thus simply consider this variety to represent early Northern Mansi before the rise of some more recent innovations. A brief comparison with the older Mansi materials available to me does show the same archaisms in some other early NMs records as well, e.g. *šëëtə > schat ‘100’ (later > sāt).

Most recently, volume 38 (3/2020) now treats several further 18th-century wordlists, namely some very southwestern ones from within the current-day Perm Krai, which she identifies as their own dialect group, though still affiliated with the Tavda dialect that has later on been the “type specimen” of Southern Mansi. I cannot agree with all aspects of her analysis here — e.g. graphical ‹а› for Proto-Mansi *ëë I would think most probably only reflects an inability of 18th-century Russians/Germans to distinguish [ʌː] and [ɑː] — but the overall point seems to be sound: the dialect differentiation of Mansi can be expected to have begun already in the south. A feature that does appear to constitute a shared innovation is the lowering of short *u to ‹о›, but probably this is not yet enough by itself to set up much of a common Southern Mansi dialect area covering both these and Tavda.

2.5. A *č in Proto-Mansi?

All this attention to 18th century Mansi also got me started on assembling an overall overview of the data. Most of it is still not published anywhere, but Gulya’s 1960 article first noting the retention of final vowels [5] cites seemingly all available evidence for a list of about 100 words. This could already have some value for surveying in more detail the development of the Mansi dialect areas over the 18th and 19th centuries.

I can also already submit one initial observation: a few varieties seem to show an affricate, ‹ч› or ‹tsch›, corresponding to usual Proto-Mansi *š. Even more interestingly, this seems to only happen for *š deriving from Uralic *č, not for *š deriving from Uralic *ś (or *ć, as now alternately reconstructed):

  • ‘knee’: M19 ‹tschäntschi›, VTur. ‹ча(н)чи›, SSo. ‹Tschândsche-›
    — cf. Khanty *čäänč;
  • ‘town’: VTur. ‹оча›, SSo. ‹ootsche› (M19 ‹óscha›)
    — cf. Khanty *waač;
  • ‘100’: M19 ‹schäta›, VTur. ‹шата›, SSo. ‹Schôtt›, ‹Schätte›
    — cf. Khanty *saat;
  • ‘heart’: M19 ‹schìima›, VTur. ‹шимъ›, SSo. ‹Schinn› [sic]
    — cf. Khanty *säm.
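The correlation claimed here can be checked mechanically on the four words cited. The sketch below only encodes the spellings and Khanty cognates listed above and tests whether the words with an affricate spelling (‹tsch› or ‹ч› in any of the three sources) are exactly those whose Khanty cognate has *č. The helper name has_affricate is a hypothetical illustration; with only four words this merely restates the pattern, and says nothing about whether it holds in fuller versions of the sources.

```python
# Spellings as cited above (M19, VTur., SSo.) and the Khanty cognates.
SPELLINGS = {
    "knee": ["tschäntschi", "ча(н)чи", "Tschândsche-"],
    "town": ["оча", "ootsche", "óscha"],
    "100": ["schäta", "шата", "Schôtt", "Schätte"],
    "heart": ["schìima", "шимъ", "Schinn"],
}
KHANTY = {"knee": "*čäänč", "town": "*waač", "100": "*saat", "heart": "*säm"}


def has_affricate(spelling: str) -> bool:
    """True if the spelling shows German-style ‹tsch› or Cyrillic ‹ч›."""
    s = spelling.lower()
    return "tsch" in s or "ч" in s


# Words where at least one source writes an affricate.
affricate_words = {
    w for w, forms in SPELLINGS.items() if any(has_affricate(f) for f in forms)
}
# Words whose Khanty cognate has *č (pointing to Uralic *č rather than *ś).
khanty_c_words = {w for w, form in KHANTY.items() if "č" in form}

print(sorted(affricate_words))            # ['knee', 'town']
print(affricate_words == khanty_c_words)  # True
```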

I would think that this is therefore an archaism: Proto-Mansi had both *č and *š, retained in these three varieties [6] but merged as *š in the others. This of course makes me particularly interested in getting my hands on fuller versions of these three sources in particular and seeing if the pattern keeps up.

3. Three Western Mansi Grammars

I also recently discovered Victoria Eichinger’s PhD thesis “Westmansisch anhand der Textsammlungen von Munkácsi und Kannisto” (‘Western Mansi on the basis of the text collections of Munkácsi and Kannisto’) from 2017. As the title indicates, this is not an up-to-date language-documentation study but a more philological analysis, based on late 19th / very early 20th century fieldwork on the language. It’s a good addition to Mansi grammaticography too, as the now-extinct western dialects have not been subject to much discussion. For an analysis of limited materials it’s fairly thorough, also treating topics that have so far received little attention, e.g. morphophonology (the still-living Northern Mansi has much less of this anyway than Western Mansi did). The organization into alternating chapters on the Pelym and Middle Lozva dialects is a bit jarring at first, but seems justifiable enough, especially given the brief comparison chapters at the end of each section. The other three WMs dialects that were recorded more fragmentarily by Munkácsi and Kannisto are generally left out, not a bad option in a generally synchronic grammar. [7] (I do think that at least their phonology would eventually deserve a more detailed historical analysis than what has been done so far.) They only make a small appearance towards the end, where Eichinger outlines the main morphological differences between Western and Northern Mansi, and even then in a more contrastive than comparative fashion. She does nonetheless show that parts of a list of seven features that has been suggested to define WMs in earlier research are insufficient, and proposes an amended version.

To me it seems, though, that even a few of the features given by Eichinger should still be removed from consideration. Two recurring issues are retained archaisms (e.g. the accusative case) and heterogeneity (e.g. replacement of the ablative with either postpositions or the lative). A bigger open question still might be the direction of comparison. Distinguishing Western and Northern Mansi remains quite easy. The closest affinity of WMs is instead with Eastern Mansi, forming the Central Mansi group, and among the traditional four-way division of the Mansi varieties, it is the West / East distinction that appears to me to be mostly conventional and not that firmly established. There instead seems to be a cline of increasing innovativeness towards the west, overlaid also with contact effects from Komi and Khanty… It’s probably not necessary to assume the existence of either “Proto-Western” or “Proto-Eastern”, only a single “Proto-Central”. And if so, perhaps some different original dialect cleavage could be assumed for this instead? At least we now have more good materials for eventually surveying this issue too.

[1] Northern Mansi: /jomas/ ‘good’, /sim-əŋ/ ‘heart-ADJ’, /neː-kʷe/ ‘woman-DIM’, /woːr-tuːr eːtpos-t/ ‘forest-lake month-LOC’ (also /eːt-pos/ ‘moon, month’ readily parses as ‘night-light’), /sam-ən pat-əm/ ‘eye-LAT begin-PTCP’, altogether: “Goodhearted girl born in August”.
[2] She is currently posted as the head of the Institute for the Languages of Finland instead, though, and earlier also spent quite a while as the vice-rector of the university. I was nonetheless happy to catch some of the Mansi courses she taught between these posts some years ago.
[3] Before anyone wonders in the comments: yes, these might be cognates, depending on how much you like explanations such as deriving Northwest Germanic *pottaz from an unattested early Samic reflex of PU *pata (expected PS **pōtē > common Western Sami **puohtē), or anything going back to Indo-Uralic. No loan etymology from IE that would reach all across Uralic seems possible, though.
[4] Kálmán, Béla. 1976. “Van-e a labio-palatoveláris mássalhangzó-fonéma a vogulban?” — Nyelvtudományi Közlemények 78: 359–363.
[5] Gulya, János. 1960. “A manysi nyelv szóvégi magánhangzóinak történetéhez”. — Nyelvtudományi Közlemények 62: 33–50.
[6] VTur. = Verkhoturye, SSo. = Southern Sosva (Gulya’s “DSzo.”). As far as I can tell, he does not explicitly explain his abbreviation “M19” anywhere, but I think it might mean an unlabeled source, thus microfilm #19 out of the 24 wordlists his paper covers.
[7] The work also clued me in that a similar Master’s thesis on the Northern Vagilsk dialect was prepared by Eichinger’s project colleague Anna Wolfauer.

Posted in Commentary, News

First-syllable *ə in Proto-Mordvinic?

The following is, currently, more of a hypothesis I wish to record than an actual result.

Out of the two Mordvinic languages, Erzya shows the simple vowel inventory /i e a o u/ (plus a recent marginal /ɨ/ phonemicized by Russian loanwords). Moksha adds to this firstly an open front vowel /ä/, but also a reduced vowel /ə/ with front and back allophones. In noninitial syllables this corresponds to vowel-harmonic /e ~ o/ in Erzya, or in some dialects instead /i ~ u/. There are two main reconstructions of the Proto-Mordvinic situation: the Finnish/Hungarian approach, which posits Moksha-like original *ə, and the Russian approach, which posits Erzya-like original *i ~ *u. In terms of phonetic typology, the latter seems simpler from the viewpoint of Mordvinic dialectology: *i ~ *u > /ə/ is trivial vowel reduction, while *ə > /i ~ u/ is rather less common, and also runs counter to typical vowel inventory trends in the region. [1] The former, on the other hand, seems simpler from the wider Uralic viewpoint: PMo *ə quite typically continues PU unstressed *a ~ *ä, and routing reflexes like *kota >> /kudo/ ‘house’ thru a stage *kudu with a close vowel appears unparsimonious. I have tended to follow the *ə reconstruction, since I mostly discuss Mordvinic within the Uralic context. A second motivation that appears reasonable to me is the existence of Erzya dialects where PMo *e *ä yield /ä e/ (minimal pair: /käď/ ‘skin’, /keď/ ‘hand’ ~ Mk. /keď/, /käď/ respectively), a “flip-flop” that seemingly demands some feature in addition to height for distinguishing these. We could posit that *e, *o were, at least phonetically, reduced vowels *ĕ, *ŏ, which would then also suggest that *ə was their unstressed neutralized allophone.

But most of this seems to be further complicated by a look at initial-syllable /ə/ in Moksha. This most typically corresponds instead to /i/ and /u/ in Erzya, including in dialects with /e ~ o/ corresponding to Mk. non-initial /ə/; sometimes we even find both close vowels represented in Erzya dialects; relatively often Uralic sources of such vocabulary would predict **e or **o; sometimes we find loss of the vowel altogether, either in just Erzya or also in Moksha dialects. A few examples:

  • Er. /kirta-/, /kurta-/ ~ Mk. /kərta-/ ‘to singe, scorch’ < PU *kor(p)-tta- (predicted PMo **kurtə-);
  • Er. /turva/ ~ Mk. /tərva/ ‘lip’ < PU *turpa (predicted PMo **torva);
  • Er. /troks/, /truks/, /turks/ ~ Mk. /tərks/, /turks/, /truks/ ‘across, thru’ < PU *tora-ksə (predicted PMo **turəks);
  • Er. /srado-/, /strado-/ ~ Mk. /səradə-/ ‘to be strewn’ < PU *sira- (predicted PMo **sora-).

Generally I’ve seen the /i/ ~ /ə/ and /u/ ~ /ə/ correspondences explained thru new secondary vowel reduction in Moksha. But this really fails to explain why we should have any doublets like /kirta-/ ~ /kurta-/ within Erzya as well. Given this and the cases of syncope, my current hypothesis is that perhaps we should be treating Moksha /ə/ as older, already Proto-Mordvinic, and the Erzya full vowels as secondary. This would obviously confirm that unstressed /i ~ u/ in Erzya also has to be secondary compared to Moksha /ə/; but this comes at a cost: it would also seem to mean that we now have some reason to suspect a contrastive Proto-Mordvinic *ə at least in the first syllable. Many, though not all, cases of such an *ə seem to be further followed by a full vowel /a/. Stress retraction onto full vowels is typical in the region, and so instead of setting up a new vowel quality contrast, a stress contrast might be possible: *tərvá = */tOrvá/ for ‘lip’, versus e.g. *tólga (= Er Mk /tolga/) ‘feather’. Non-initial stress placement like this is in fact attested from both Erzya and Moksha. — But then what of cases like ‘across’? Would we also need to set up contrasts like *təróks = */tOróks/, versus *mórə = */mórO/ ‘song’ (> Er /moro/ ~ Mk. /mor/)? Or even, since reflexes like /turks/ also occur (but not ˣ/turoks/, ˣ/təruks/ etc.), do we perhaps need to set up a syllabic *r̥ here??

All of this should also be further compared with words showing syncope in both Erzya and Moksha. If first-syllable *ə was allowed in Proto-Mordvinic, it seems quite possible to me that words like Er. /pŕa/ ~ Mk. /pŕä/ ‘end, head’ < PU *perä (predicted PMo **piŕə) should be reconstructed not yet with an initial cluster, but rather as something like PMo *pəŕa, with syncope only incidentally taking place in both languages later on in these kinds of auspicious positions, i.e. where syncope would produce a typologically natural initial consonant cluster (the same environment as initial-vowel syncope in Udmurt).

[1] I would propose solving this by routing the /i ~ u/ dialects thru the mainline /e ~ o/ type: after “de-reduction” of *ə to full vowels, these dialects would have gone thru vowel reduction again, but this time not of the centering but rather inventory-reducing type: unstressed *e > /i/, *o > /u/. This is well paralleled by unstressed /e/ × /i/ > [ɪ] in Russian, which of course has been the most significant contact language of Erzya for the last several centuries already.

Posted in Reconstruction
