Back in 2009, a very interesting paper was put out by Jaakko Häkkinen, then an early-stage PhD student: “Kantauralin ajoitus ja paikannus: perustelut puntarissa” (“The dating and locating of Proto-Uralic: weighing the arguments”). While no longer especially up to date (I will probably follow up on this claim in another post soon-ish, once one major paper in the works has come out in a future issue of Diachronica), this still remains a notable work that has turned out to be an impetus for quite a lot of discussion, through the 2010s and still ongoing, on our basic assumptions about the early history of the Uralic languages. One of Häkkinen’s suggestions is to attribute some of the shared Finnic–Mordvinic vocabulary to a common southwestern substrate language. He outlines this on the basis of just six words that can be suspected to be of substratal origin per their semantics: three deciduous trees with a southern distribution (the word families of Finnish tammi ‘oak’, vaahtera ‘maple’, pähkinä ‘nut’ < *‘hazelnut’), two species of high importance to agricultural societies (Fi. vehnä ‘wheat’, lehmä ‘cow’), and one innovative numeral (Fi. kymmen(en) ‘10’). All of these also show novel phonotactic features: the word-medial consonant clusters *-mm-, *-kšt-, *-šk-, *-šn-, *-šm-, per him not attested in the Uralic comparative data reaching into the Ugric or Samoyedic languages. Häkkinen also mentions some more narrowly distributed substrate loan candidates with similar phonotactic features (e.g. with geminate nasals: Fi. konna ‘toad’, nummi ‘heath’; Northern Sami lidnu ‘eagle owl’, dápmot ‘trout’) that had been identified already in still earlier studies probing the possibly substratal vocabulary of Finnic or Samic in particular. But as far as I can tell, the idea of a common substrate vocabulary layer extending also further east to Mordvinic, partly even Mari and Permic, was a new key innovation.
Increasing phonotactic complexity towards the (south)western end of Uralic is really quite apparent as soon as you pay attention to the topic. Already in one of my earliest posts on Freelance Reconstruction in 2013 I outlined the branch-level distribution of the clusters *šk, *kš, *kšk and *kšt across the Uralic comparative material. A heavy emphasis on Finnic, Mordvinic and Mari, but notably not on the northwestern Samic, is immediately evident. So there probably should be quite a lot of material that might be attributable to this “Agricultural Substrate” if we went looking for it in detail. Around 2014 I started collecting some additional data on this, taking particular semantic fields as my starting point. Before this reached sufficient completion though, a few other publications already ended up paying more attention to the same vocabulary stratum. I first saw Ante Aikio’s take, in a preprint version of his article “The Finnic ‘secondary e-stems’ and Proto-Uralic vocalism”. This singles out the consonant *š already by itself as a marker of vocabulary of possibly substratal origin (with 25 examples given; about 10 of them not otherwise phonotactically suspect) and also proposes 9 other cases more on the basis of general phonological irregularity. As he had already earlier worked extensively on the Samic substrate in Northern Finnic and the pre-Uralic substrate of Samic, perhaps some of this was discovered independently though… Aikio only refers to Häkkinen’s paper in passing, not as a main inspiration.
Before Aikio’s paper officially came out in 2016, yet another version was also outlined by Mikhail Zhivlov in a small conference paper “Неиндоевропейский субстрат в финно-волжских языках” (“Non-Indo-European substrate in the Finno-Volgaic languages”), which identifies 20 items, likewise on the grounds of phonotactic novelties, the general presence of *š and some phonological irregularities; with substantial overlap with Aikio’s list. Taken together, these were already about as much as I had assembled too, and I haven’t done much more on my draft since. Not much else seems to have happened on this topic in the late 2010s either.
Last fall however, Carlos Quiles, an archeology/genetics/linguistics blogger at Indo-European.eu now seems to have put together a somewhat more substantial review of this and also some other data relevant for Uralic linguistic archeology, in a series of about ten blog posts starting here. This is nominally aimed more at locating the Proto-Uralic homeland — though it is easy to notice that Quiles relies mostly on secondary sources so far, and seems to miss a decent amount of relevant basic data in his chapters working more towards this goal. E.g. already the section on fishing technology is missing at least *sopśə ‘net needle’ and *tulkV ‘dragnet’; perhaps because these are traditionally identified as “Proto-Finno-Ugric” (only found up to Khanty in the east) and thus absent from earlier sources attempting to apply linguistic archeology to Proto-Uralic specifically. I also wonder about some geographic claims like Udmurt supposedly being spoken within the range of the Siberian pine. Probably today if we count migrant dialects further east and/or planted Siberian pines, but to my knowledge it’s certainly not native to Udmurtia (not even most of Komi Republic).
A full review of this whole topic would be a more involved question than I want to go into on the blog though, and anyway I am also not highly impressed by the overall precision of linguistic archeology as a method. It works just fine for ruling out places like the Circum-Baltic, the Arctic coast or the Caucasus as the Proto-Uralic homeland, but finer details like the long-standing debate on Volga-Kama versus Western Siberian homelands don’t seem like they can be easily resolved. At least two reasons conspire to make further progress difficult. One, if a language family starts off as (a part of) an only slowly expanding or even in situ diversifying dialect continuum, we might have trouble distinguishing “common Family” vocabulary from true proto-Family vocabulary. If any newly incoming vocabulary avoids hitting all the earliest isoglosses within the family, or is etymologically nativized across them, it may end up gaining a wide distribution and an appearance indistinguishable from native. Cases like the common Algonquian calque ‘firewater’ for ‘whisky’ that can be identified as much too recent on cultural grounds are just the tip of the iceberg here. Others could include cases like Proto-Finnic *lohi ~ Proto-Samic *lōsë ‘salmon’, which happen to fall into the outlines of Uralic comparative phonology just fine and would point to a common proto-form *lošə. Both are probably instead more recent loans from Baltic, either independently or in Samic thru Finnic; this would be so even if they really did go back to this form in both lineages. From some language pairs like North Estonian ~ South Estonian (last common ancestor ca. 500 BCE), or indeed dialect pairs like Western Finnish ~ Eastern Finnish (LCA ca. 500 CE), with heavily parallel and mutually reinforcing trajectories of historical development up to today, we could probably find examples of this type by the thousands.
(I call this phenomenon “convergent parallel loaning” and hope to one day treat it in more detail than just the one presentation in Finnish from 2016 so far. Cf. also Häkkinen’s spin on this under the name “invisible convergence“.)
I also consider it probable that our efforts on Uralic reconstruction so far stop on many points at the common Uralic stage, maybe especially in vocalism, not quite yet reaching Proto-Uralic proper. This is evident when attempting to reconstruct the proto-forms of several core vocabulary items, e.g. ‘heart’. West Uralic (Samic, Finnic, Mordvinic) suggests *ćüdäm(ə); Udmurt /śulem/ suggests *śedämV; Komi /śëlëm/ suggests *śädämV; Ugric suggests *śiďVmV or even *śijVmV; Samoyedic *säjä suggests *śäďä or *śäjä. We have no especially good way to explain most of this kind of “proto-variation” or to decide which, if any, of these variants might be the most original (of course at least the vowel difference between Udmurt and Komi is likely to be recent). The suggestion first made by Zhivlov that traditional PU *ś comes from an earlier *ć that was preserved in Samic, but replaced in areal vocabulary by a new *ć in Permic and the three Ugric branches, is at least probably right, though. “*ś” is then basically a Common Nonwestern Uralic (maybe even just Nonsamic Uralic?) but not the proper Proto-Uralic reconstruction. (On structural grounds the same proposal has been made earlier also by at least Janhunen and Abondolo.)
Two, linguistic archeology cannot even in principle pinpoint an origin outside of a family’s current or historical range. Under the basic assumptions behind linguistic archeology, any terminology for e.g. natural realia exclusive to an “external homeland” would have to be either lost or repurposed in all descendants. This would even hold if one of the daughter lineages ended up re-entering the original territory. (Northern Sami speakers moving to Helsinki are not going to magically recover the lost but presumably once extant Proto-Samic words for things like ‘maple’ or ‘eel’.) Suppose for the sake of the argument that Uralic first expanded in a northward fan from someplace around the southern end of the Urals, near Orenburg or Magnitogorsk; southeast of the current range of Permic and Mari, well south(west) of the current range of Mansi. What kind of vocabulary evidence would we even expect this to leave, as distinct from an already originally more northern homeland?
But I believe that’s enough said for now on attempts to locate Proto-Uralic (again, watch for the upcoming issues of Diachronica for news on this). Going back to the Agricultural Substrate, Quiles identifies four semantic areas which would show prominent influence from this:
- tree names and related botanic terms;
In terminology related to animal husbandry and textileworking he gets together a few possible examples too, but contrasted with a more substantial number of loanwords from Indo-European.
I agree with most of these assessments as well. The one exception is apiculture, as the words actually comprising this layer (*mekšə ‘bee’, *metə ‘honey’, *śišta ‘wax’; unreconstructible #käras ‘honeycomb’) all have good Indo-European / pre-Indo-Iranian etymologies, unlike the vast majority of the others, and the cases of *š appearing in these can be well derived by RUKI. Even if *š might often be a marker of the Agricultural Substrate, this does not imply that all cases have to be so, and in particular this does not provide reason to abandon well-established loanword etymologies coming from actually attested language families. By a similar argument, I am likewise unconvinced by attempts to reinterpret words like *šiŋərə ‘mouse’ (with regular reflexes in all three of Hungarian, Mansi and Khanty) as having anything to do with the Agricultural Substrate. The key motivation for setting this hypothesis up in the first place has after all been the highly limited distribution of words of certain semantic categories or with certain phonetic features. If we start including occasional etymologies that also reach Ugric or Samoyedic, we can no longer maintain the original explanation for why other words of this layer do not do the same (i.e. that the Agricultural Substrate was never in contact with these branches of Uralic). This indeed would come close to abandoning any reason for treating this layer as non-native in Uralic in the first place!
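The distributional criterion at stake here can be put as a small sketch. The branch sets below contain only what is stated in this post (the ‘oak’ word family as shared Finnic–Mordvinic vocabulary; *šiŋərə with reflexes in Hungarian, Mansi and Khanty), so they are illustrative and possibly incomplete; the function name and the grouping of “eastern” branches are my own assumptions, not anyone's published procedure.

```python
# Assumption: "eastern" here means the Ugric branches plus Samoyedic,
# i.e. the branches the Agricultural Substrate supposedly never contacted.
EASTERN_BRANCHES = {"Hungarian", "Mansi", "Khanty", "Samoyedic"}

def is_substrate_candidate(branches):
    """True if none of the attesting branches is Ugric or Samoyedic."""
    return not (set(branches) & EASTERN_BRANCHES)

# Illustrative data: only the branches explicitly named in the post.
ATTESTATIONS = {
    "tammi 'oak' word family": {"Finnic", "Mordvinic"},
    "*šiŋərə 'mouse'": {"Hungarian", "Mansi", "Khanty"},
}

if __name__ == "__main__":
    for word, branches in ATTESTATIONS.items():
        verdict = "candidate" if is_substrate_candidate(branches) else "excluded"
        print(f"{word}: {verdict}")
```

A filter like this is only as good as the cognate sets fed into it, but it makes the point explicit: once a word with Ugric reflexes is let in, the criterion itself has been given up.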
An additional issue that I seem to notice at this point is that, out of the possibly substratal cases of *š, quite a few also occur in RUKI environments. The cluster *kš is particularly prominent: *makša ~ *mäkšä ‘rotten wood’, *päkšnä ‘linden’, *wakštVra ‘maple’, maybe *päkškV ‘hazelnut’ and *tekškä ‘ear of corn’ (surfacing as *šk ~ *kš vacillation). There is also a phonologically similar though clearly non-IE *š after *ŋ in *jaŋša- ‘to grind’, maybe also behind *riŋəšə ‘threshing ground’. Examples of *ks or *ŋs also do not seem to occur. I suspect that this points to the Agricultural Substrate actually coming to Uralic second-hand, and that it was instead first adopted into an extinct para-Balto-Slavic and/or para-Indo-Iranian language that, as expected per general Indo-European dialectology, regularly retracted *s to *š at least after velars; including in words that it had earlier adopted from the Agricultural Substrate proper. This hypothesis also gives us some more wiggle room in identifying the substrate in the archeological record: even archeological cultures that were probably Indo-European-speaking could be considered as the source.
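As a rough illustration of this observation, here is a minimal Python sketch that looks at the segment immediately preceding each *š in a reconstruction and labels it as a classic RUKI trigger (r, u, k, i), as the velar nasal *ŋ, or as anything else. The function names and labels are my own, and the sketch assumes one character per segment, which holds for the forms quoted in this post but not for transcriptions in general.

```python
import unicodedata

S_HACEK = "\u0161"       # š
VELAR_NASAL = "\u014b"   # ŋ
RUKI_TRIGGERS = set("ruki")  # classic environments for *s > *š

def preceding_segments(form):
    """Segment before each š in a form; '#' marks word-initial š."""
    # Strip reconstruction marks (*, #) and a trailing stem hyphen.
    bare = unicodedata.normalize("NFC", form).lstrip("*#").rstrip("-")
    return [bare[i - 1] if i > 0 else "#"
            for i, ch in enumerate(bare) if ch == S_HACEK]

def classify(form):
    """Label each š as 'ruki', 'velar_nasal' or 'other'."""
    labels = []
    for seg in preceding_segments(form):
        if seg in RUKI_TRIGGERS:
            labels.append("ruki")
        elif seg == VELAR_NASAL:
            labels.append("velar_nasal")
        else:
            labels.append("other")
    return labels

# Forms quoted in this post; *lošə and *šiŋərə serve as contrasts.
FORMS = ["*makša", "*päkšnä", "*wakštVra", "*jaŋša-", "*lošə", "*šiŋərə"]

if __name__ == "__main__":
    for form in FORMS:
        print(form, classify(form))
```

Note that *riŋəšə as reconstructed would come out as “other” here, since the *š follows *ə; treating it as a post-*ŋ case requires the further assumption mentioned above.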
Speaking of the ultimate identity of the substrate, Quiles has an interesting new suggestion on this, too: he seems to have found parallels for a number of the involved words in the West Caucasian language family, and attempts to sketch ways it could have been in contact with Uralic. This I think would be worth exploring further. Some more data to this effect might also be findable in Bernát Munkácsi’s 1901 monograph Árja és kaukázusi elemek a finn-magyar nyelvekben (“Aryan and Caucasian elements in the Finno-Magyar languages”). While Uralic–Indo-European loanword studies have been an extensive and productive field for a long time, on the topic of Uralic–Caucasian comparison of almost any flavor this remains just about the most recent even halfway serious overview. Directionality, however, is not obvious to me. As Quiles notes, the WC ~ Uralic parallels center on technology and metalworking terminology. It seems to me they could be well explainable, besides pure accidental resemblance, also as a set of recent Wanderwörter, or parallel loanwords from a lost common source. There is thus barely any evidence yet to speak of a West Caucasian substrate language specifically.
By now I would have also more detailed comments on numerous individual etymologies proposed to belong in the Agricultural Substrate by one researcher or the other. This task will be best left for another time however, in many cases maybe also for another context entirely, and I might return to the topic only after having gotten more of these forthcoming etymological etc. observations out to print individually. Substrate languages are a fascinating topic, but they really are not highly feasible to tackle head-on: they emerge only from the dark corners of linguistic reconstructions, generally identifiable more by what is absent than by what is present.
 While Häkkinen continues to be active in our field and has a lot to say especially on the topic of the relative and absolute chronology of Uralic languages (recently e.g. coauthoring an article on Southern Sami with Minerva Piha in the latest Sananjalka), his PhD though unfortunately still remains unfinished.
 Part of the Finnish / Swedish grouping jalopuut, jalot lehtipuut / ädellövträd ‘noble (broadleaf) trees’. Other generally agreed members include the elm, ash, linden, beech and hornbeam. This might be convenient to calque into English too. Delimiting it in a context wider than just the Nordics has some difficulties though… would we only accept species whose distribution overlaps with the taiga zone at least within gardens, ruling out the likes of plane trees; and would we follow the main practical motivation of the term and rule out softwood broadleaf trees like the poplar?
Nominally, though, it claims to be in the 2015 issue of Suomalais-Ugrilaisen Seuran Aikakauskirja. I wonder how often this kind of delay, between when a periodical is dated and when it actually comes out, is due to printing queues and how often to actual editing issues.
 Mordvinic *käŕas, Mari *käräš, Udmurt /karas/; none of these can be native as such. The Mordvinic and Udm. words show a ⁽*⁾front vowel in the first syllable plus a ⁽*⁾back vowel in the second (PU unstressed *-ä- > Udm. /e/), and such disharmonic vowel combinations always result from either recent derivation or recent borrowing. The Proto-Mari vowel *ä then is non-native entirely. Probably mostly likewise for those cases of pre-Permic *ä that end up retracted to /a/.