Etymology squib: riipustaa

I happened today upon a small etymological review article “Lat. scrībere in Germanic”, which argues that this is indeed a loanword rather than a cognate, but a relatively early one, already roughly into Proto-West Germanic. This got me thinking about a possible modern Finnish reflex: riipustaa ‘to scribble letters’. This is not a word of particularly wide currency, and does not even cover the general meaning ‘to write’, which is usually expressed by the native term kirjoittaa (more rarely also piirtää, whose main meaning is ‘to draw’). The word seems in fact to be marginal enough that it is not treated in any Finnish etymological work at my disposal. Regardless, the resemblance to Latin is quite apparent.

There is room for doubt here already to start with. For one, riipustaa exists beside a variant raapustaa, which is in my impression (and by Ghits) even the more common one. Formally both could also be analyzed as derived from the more basic verbs riipiä ‘to rip, pluck, tear, scratch’, raa(p)pia ‘to scrape, scratch’. However, the fairly specific semantics still lead me to think that the similarity of scrībere and riipustaa is not accidental. If this were a straight-up loanword, raapustaa could have come about as a contamination with raapia (or perhaps its further derivative raaputtaa). As for riipiä, the cognates across Finnic and also Germanic (← *rīfan-, *rīpan- [1]) only seem to show meanings in the range ‘to rip, pluck, tear’. The dialectal Finnish meaning ‘to scratch’ could rather be due to the influence of raapia and/or riipustaa.

It should also be noted that most loanwords from Latin into early Germanic have sooner or later continued their trek further into Finnish: e.g. enkeli ‘angel’, kauppa ‘store, trade’, keisari ‘emperor’, kellari ‘cellar’, kori ‘basket’, kyökki ‘kitchen’, luumu ‘plum’, pannu ‘pan’, penni(nki) ‘penny’, piippu ‘pipe’, pytty ‘pot’, säkki ‘sack’, tiili ‘tile’, viina ‘spirits’ / viini ‘wine’, ämpäri ‘bucket’, äyri ‘monetary unit’. Most of these are newer loans from Swedish, but earlier loaning roughly to late Proto-Finnic or early Common Finnic would not be unthinkable. One such more widespread case is kattila ‘kettle’, which appears to be reconstructible for Proto-Finnic (> e.g. Veps katil, Livonian kaţļā, South Estonian katõl’, with regular development [2]). Another candidate is ‘pound’: besides Fi. punta there are also Est. pund, Liv. pūnda, which could also all derive already from PF *punta. This word however has undergone so few sound changes on any side of the equation that, in the absence of cognates in Eastern Finnic, I would not rule out later parallel loaning. [3]

This possibility is relevant since quite early loaning would have to be assumed for riipustaa in order to account for -p- (contrast modern Sw. skriva, Low German schrieven). Outright retention of the plosive all the way from Classical Latin seems still unlikely (already Old Norse shows skrifa), but given that also Germanic *f turns up as /p/ in Finnic for a while, substitution of Germanic *-b- as early Finnish -p- seems like it should also continue to be possible even after lenition to *-β- > *-v-. Especially if I’m right about the hypothesis that the introduction of the substitution pattern f → v is due to the onset of the sound change *w > v in Finnic and not anything changing on the Germanic side.

Morphologically however riipustaa could not be ancient in this scenario: before the heavy formant -sta- a weak grade would be expected (cf. e.g. riippu- ~ rippu- ‘to hang’ → *rip̆pu-sta- > ripusta- ‘to hang up’, lintu ‘bird’ → *lindu-sta- > linnusta- ‘to hunt birds’), which would here demand a root √riipp- (clearly underivable from Germanic). The other option would be to consider this a fairly recent formation, similar to the likes of maku ‘taste’ → makusta- ‘to savor a taste’ (vs. older mausta- ‘to spice’), julkinen ‘public’ → julkista- ‘to publish’ (vs. older julista- ‘to proclaim’). The immediate source of derivation would then probably have to be a noun *riipu ‘scribble’ (from a verb *riipa-, *riipat-, *riipi- or even *riipe- ‘to scribble’). So this idea of riipustaa as a loanword is but a “root etymology”: only the root syllable riip- could possibly derive as an old loan based on scrībere, all else needs to be more recent.
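The gradation argument above can be made concrete with a small sketch. It assumes a deliberately simplified table of Finnish weak grades (only the alternations needed here, plus p : v and t : d) and a hypothetical helper derive_sta that I made up for illustration; it is not a full model of consonant gradation. With gradation switched on it reproduces the older derivatives cited above, and it shows that surface riipusta- would require either a geminate root riipp- or a derivation that skips gradation.

```python
import re

# Simplified Finnish strong -> weak grades (illustrative subset, an assumption).
WEAK_GRADE = {"pp": "p", "tt": "t", "kk": "k", "nt": "nn", "lk": "l",
              "p": "v", "t": "d", "k": ""}

# A stem is analysed as: anything + one gradating consonantism + final vowel.
_STEM = re.compile(r"^(.*?)(pp|tt|kk|nt|lk|p|t|k)([aeiouyäö])$")

def derive_sta(stem, gradate=True):
    """Attach the formant -sta-; older derivatives take the weak grade."""
    match = _STEM.match(stem)
    if gradate and match:
        head, consonants, vowel = match.groups()
        stem = head + WEAK_GRADE[consonants] + vowel
    return stem + "sta"

# Older layer: weak grade before -sta-.
print(derive_sta("rippu"), derive_sta("lintu"), derive_sta("maku"))
# Recent layer: no gradation.
print(derive_sta("maku", gradate=False), derive_sta("riipu", gradate=False))
# An old derivative with surface -p- needs a geminate root.
print(derive_sta("riippu"), derive_sta("riipu"))
```

Under these assumptions an old derivative from single-p *riipu would come out with -v-, so the attested -p- only fits the recent, non-gradating pattern (or the unavailable root riipp-).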

Assuming all this eventful history within Finnish is also unfortunately getting rather convoluted. If it is only the semantics of this word that end up nicely matching with Latin, perhaps a more economical solution would be to reverse the direction of the various semantic contaminations I’ve assumed above:

  1. a derivative raapustaa ‘to scrabble, scratch’ comes about in Finnish;
  2. through the influence of riipiä, this develops a by-form riipustaa;
  3. through the influence of Swedish skriva (and, why not, also riimu ‘rune’?) this acquires the meaning ‘to scribble letters’ in particular;

with the net result being that riipustaa is much younger altogether, built up within the last few centuries.

The third point would probably require that skriva gets borrowed into Finnish first, presumably in the form ((s)k)riivata.

At this point I need to finally turn my eye from reconstructive speculation to real data. And a narrower chronology turns out to be vindicated by the Finnish dialects. While Suomen Murteiden Sanakirja has not yet reached R- (probably won’t until a few decades from now; finishing L alone has been taking some five years), it is already possible to see that (s)kriivata ‘to write’ is in fact attested (including even a by-form kriipata)! At least the initial cluster skr- cannot be here anything else but a sign of a recent loanword. [4] Various other evidently related formations have attestations with a cluster kr- as well: e.g. kriipu ‘scratch’, kriiputa and kriiputtaa ‘to draw lines’, and indeed also kriipustaa ‘to scribble’, kriipustus ‘scribble’. Tracing the rise of the forms with -p- rather than -v- is not obvious without access to the entries of the clusterless variants, but I’d still imagine the source is in the end raapia — this, too, indeed also with a variant kraapia (it’s a loanword from Sw. skrapa ‘to scrape’ after all).

Case closed, I think: riipustaa is a recent Finnish-internal formation, evolving on the basis of Swedish skriva, and only accidentally comes again close in form to its ancient Latin original. Seeking a deep-reaching root etymology for a geographically isolated word turns out to be a demonstrably bad idea once again.

[1] Today explained as Kluge’s Law doublets, though I suppose back-loaning from Finnic would also work.
[2] The SE form and North Estonian katel show syncope-then-epenthesis through *kattila > *kattľ, similar to e.g. *akkuna > *akkn > NE aken ~ SE akõn’ ‘window’, *taikina > *taikn > NE taigen ‘dough’. Interestingly enough, this only took place between a heavy first syllable and a light third syllable. Otherwise V2 survives, e.g. *satula > NE sadul ‘saddle’, *rusikka-s > NE rusikas ‘fist’, *palmikkoi > NE palmik ‘plait’.
[3] This moreover has apparent reflexes even in the Volga region, usually explained as loaned from Gothic: Erzya /pondo/, Moksha /ponda/ ‘measure of weight’, Mari /pundə/ ‘money, capital’. The former interestingly enough appears to have come in early enough to have participated in the lowering of native *u > *ʊ to /o/… I wonder if early Slavic *pǫdъ could work as an alternative more recent loan source (nasal vowels are still reflected as /Vn/ in several early loans to Finnic as well).
[4] For plain kr- in western dialects an onomatopoetic origin might be conceivable, cf. e.g. rapista ~ krapista ‘to rustle’, rätistä ~ krätistä ~ prätistä ‘to crackle’.

An Old Komi inscription

From A. J. Sjögren’s Die Sürjänen (orig. 1829), more exactly the reprint in his Gesammelte Schriften I (1861, edited by F. J. Wiedemann):

An Old Komi inscription.

I do not have a transcription or any other analysis to offer, alas. For one, while “Old Permic” is in Unicode as of 2014, I still have no typeface capable of actually displaying it. For two, transliteration just off the cuff would be difficult too: while the Old Komi alphabet is in principle Cyrillic-derived (developed in 1372 by St. Stephen of Perm), most familiar-looking letter shapes are actually red herrings. E.g. the reversed-Г letter is /i/, the lambda/rho-digraph-esque letter is /š/, and the hourglass is /v/ (derived from В). The only letter that is readily correctly identifiable with just knowledge of Cyrillic might be yat Ѣ, observable e.g. at the start of the 4th line. For anyone who’d want to try their hand themselves though, a paleography chart is available from the one authoritative modern source, Lytkin’s Древнепермский язык (1952).

For three… we actually hear in Uralistics very little about Old Komi at all. I for one would enjoy seeing at least a bit more coverage in basic sources. Yet the only fact that comes up with any frequency is the claim of Old Komi being the only direct and consistent source for differentiating Proto-Komi *e and *ɛ. [1] Otherwise, e.g. Raija Bartens’ 360+page handbook Permiläisten kielten rakenne ja kehitys (2000) only discusses the Old Komi written records for no more than half a page, with no direct examples given.

There is regardless not much available to begin with: in the 1998 handbook on Uralic, A.-R. Hausenberg reports that the known corpus of Old Komi only measures 225 words, which would be on par with minor epigraphic languages such as Thracian or Lydian. I suppose the inscription above would therefore already constitute maybe a good 10–15% of the total corpus.

[1] The Komi-Yazva variety also distinguishes these however, as /i/ versus /e/.

The origin of the Finnic long vowels: An outline

Continued from my thesis release post, as is perhaps appropriate now that I finally have wrapped up my graduation as well. To make it a bit more convenient for readers, I provide here an English outline of the specific topics I discuss in my thesis, even though writing out my argumentation in detail would take much more work.

Comments in [brackets] are additions specifically made in this post, not found in the thesis itself.

pp. 2–8: Methodological basics [§ 2]
An honestly fairly scattered look over the usual meat-and-potatoes of the Comparative Method. Language relationship means descent from a common protolanguage; word relationship means descent from a common proto-form, and constitutes a comparative etymology; comparative etymologies imply sound correspondences; regular sound correspondences allow reconstruction, i.e. the postulation of proto-forms, and sound changes leading from them to attested forms; any given set of comparative data allows many different reconstructions, and parsimony must be appealed to in order to choose between them. On p. 5 I also sketch the structure of sound change in terms of 2+2+2 factors [a topic I think I would like to expand on in future work on the ontology of historical linguistics]: input/output, enabling/blocking conditioning, and location in time/space.

pp. 8–11: Research data [§ 3]
Early on I had datamined the etymological literature for all cases where an etymon shows a long vocoid (long vowel or a diphthong) and could feasibly date back to Proto-Finnic. This compilation comes out at about 1500 items. For now I treat in full detail only the oldest layer of long vowels however, numbering a few dozen. So there would be much further work to do also (some of which in fact is already done and will be simply delayed for future publications).

pp. 12–17: Finnish to Proto-Finnic [§ 4]
This era of history is pretty much established. There is a consensus view of the vowel system of late Proto-Finnic (tables 1–2), save for one issue: whether there were harmonic equivalents *e, *ë (= FUT *e̮, orthographic õ), as in Estonian / Votic / Livonian, or a harmony-neutral *e, as in Finnish–Ingrian–Karelian and Ludian–Veps (pp. 15–16).

[The latest word on PF *ë has actually recently come out in Petri Kallio’s Festschrift, where Jaakko Häkkinen argues, with further commentary from me & Santeri Junttila, in favor of the harmonic reconstruction.]

The main phonological changes from PF to modern Finnish are *eü > öy, long mid diphthongization *ee *öö *oo > ie yö uo, and various syllable contractions due to consonant losses (introducing unstressed long vowels and recreating stressed ee oo). Innovations not affecting the vowel inventory per se but still adopted in Standard Finnish include coda vocalizations (e.g. *mükrä > *müɣrä > myyrä ‘vole, mole’) and some partially unexplained vacillation in vowel length (e.g. *kärmeh > käärme, SW kärmes ‘snake’).

[There are also many further shifts like conditional *e > ö or unconditional *ä > ɛ, particular to specific dialect areas, today sidelined due to the archaizing nature of Standard Finnish. Most Finnish dialects other than Savonian are overall fairly conservative in their vocalism however.]

pp. 18–22: Preambles for comparative Uralistics [§ 5]
Some known results and concepts:

  • (pp. 18–19) the sprawling terminology of Uralic “intermediate” proto-languages, from Proto-Finno-Samic to Proto-Finno-Ugric, and whether any of this is really necessary (hardly).
  • (pp. 19–21) the Proto-Uralic word root as canonically bisyllabic, but with impoverished second-syllable (“stem”) vocalism; including the debate on if the non-open stem vowel should be reconstructed as *i, *e or *ə. In particular, extensive metaphony and vowel reduction across the Uralic languages means that most of the time it is convenient to operate with vowel combinations such as *i-ä or *ä-ə, instead of individual vowels.
  • (p. 21) the recent result that Finnic *a-ë actually continues three different PU vowel combinations: *a-ə, *ë-ə and *ä-ä.
  • (pp. 21–22) the distinction between primary and secondary long vowels in Finnic. The latter arise by syllable contractions after loss of “vocalizing” consonants *j, *w, *ŋ, *x, while the former do not — they are usually held to constitute the oldest group, and it is them that my work focuses on. The post-Proto-Finnic myyrä, käärme type could be probably furthermore called tertiary long vowels.

pp. 23–24: Early Proto-Finnic [§ 6.1]
(Now entering the extensive research history section.)
The main Finnic-Samic and Finnic-Mordvinic vowel correspondences were worked out already in the late 1800s. It turns out that the 16-member Proto-Finnic vowel system has retained more original contrasts than the 9-member Proto-Samic and the 6-member Proto-Mordvinic systems (well, duh); and, per a “prestructuralist” result due to A. Genetz, reshuffling other than simple mergers in the latter two has mainly taken place by metaphony, so that variable correspondences between them and Finnic do not require reconstructions like deriving Finnic *o partly from an *o and partly from an *ɔ. Later loanwords from Finnic into Sami greatly obfuscate this though, and it is imperative to use evidence from other Uralic languages to figure out what is really old native vocabulary and what is not. These add up to set up the Finnic system as looking essentially archaic when compared within West Uralic.

pp. 25–26: W. Steinitz (1944) [§ 6.2.1]
An overview of the earliest big system of Proto-Uralic vowel reconstruction, due to Wolfgang Steinitz and based primarily on Khanty and Mari. This included no long vowel subsystem, but instead a reduced vowel subsystem (*ĕ *ö̆ *ŏ = [ɪ ʏ ʊ]), supposedly retained in these two key languages. For the primary (non-contracted) long vowels in Finnic, three important distributional limitations were noted:

  • only *ii *ee *oo *uu seem to occur
  • only in open syllables
  • only in *ə-stems

In other words, words like piika ‘maid’, näätä ‘marten’, huosta ‘care’, tyyni ‘calm’, kaari ‘arc’ would have to be either not native vocabulary at all, or to involve secondary i.e. contracted long vowels.

This setup was proposed to result from a conditional development of *i *e *o *u into long vowels, versus the reduced vowels then giving short *i *ü *u; *ĕ partly also *e. No real solution was offered on where Finnic *oCe- would come from. *a *ä were supposedly left outside this lengthening rule.

pp. 27–33: E. Itkonen & followers [§ 6.2.2]
The previous was quickly countered by Erkki Itkonen, citing extensive counterevidence for the supposed vowel lengthening rule in Finnic, as well as evidence for reduced vowels in at least Mari clearly being a secondary development from non-reduced *i *ü *u. What was set up instead was the “Finnic icebox” theory, according to which Proto-Finno-Permic (defended in detail by Itkonen), Proto-Finno-Ugric (defended more sketchily) or even already Proto-Uralic (only as a suggestion and not by Itkonen himself) had a defective long vowel system with just *ii *ee *oo *uu. Often but not always Steinitz’ two other distributional restrictions were included also. Over the 40s to 60s, evidence was gradually dug out from pretty much all across Uralic that the Finnic primary long vowels have distinct correspondences from their short counterparts, and so cannot be derived from the corresponding short vowels by a late lengthening rule. However, it was never argued why these correspondences should be reconstructed with the same values as in Finnic, aside from the circular assertion that “the Finnic vowel system is archaic”. To be fair, Itkonen had originally based this on the evident archaicity of the Finnic root structure, i.e. the fact that Finnic maintains PU second-syllable *-a and *-ä, but the idea had slowly grown from there into dogmatic insistence on general archaicity.

pp. 33–35: M. Lehtinen (1967) [§ 6.2.3]
Just one paper from the 60s seriously considered the reconstruction of the correspondences of the Finnic long vowels from a new angle. This included the insight that since already in Samic and Mordvinic, the Finnic vowel combinations *ee-e and *oo-ë have the same correspondences as *ä-e and *a-ë, the former two should be assumed to result from raising of earlier *ää and *aa, and that this could be then used to revive the vowel lengthening theory in part. This, alas, went by with little attention (and what little there was included a takedown by Itkonen that glossed over the key insights entirely).

pp. 35–38: J. Janhunen (1981) [§ 6.2.4]
Basic comparative work on Uralic reconstruction was rekindled after Juha Janhunen had in 1977 released a usable reconstruction of Proto-Samoyedic. An earlier work, Sammallahti (1979), focused more on the consonant system, and for vocalism only involved very vague “structural comparison” of the first-syllable vowel inventory of PSmy with the ice-box reconstruction of PFU, yielding no substantial results. Janhunen’s own contribution instead begins by dividing the data into vowel combinations rather than just first-syllable vowels in isolation, which allows him to uncover a number of metaphony developments: several in PSmy, and at least one that he tentatively dates to Proto-Finno-Permic. Not through any analysis of Permic data in particular however, but rather “inherited” from the fact that this is how far back detailed comparative work by Itkonen had reached. Both these works also reinstated a back unrounded vowel *ï (FUT *i̮) into the proto-system. This had already been proposed in several early-1900s works including Steinitz’, then sharply denied by Itkonen. Sammallahti and Janhunen give no references whatsoever to any of this earlier work, though. Janhunen has one passing mention that Ugric evidence might also support the reconstruction of *ï, but even this could be an independent rediscovery.

Janhunen had a new proposal entirely for the Finnic primary long vowels: similar to Indo-European, these would come from vowel + a laryngeal, transcribed by him as *x (= FUT for “unknown consonant”, not IPA [x]). These *Vx sequences were on their other leg based on Samoyedic *Və, including then the possibility that this “laryngeal” was indeed already a vowel to begin with. Or rather, had been vocalized already in PU. The normal Uralic root structure is *(C)V(C)CV, and *x would best fit here in the coda consonant slot. This would also naturally result in long vowels being restricted to open syllables. To explain the apparent lack of *CVVCA, Janhunen proposed shortening of the FP long vowels in this root type. The lack of *aa and *ää was explained by raising to *oo and *ee, i.e. same as Lehtinen, though perhaps again reinvented independently. No explanation was given for the lack of primary *üü.

pp. 38–39: Summary of theories [§ 6.2.5]
The previous four views constitute the main theories presented for the origin of the Finnic primary long vowels. I note in Table 3 some commonalities between most pairs of them: e.g. Steinitz and Itkonen agree in deriving the *ii : *i, *uu : *u contrasts already from an original quantity contrast, with or without reshuffling; Itkonen and Janhunen agree in setting aside distinct vowel combinations as the source of the PF long vowels. It may be additionally worth noting that while Steinitz, Itkonen and Lehtinen all comment on one another, Janhunen’s work started off as “disconnected” from the discussion, nominally by the excuse that he was researching Proto-Uralic rather than Proto-Finno-Ugric.

pp. 39–42: E. Tálos [§ 6.3.1]
Something looking like a fifth theory was also published in a somewhat little-known paper by Endre Tálos in 1983; so just two years after Janhunen’s. On closer reading of his formula-dense presentation though, this is rather a much-derived update on Lehtinen’s theory, combined furthermore with taking Ugric vocalism as more archaic and western Uralic as more innovative. Notably Tálos sides with Itkonen in rejecting Steinitz’ and Lehtinen’s idea of Finnic *ee and *oo being in some cases derived from earlier *e and *o, but still sides with Lehtinen’s (and Janhunen’s) idea of them being however derived from earlier *ä and *a. This is further combined with deriving *ii and *uu from (the same source as) *e and *o. At this point, then, Tálos’ theory actually turns out to fulfill also Itkonen’s much-insisted “boundary condition” that long *ii *ee *oo *uu always have a different source from short *i *e *o *u, respectively — but starting from a more parsimonious original system.

pp. 42–: Sammallahti (1988) [§ 6.3.2]
A massive synthesis of Janhunen, Itkonen and Steinitz’ reconstructions was sketched out by Sammallahti in the 1988 handbook The Uralic Languages. He jettisons his own earlier 1979 ideas and rather takes Janhunen’s Proto-Uralic framework as the starting point, and demonstrates that slight variants of Steinitz’ and Itkonen’s reconstructions, now labeled as Proto-Ugric and Proto-Finno-Permic, are derivable from it. These results, among other things, cement the reconstruction of PU *ï, now clearly based on evidence also from Ugric and Permic. The model ends up as a compromise of “least quarrel” rather than of maximum parsimony, though, with the three different prominent reconstructions siloized in their own taxonomic units; and without a single word on either Lehtinen’s work, or on Steinitz’ and Itkonen’s points of substantial disagreement.

pp. 44–45: Addenda [§ 6.3.2]
Come the 2000s, some fine-tuning of Sammallahti’s vocalism model was eventually given as well. This perhaps had required the earlier publication of articles criticizing most of the traditional Uralic subgroups such as Finno-Permic. The outcome was mainly to expand “Janhunen’s territory” at the expense of Itkonen: e.g. Ugric and Mordvinic do not show evidence for the shifts *ä(x) > *ee and *a(x) > *oo, and therefore, rather than presuming back-developments *ee *oo > *ä *a, the qualities *ä and *a in these languages should be simply taken as archaisms.

pp. 45–50: Lehtinen’s Comeback [§ 6.3.3]
The evolving standard model of the PU vowel system had up to this point remained a Janhunen / Itkonen / Steinitz compromise — or by 2010, essentially just Janhunen / Steinitz, with Itkonen’s model having been gradually pushed to irrelevance (as I show in detail in the list on pp. 47–48). Lehtinen’s model was however (re)introduced into the discussion only at the beginning of the current decade, in a 2011 article by Reshetnikov & Zhivlov. This quickly led to an article by Ante Aikio the next year, which ended up replacing the until then expansive region of Janhunen’s reconstruction with what could be by symmetry called “Lehtinen’s reconstruction”: a PU vowel system with no extant or incipient vowel length at all, with length arising entirely secondarily in Finnic. This involved also presenting a new origin, independent of the Finnic primary long vowels, for Samoyedic *Və sequences, the only other source of evidence Janhunen had been able to propose for his *Vx reconstruction.

Even more impressively, Aikio shows that Lehtinen’s conditions for *a > *oo don’t just apply to “regular” *a, but also to what was by then still considered “irregular” *a from secondary sources (they were subsequently promoted to regular just a few years later; cf. § 5). Instead of three different sound changes having the exact same conditioning factors and output, we clearly should then assume only one sound change — but this then requires that it in fact takes a normal short *a as its input, and not anything else like *o or *aa.

p. 50: Lehtinen’s Law [§ 6.3.3]
We should take a step back to appreciate what has been accomplished here: an originally humble suggestion has, about 45 years after its first proposal, turned out to vindicate a relatively parsimonious eight-vowel reconstruction of Proto-Uralic. Other things being equal, this is clearly an improvement over either Steinitz’ or Itkonen’s systems, both of which required more extensive 11-vowel proto-systems. The key has been the identification of somewhat intricate but still plausible conditioning factors for the rise of long vowels in Finnic. This latter development surely then deserves being promoted to the status of a named Sound Law, even though such results have not often been recognized in Uralic studies.

[While “Lehtinen’s Law” is my own coinage, originally back on this blog in 2013, the first published use is actually not: that milestone goes to Patrick O’Rourke in a 2016 article.]

[Additionally perhaps worth noting: I have seen @Laws_of_IE recently note that even in the much larger field of Indo-European studies, so far essentially all named soundlaws have been credited only to men. Lehtinen’s Law would not however be the first example against the tide: in Uralistics, Edith Vértes can already claim a third-of-a-credit for a “rule” that was nascently named in 1973 (see my footnote 14). Moreover, full primary authorship of a Law has by now been granted, at minimum, to Betty Chang in the 2011 article “An Inventory of Tibetan Sound Laws” by Nathan W. Hill.]

pp. 50–53: The Close Vowels [§ 6.3.4]
Some initial thoughts on a topic that ended up being mostly cut from the finished thesis. An important starting point that should be recognized is that while the cases of mid *ee and *oo are symmetric to one another (with respect to backness), similarly *ii and *uu to one another, there is no reason to require that the history of the long mid vowels and long close vowels should be isomorphic to each other! Plenty of languages have long close vowels without having long mid vowels. Examples already within Uralic include Tundra Nenets and much of Eastern Finnic. Hence, even if Finnic *ee and *oo can be parsimoniously derived from earlier short vowels, this is not an automatic licence to deny the reconstruction of *ii and *uu for Proto-Uralic.

A brief review of proposed sources for the long close vowels allows however putting together an interesting heuristic argument, based on how there is no primary long *üü in Finnic, even though PU is agreed to have had *ü among its close vowels. If *ii *uu were from something like *ij *uw, or *ix *ux, we should expect to find also something like **üü < *üw or *üx alongside. However, there is one proposal out there that would lead to a lack of **üü naturally: *ii *uu < *ej *ow (since PU had no **ö among the mid vowels). Interestingly this proposal has for long mostly been investigated in loanword studies, which can demonstrate some fairly clear examples like pre-II *pey-men- ‘milk’ → *pejmä > Proto-Finnic *piimä, yet not in works on PU reconstruction.
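The heuristic can be spelled out as a tiny set comparison. This is only a sketch of the argument's logic: the inventories are the ones named above (close *i *ü *u, mid *e *o with no **ö), while the function names and the RAISE mapping are my own illustrative assumptions.

```python
# Inventories as stated in the text; everything else here is illustrative.
PU_CLOSE = {"i", "ü", "u"}
PU_MID = {"e", "o"}                    # PU had no **ö among the mid vowels
RAISE = {"e": "i", "ö": "ü", "o": "u"}  # mid vowel + glide > long close vowel
ATTESTED_PRIMARY_LONG_CLOSE = {"ii", "uu"}

def from_close_plus_glide(close_vowels):
    """Hypothesis A: *ij, *üw, *uw (or *ix, *üx, *ux) > long close vowels."""
    return {v * 2 for v in close_vowels}

def from_mid_plus_glide(mid_vowels):
    """Hypothesis B: *ej > *ii, *ow > *uu."""
    return {RAISE[v] * 2 for v in mid_vowels}

predicted_a = from_close_plus_glide(PU_CLOSE)
predicted_b = from_mid_plus_glide(PU_MID)

# Hypothesis A overgenerates unattested **üü; hypothesis B does not.
print(sorted(predicted_a - ATTESTED_PRIMARY_LONG_CLOSE))
print(predicted_b == ATTESTED_PRIMARY_LONG_CLOSE)
```

The comparison says nothing about whether either hypothesis is right on the etymological evidence; it only shows why the gap at **üü falls out for free under the second one.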

pp. 53–54: Summary of research history [§ 6.4]
The main arc of research history on the Finnic long vowels has probably been the introduction of the idea that the primary long vowels would be archaic inheritance, gradually pushed further and further back in origin; and then just as gradually, them getting pushed forwards again, and now ultimately considered only an innovation particular to Finnic. While this has resulted in a new, more parsimonious reconstruction of Proto-Uralic, it has also left behind quite a bit of baggage: the existing overviews of the historical phonology of the more western Uralic groups (Samic, Mordvinic, Mari, some versions of Permic) as published in the last decades of the 20th century all begin from a system with the Finnic long vowels still hanging around. I cannot hope to clean up this mess entirely in subsequent thesis chapters; just working the Finnic history itself into a more streamlined shape will have to do. But much other work lies open for the taking too.

pp. 55–59: The Proto-Uralic vowel system and its development [§ 7.1]
Moving on to comparative reconstruction, I take as my starting point the current standard eight-vowel inventory, as due to Janhunen & Sammallahti (table 5). I give a very condensed literature summary on the reflexes of the non-close vowels *e *ä *a *ë *o in the daughter language groups (table 6), including a few comments of my own on disputed or nontrivial developments. [Some of these may warrant expanding into full articles somewhere down the line.]

pp. 59–65: Lehtinen’s Law: Phonology [§ 7.2.1]
From a phonological angle, Lehtinen’s Law has more than a few interesting features. Vowel lengthening being limited to open syllables is trivial, at least. Phonetically natural but non-trivial are lengthening being limited to the open vowels *a *ä; its being limited to stems where it is followed by a weak stem vowel *ə and no other stem-forming material (even CVCVC stems like *śadək > *sadëk ‘rain’ are unaffected); and being limited to stems with a voiced medial consonant. The output seemingly being mid vowels *oo, *ee is best explained by an intermediate stage with long open vowels *aa, *ää, followed by unconditional long vowel raising. The most interesting question however might be why *j, *w, *ŋ, *x do not trigger LL. I propose that this set of segments can be transformed into a natural exception class of semivowels, if we assume sound changes *ŋ, *x > *ɰ before LL. This hypothesis can be supported by how *ŋ and *x in fact have identical reflexes everywhere in Finnic (either lost or vocalized to *w), aside from the cluster *ŋk, which is the only environment where [ŋ] survives into late Proto-Finnic (and there’s no contrast with *m or *n here anyway).
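The conditions just listed can be collected into a short sketch. It assumes one character per segment, a hypothetical set of voiced non-semivowel medials (VOICED_TRIGGERS is my illustrative guess at the class, not a list taken from the thesis), and function names invented for this post. The ordering follows the proposal above: *ŋ, *x > *ɰ first, then lengthening, then unconditional raising.

```python
import re

VOWELS = set("aäeioöuüëïə")
OPEN_VOWELS = {"a", "ä"}
# Assumed voiced medials that trigger lengthening; semivowels are excluded.
VOICED_TRIGGERS = {"m", "n", "ń", "l", "r", "d", "δ"}

def semivocalize(stem):
    """Proposed pre-LL change *ŋ, *x > *ɰ; *ŋ is kept in the cluster *ŋk."""
    return re.sub(r"ŋ(?!k)|x", "ɰ", stem)

def lehtinen(stem):
    """Lengthen *a, *ä in (C)VCə stems with a voiced non-semivowel medial."""
    stem = semivocalize(stem)
    first = next((k for k, seg in enumerate(stem) if seg in VOWELS), None)
    if first is None:
        return stem
    tail = stem[first + 1:]
    # Open syllable + weak stem vowel + no further stem material: tail is Cə.
    if (stem[first] in OPEN_VOWELS and len(tail) == 2
            and tail[0] in VOICED_TRIGGERS and tail[1] == "ə"):
        return stem[:first] + stem[first] * 2 + tail
    return stem

def raise_long(stem):
    """Unconditional raising of the intermediate long open vowels."""
    return stem.replace("aa", "oo").replace("ää", "ee")

for form in ["adə", "śadək", "aŋə"]:
    print(form, "->", raise_long(lehtinen(form)))
```

The inputs are schematic pre-LL shapes: adə stands for the stage behind *aadə ‘year’, śadək is the CVCVC stem for ‘rain’ mentioned above, and aŋə shows a semivowel medial escaping lengthening.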

The phonologization of the long vowels resulting from LL is hard to date. *d > *t in *aadə > *aatə (> *vooci) ‘year’ suffices to turn the conditioning opaque, but this does not introduce any contrast anywhere: this is the only example of LL before *d, and there are no roots of the shape **Catə. Syncope in inflected forms such as the partitive case could work also, but is itself difficult to date.

pp. 66–69: Long vowels in loanwords: *oo [§ 7.3.1]
Loanword studies allow dating the phonemicization of long mid vowels in Finnic quite well, however. Long vowels show up in non-LL positions already quite early in Indo-European loans such as *soola ‘salt’ — and it seems possible that phonemic long vowels were essentially originally introduced as loanword phonemes. ‘Salt’ and many other examples actually derive from a late PIE *ā and not *ō (cf. *sooja ‘protection’ ← PII *sćāyā, *vootna ‘lamb’ ← Baltic *āgna-, etc.), which then provides independent evidence for my phonetically assumed intermediate stage *aa.

My new intermediate reconstruction stage even seems to solve a minor morphophonological mystery: Fi. suola ‘salt’ shows an irregular plural stem suoloi-, while normally a-stems have a plural stem in -i- when following a labial stressed vowel; plural stems in -oi- normally only occur after illabial vowels (cf. e.g. sola : soli- ‘gap, pass’ | sara : saroi- ‘sedge’). I propose this plural stem is a retention from the *saala stage, when the word still indeed had an illabial vowel in the first syllable!
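As a rough illustration of the plural stem rule just described, the following sketch predicts the plural stem of an a-stem from the rounding of its first vowel. The function name `plural_stem` is hypothetical, and taking the first vowel letter as a proxy for the stressed vowel is a simplification made for this example only.

```python
VOWELS = "aeiouyäö"
LABIAL = {"o", "u"}  # labial vowels relevant for back-harmonic a-stems


def plural_stem(a_stem):
    """Predict the plural stem of a Finnish a-stem noun (simplified rule)."""
    if not a_stem.endswith("a"):
        raise ValueError("expected an a-stem")
    # First vowel letter stands in for the stressed first-syllable vowel.
    first_vowel = next(ch for ch in a_stem if ch in VOWELS)
    ending = "i" if first_vowel in LABIAL else "oi"
    return a_stem[:-1] + ending + "-"


print(plural_stem("sola"))   # regular: labial first vowel
print(plural_stem("sara"))   # regular: illabial first vowel
print(plural_stem("suola"))  # rule predicts suoli-, but suoloi- is attested
print(plural_stem("saala"))  # the proposed earlier *saala stage
```

Under this simplified rule the attested suoloi- is only predicted if the input is the earlier *saala shape, which is the point of the proposal.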

pp. 69–71: Long vowels in loanwords: *ee [§ 7.3.2]
Scraping together similar evidence for an *ää stage is much harder though. Most potential examples along the lines of miekka ‘sword’ are immediately discardable from consideration, as they have back harmony (PF *mëëkka) and projecting them back to disharmonic preforms like **määkka would be in violation of pre-Finnic vowel harmony. In Germanic it seems clear that we can reconstruct an open-ish *ǣ as being more original than close-mid *ē; in Balto-Slavic this is less obvious. Elsewhere in Indo-European also Albanian shows a shift *ē > *ā [and, as I’ve learned since then, ditto Phrygian], which could be taken as evidence for *ē having been somewhat open originally. If so, then probably at least a few loanwords into Finnic indeed originally came over with *ää and were then raised to *ee.

pp. 71–74: Shortened long vowels in loanwords [§ 7.3.3]
Another topic that needs to be addressed is the set of cases where IE long vowels yield Finnic short vowels. Some of the oldest cases could perhaps predate LL entirely. I propose however that cases with *ā → *a, *ē → *ä could be from the stage of Finnic where long vowels from LL were now *ee and *oo, but new secondary *aa and *ää (from contractions like *aŋə > *aɰə > *aa) did not yet exist, so that only short *a and *ä would have been available as substitutes for [aː] and [æː].

A clearly newer layer are words that reflect PIE *ē > Northwest Germanic *ā as Finnic *a. These and some other examples seem to involve mostly phonotactic limitations that were active even up ’til Proto-Finnic, such as no long vowels before sonorant + sonorant clusters.

pp. 75–79: Long *oo in native vocabulary [§ 7.4.1]
I have assembled the vowel correspondences of Finnic *oo across Uralic in Table 7. They show a fairly clear three-part division, mirroring the fact that also Finnic *a-ë has three separate origins, and quite well demonstrating the same reflexes: my group 1 reflects PU *ë-ə, group 2 reflects PU *a-ə, and group 3 reflects PU *ä-ä.

pp. 79–87: Etymological commentary on words with *oo [§–4]
Some of the etymologies involved require in-depth discussion. I e.g. propose a new soundlaw for Permic: *waRV > *wȯRV; summarize/extend a so far unreleased argument from Kallio that Suomi ‘Finland’ is indeed after all cognate with Sámi, as has long been suspected, both derivable now quite straightforwardly from a protoform *sämä; assemble a new etymology for nuori ‘young’ from bits and pieces already known in earlier literature; and propose as a more speculative idea that vuori ‘mountain’ would actually come from a meaning ‘hillfort’, and be in turn a loan from Iranian *wāra- ‘fort’.

pp. 87–90: Long *ee in native vocabulary [§ 7.4.2]
Table 8 similarly collects the vowel correspondences of Finnic *ee across Uralic. These are more homogeneous than those of *oo, and almost all point to original *ä. Only the Permic evidence requires closer treatment, and I show that *ä-ə > *ɨ seems to be quite regular, including also several non-LL cases (e.g. *jäŋə ‘ice’ > Finnic *jää ~ Permic *jɨ). Even some seemingly different cases showing *ä-ə > *i (e.g. *kätə ‘hand’ > *ki) can be assumed to have gone through intermediate *ɨj. [I have however not treated here some other yet smaller exception groups that also exist, such as *jäsnə ‘joint’ > *jȯz.]

pp. 90–96: Etymological commentary on words with *ee [§]
Some etymological fine-tuning is required again, such as an extensive discussion of words in the semantic area of ‘turn, twist, tie, wrap, rotate, round’, and another new loanword proposal: Livonian kēv ‘mare’ ~ Eastern Sami *kiəvə ‘reindeer cow’ ← Indo-Iranian *gāw- ‘cattle’ [as already previously covered on this blog].

pp. 96–98: Exceptions [§ 7.5]
Some discussion on two remaining exceptions to LL in seemingly native vocabulary, namely *panë- ‘to put’ and *ääni ‘voice, sound’. One exception for both of *a and *ä should surely not suffice to disprove a soundlaw, though I consider some [frankly ad hoc] possibilities to explain these away too.

p. 99: Chronology [§ 7.6]
A summary of the rise of vowel length in Finnic, as worked out above in sections 7.2.–7.4., and its relative chronology compared to some other early Finnic innovations in either vocalism or consonantism.

p. 100: Closing Words [§ 8]
[Nothing to comment directly on here, but I must say I never noticed before how my thesis seems to have a very round-numbered structure — exactly 100 pages of main contents, plus with Lehtinen’s Law, maybe the key result I build on, being reached and named exactly halfway on page 50.]

pp. 101–111: Literature [§ 9]
Some statistics:

  • Total sources cited: 188
  • Total authors cited: 93
  • Source languages: English, German, Finnish, Hungarian, Russian & one appearance each of Estonian and Swedish
  • Oldest sources: Anderson (1893), cited for a minor etymological detail; and Setälä (1899), a collected edition of a two-part work originally from 1890/1891.
  • Most recent sources: Guillaume & List “(forthcoming 2018)”, actually finally only published early this year; and Kallio (2018), published some five months after my thesis, and changed from “forthcoming” to published only in the online fine-tuned version.
  • Most cited journal: Virittäjä, with 18 citations spanning from 1934 to 2013. (Regardless I’ve not followed the conventional abbreviation as Vir.: the name seems short enough to me already, especially compared to the likes of SUSA = Suomalais-Ugrilaisen Seuran Aikakauskirja.)
  • Authors with most sources cited: Ante Aikio (11½) closely followed by Erkki Itkonen (11), who I could perhaps declare the main protagonist and main antagonist, respectively, of the thesis’ research history section.
  • Author with highest relevance to sources cited ratio: Meri Lehtinen, now with a soundlaw to her name, even though she seems to have never published any research at all other than the one 1967 article.
  • Source with least relevance: Bańczerowski (1972), cited for minor etymological detail that I’ve by now found out is actually taken from Otto Donner’s old comparative dictionary (1874–1888) without proper referencing.
  • Technically unpublished: Häkkinen (2007), perhaps the most cited Master’s thesis in Uralistics so far, and Aikio (2013a), an unusually thorough conference handout (though both of these can be found available online if required)
  • Weirdest title: Tálos (1984), typographically complex enough to not be reproducible in this post.

Musings on the sociolinguistics of dialect levelling

In Probing the roots of Samoyedic I note that already the clear fragmentation to separate languages demands a deeper age for Samoyedic than for the other Uralic subgroups. H.-W. Hatting asks in the comments a good argument-sharpening question, and writing my answer out in detail may fit better in a new blog post entirely.

His comment goes:

For that argumentation to work, the elimination of intermediate dialects in a continuum by replacement by other dominant dialects or by other languages would have to depend on the age of the language family, while it actually is caused by socio-economic factors that have nothing to do with that age. If Finnish would have been replaced starting from the Middle ages by (say) Swedish and Russian on a much larger scale than actually happened and at 1800 only a handful of dialects from extreme ends of the continuum would have survived, wouldn’t that look similar to Samoyedic? Or am I misunderstanding what you are saying here?

Very roughly, I’d say the issue is that a “fracturing” sociolinguistic phase does not lend itself very well to the accrual of further diversity across a language family. But we will also need to consider what the possible models for the fragmentation of Samoyedic are at all.

At first pass, the biggest difference between this Finnish hypothetical and the situation in Samoyedic is of course that if an intrusive language has recently broken up a dialect continuum, we should be able to see the intruder. Yet Russian comes too late on the scene to be blamed for any of the boundaries between the six Samoyedic main groups. Take e.g. Enets and Nganasan: they have “always” been right next to one another as far as the (meager) historical records go. Any possible transitional dialects must’ve been directly levelled out by Enets and Nganasan themselves, and/or before the Enets and Nganasan lineages arrived on the Taimyr peninsula. I don’t know of any other language either that could be plausibly blamed for splitting Selkup and the northern Samoyedic lineages off as their own groups. Sayan Turkic is the likely reason for leaving Mator and Kamassian in isolation… but as the two show no evidence of forming a common subgroup (their common isoglosses are all arealisms shared with Turkic!), even there this cannot be the only explanation.

This in mind, what looks to me like the fastest possible way to break Samoyedic apart into discrete lineages would be to assume that, after some initial dialect continuum development, some unknown language arrives on the scene and assimilates most of the core Samoyedic area. Then, once transitional varieties are on the way out, a handful of surviving marginal Samoyedic dialects re-gain momentum and expand again to become Nenets, Selkup, Kamassian–Koibal, etc.

The diversity we can observe today would mostly come about during the first and the third phase, and should sum up at least to the same two-ish millennia that less diverse families like Finnic and Samic show. But we also need to add some centuries for the whole fragmentation phase. Linguistic innovations would not be diffusing anymore between the late Common Samoyedic dialects at this time, slowing down the “native” component of language change. Superstrate influence can build up at a clip, but this should be roughly the same everywhere, not creating any substantial isoglosses.

The mystery intruder language does not itself need to go extinct later, at least not during the second expansion phase of Samoyedic. Something like Proto-Turkic could hence also work, with the end result being that none of the Samoyedic languages is spoken anywhere near the original Proto-Samoyedic homeland anymore. (Although in this particular case I’d expect to see much more Samoyedic influence ending up in the Turkicized varieties. Come to think of it, how many traces has the known Samoyedic substrate left in Sayan Turkic?)

Another option is that language boundaries within Samoyedic could be mostly “native”, based on secondary expansions like that of Tundra Nenets. This I think would fare even worse for a shallow dating, however.

I already expect linguistic expansions to be mostly serial rather than “explosive”. Cases like the European colonization of North America are rare (even that shows some spurts and lulls if you look closely). Since Finnish was raised as an example, it alone has gone through at least four major expansion phases:

  • the initial phase, circa 0 CE, in establishing presence in the Kumo and Aura river valleys (likely at the expense of early Germanic speakers);
  • the early middle ages’ agricultural expansion eastward (probably mainly at the expense of early Sami varieties, tho I wouldn’t rule out some older “Lakelandic” substrate groups still being around too);
  • the northwestern expansion around the Bay of Bothnia and into Lapland starting from the 15th century (again at the expense of Sami);
  • and the Savonian expansion from the 17th century on, backed by the huuhtakaski method plus tax benefits from the Swedish crown (mopping up the last remaining Sami groups of southern Finland in the process).

Expansions always have some socioeconomic motivations too, which in almost all cases take time to come about or dissipate. (Eastern Finnish could only end up realistically marginalized and thoroughly splintered by 1800 if the main Savonian expansion never happened at all.)

But we would also need to explain how serial expansions could lead to fragmentation into separate languages entirely in the case of Samoyedic, unlike cases like Finnic. When a secondary expansion within a language family comes head-to-head with closely related dialects, the typical result is dialect mixing, not complete levelling. The Savonian expansion shows this well: towards its western edges, the northern Tavastian dialects of central Finland and the central Ostrobothnian dialects close to the coast (up as far as Oulu) end up picking up several Savonian / Eastern Finnish features, but also retaining several western features. The result is more diversity, not less.

Family-internal preliterate expansions [1] that eliminate dialect diversity can still happen: examples that come to mind are Slavic levelling out West Baltic, Dutch & Low German splintering Frisian, and indeed, Tundra Nenets levelling out Yurats. I get the impression however that this requires a more specific sociolinguistic setup. This would seem to only happen when a language variety expands so that it comes newly in contact with a relative that’s already different enough that they cannot re-converge back to a continuum anymore. Within a timespan of some two thousand years, most language families do not manage to pull this off even once! Assuming for Samoyedic some three to six such events right in a row would be pretty unparsimonious. It seems inescapable to me that we would also in this scenario have to allow more time for Samoyedic to “stew”… so that e.g. a Proto-Selkup expansion can run into Para-Nenets or Para–Kamassian dialects that have already developed in rather different directions, and assimilate them without much trace of their earlier affiliation.

[1] By far the easiest way to level out dialect diversity is to roll in mass literacy + a standard literary language. This is not even remotely applicable to Samoyedic, of course.

Etymology squib: quəččə

A nice discovery: today I ran into a proposal in Róna-Tas’ “Turkic Influence on the Uralic Languages” (The Uralic Languages, 1988) that Mongolian qota(n) ‘fence, town’ might be an old loan from early Selkup through early Kyrghyz. Indeed, there is a Selkup word that Janhunen gives as qëtty ‘town’ and derives from Proto-Samoyedic *wåč → *wåč-əjə ‘fence’.

On the other hand, this qota(n) has been long also included in the whole ‘house, hut’ Wanderwort bundle (stretching from ocean to ocean: English hut to Ainu kotan), which includes also Uralic *kota. And there’s a consonant mismatch in Selkup that points rather in this direction! Alatalo gives (and does not connect with each other) the common Selkup forms *kuəču ‘tributary’ (#1903), [1] but *quəččə ‘town’ (#1912), with the velar / uvular contrast clearly attested also in the descendant dialects. Since Selkup *quə- is usually from Proto-Samoyedic *kå- rather than *wå- (e.g. *quət- ‘to kill’ < *kåə-tɜ-, but *kuətə- ‘to raise, grow’ < *wåtå-), we could trace the latter also back to a PSmy *kåt¹ɜ < PU *kota. The semantic development ‘fence’ > ‘town’ is maybe common enough, but could be here only an accidental similarity: no other Samoyedic languages seem to show this.
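The onset argument can be sketched as a simple two-way lookup. This is a toy illustration only: the table `ONSET_REFLEXES` encodes just the two "usual" correspondences named above, the function names are hypothetical, and real reconstruction obviously involves far more than prefix matching.

```python
# Usual Proto-Samoyedic -> Selkup onset correspondences named in the text.
ONSET_REFLEXES = {"kå": "quə", "wå": "kuə"}


def selkup_onset(psmy_form):
    """Predict the Selkup onset for a Proto-Samoyedic form, if covered."""
    bare = psmy_form.lstrip("*")
    for source, reflex in ONSET_REFLEXES.items():
        if bare.startswith(source):
            return "*" + reflex
    return None  # onset not covered by this toy table


def psmy_onset_candidates(selkup_form):
    """List the Proto-Samoyedic onsets that would regularly give this form."""
    bare = selkup_form.lstrip("*")
    return ["*" + source for source, reflex in ONSET_REFLEXES.items()
            if bare.startswith(reflex)]


print(selkup_onset("*kåə-tɜ-"), selkup_onset("*wåtå-"))
print(psmy_onset_candidates("*quəččə"))  # 'town'
print(psmy_onset_candidates("*kuəču"))   # 'tributary'
```

Read this way, *quəččə ‘town’ regularly points back to an onset *kå-, while *kuəču ‘tributary’ points back to *wå-, which is the mismatch the paragraph relies on.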

PU *t >> Selkup *čč [2] looks off, but not extraordinarily so: there are other examples of evidently secondary *č in Selkup too, most prominently PU *sënə > PSmy *t¹ën > Selkup *čën ‘sinew’. Maybe contamination from the ‘fence’ word is possible…? especially if Mongolic also shows this sense (though a quick look at Tower of Babel does not mention it).

In any case we seem to end up with the following results:

  • we have cleared out one of the exceptional cases where *w- supposedly > *q- in Selkup;
  • *kota ‘house’ does have a reflex even in Samoyedic;
  • the Turkic and Mongolic words can be also derived already directly from Proto-Samoyedic, or even outright Proto-Uralic, without needing to wait for *w- > *k- in Selkup.

[1] Per -u this is surely derived though. Apocopated forms like Tym kuədž ‘dam’ probably reflect more original *kuəčə < Proto-Samoyedic *wåčə.
[2] It seems to be not really dateable if an irregular *t > *č shift took place in Proto-Samoyedic, in Proto-Selkup, or even in just Southern–Central Selkup.

Probing the roots of Samoyedic

Last year I participated in a fruitful session on loanwords from Turkic into Samoyedic. I am now honored to see that the final article — P. S. Piispanen 2018, Turkic lexical borrowings in Samoyed, Acta Linguistica Petropolitana 14(3) — ends up incorporating + crediting in detail several of my suggestions. [1]

I would like to add here some detail on one of my four views to have made it into the paper. Footnote 4 mentions my proposed date of as far back as 3000 years of age (= 1000 BCE) for Proto-Samoyedic. This is not directly built on just my WIP database of Proto-Samoyedic though: it’s also informed by morphology and phonology. [2]

Samoyedic does seem to be the most internally lexically divergent branch of Uralic. We often find native Uralic roots continued in just 1-2 languages, [3] in contrast to the situation elsewhere in Uralic, where a native Uralic etymology also predicts good dialect distribution. This fact alone could probably be explained as some kind of a serial-substrate effect though: suppose substrate 1 in Proto-Samoyedic leaves an effect of replacing some Uralic core vocab, substrates 2a and 2b some more in Proto-Selkup and common Northern Samoyedic, some third-generation substrates still some more in Nenetsia, Taimyr, etc.

But it is also the case that Samoyedic is clearly divided in at least six branches with clear boundaries of intelligibility between them. This is quite different from all the other eight Uralic “main” branches, which all show dialect-continuum structure. Maybe the only other really clear within-branch language boundaries are Livonian vs. rest of Finnic, and Udmurt vs. Komi (although also later dialect shuffling has created other opaque language boundaries like Northern vs. Skolt Sami, or Standard Finnish vs. Standard Estonian). In Samoyedic, although there is a base layer of some old crisscrossing isoglosses, and probably late areally shared phenomena, all six of the Samoyedic groups also have a large share of unique distinguishing innovations. E.g. in consonant phonology:

  • Nenets(ic): *nt > n
  • Enets: *-C > -ʔ (general)
  • Nganasan: *ŋt > jt
  • Selkup: *j *w > *ć *k
  • Kamassian: *NP > *NN (general)
  • Mator: *kʲ × *sʲ > k

This selection is also not at all unique. Similar lists could be built also of solely vowel phonology, or inflectional morphology, derivational morphology, core vocabulary, loan vocabulary, notable semantic shifts: pretty much any one component of language. This is the key point that I see putting Samoyedic one “grade” ahead of the historical development of the other subgroups of Uralic. Within a group like Finnic or Khanty, no obvious taxonomy of this sort is possible. We can chart out a bunch of prominent local innovations (Western Finnish *ð > r, Southern Khanty *ɬ⁽ʲ⁾ > t⁽ʲ⁾…), but usually not even cover the dialect area by these, let alone divide it. There are always transitional dialects either lacking or overrepresenting putative branch-defining innovations. More damningly yet, in dialect continuum cases there’s not much coherence between the “phonological branching”, “morphological branching” etc. E.g. the plural genitive isogloss across Finnic (west *-den ~ east *-i-den) ends up being just one isogloss among a dozen or so that have been proposed as grounds for a primary division of the group.

Quite feasibly transitional Samoyedic varieties once did exist, but died out eventually, due to other groups such as Russians, Yakuts, Evenkis encroaching on the rather extensive Samoyed area; or due to “secondary expansions” within the family itself. (Yurats works as a proof-of-concept, assimilated to Tundra Nenets rather than Russian.) This does not make for a counterargument, though, since we by now see the same process playing out within the “younger” groups as well: all but Northern Mansi is gone, Southern Khanty is gone, Kemi and Akkala Sami are gone; Ume, Pite and Sea Sami are moribund, Votic and Ingrian are moribund; many traditional Finnish and Estonian dialects are rapidly assimilating into the standard. Assuming that the Samoyedic expansion ran out of steam (turned into a recessive, low-status language family) much faster than others sounds unwarranted too, especially given that it has in the end reached a much wider area than the other Uralic subgroups.

We do not have any historical-philological evidence for dating the early stages of Samoyedic. But we can date the corresponding stages of Samic and Finnic, by leveraging the well-known history of Germanic (and even Latin) through loanword evidence. The results come out, in both cases, as showing that the first isoglosses within S and F start appearing already in the second half of the first millennium BCE, and clear dialect areas have been established by 0 CE, though many common innovations continue to diffuse across the dialect area until as late as the first major round of Slavic influence circa 1000 CE. [4] We know dialect continua can fracture into multiple clearly distinct languages quite rapidly (most of Finnic was still a single dialect continuum circa 1900, and is looking headed for just a handful of surviving discrete daughter languages by 2100), but we also know Samoyedic was “discrete” already as early as about 1800. As a conservative estimate, I’d therefore add about 400 years more age for Samoyedic. This adds up to a minimum age of about 2600 years BP for Samoyedic, which I then round up to one significant figure, i.e. 3000 BP, due to the numerous uncertainties involved.
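The arithmetic behind this lower limit can be spelled out as a small sketch. The baseline of 2200 BP is an assumption made here purely so that the sum matches the stated total (the text only gives "the second half of the first millennium BCE" for the first Samic and Finnic isoglosses); the function names are likewise hypothetical.

```python
import math


def round_up_to_one_significant_figure(years):
    """Round a positive integer up to one significant figure."""
    magnitude = 10 ** (len(str(years)) - 1)
    return math.ceil(years / magnitude) * magnitude


def samoyedic_minimum_age(baseline_bp=2200, head_start=400):
    """Return (raw, rounded) minimum age of Samoyedic in years BP.

    baseline_bp: assumed age of the first Samic/Finnic isoglosses
                 (illustrative value within the stated range).
    head_start:  extra age added because Samoyedic was already
                 "discrete" by about 1800.
    """
    raw = baseline_bp + head_start
    return raw, round_up_to_one_significant_figure(raw)


raw_age, rounded_age = samoyedic_minimum_age()
print(raw_age, rounded_age)
```

Changing `baseline_bp` anywhere within roughly 2000 to 2500 BP still rounds up to the same figure, which is why the rounded estimate is fairly insensitive to the exact baseline chosen.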

This is all still a lower age limit. The only real upper limit seems to be that Samoyedic was still a single dialect continuum by the time of contact with Proto-Turkic, usually dated somewhere around 0 CE… but “standing” dialect continua can easily reach ages of a millennium or two! So 3000 BP really isn’t even a maximally bold suggestion. A pitch like 4500 BP would however start to have further implications: I’d obviously also have to backdate Proto-Uralic closer to the traditional 6000 BP than the recently proposed “shallow” chronologies branching off only at about 4000 BP.

Proto-Samoyedic also seemingly shows substantial general divergence from Proto-Uralic, but this does not mean that a “long” chronology would demand an outright Mesolithic dating for PU. Again as seems to be the case also in Samic and Finnic, various pan-Samoyedic innovations could be also re-dated into their common dialect continuum phase. Helimski’s vowel system updates (retained *a, *ä, *e in PSmy rather than Janhunen’s *ä, *e, *i) already point in this kind of a direction, as does the phenomenon of native Uralic roots being often restricted to a single Samoyedic language (this means that many may have been lost in parallel in all). I think two likely additional candidates are the sound change *ľ > *j, found even in isolated loans from Tungusic; and “coaffix insertion” into the local cases, which has been long known to have proceeded differently in Nganasan than in the rest of Samoyedic (and as I’ve recently learned from Valentin Gusev, Nenets and Enets have some quirks in this too in the possessed paradigm).

I will readily admit that none of the above discussion takes any direct archeological evidence into consideration. Again (cf. footnote 2), this is intentional. Archeology cannot date languages, not even identify them: it can only create a sociohistorical backdrop that we can attempt to pin language expansions on. At a pinch, all that really happens here is that we draw one directed graph indicating known relationships of archeocultural descent and influence; another directed graph indicating known linguistic relationships; and attempt to fit the latter as a minor of the former. If culture A begets B which begets C, a priori it would not be parsimonious to assume A, B and C to have all spoken different languages entirely; but it may also prove necessary to fit other pieces of the big picture in. If the proposed language of culture B has clear contact influence from language L, we’d like to assign also L to have been spoken in a culture that was actually in contact with B. Everything else, e.g. cultural reconstructions on Proto-Samoyeds as copper traders or reindeer nomads or hunter-gatherers or what have you, comes downstream of linguistics/archeology pairings based on the “topology of chronology”.

The recent decades’ paradigm shift on the origins of Finnic and Samic is again instructive, I think. The same language expansions were varyingly pinned on multiple known material-cultural expansions, with details filled in with assumptions where necessary. What had changed was not the archeological evidence: the new picture emerged due to new linguistic evidence, with results such as the early divergence of South Estonian and Livonian, the existence of a para-Sami substrate across most of Finland and far east into Russia, and the unviability of a common Finno-Samic node (itself done in maybe primarily by loanword research showing many “Finno-Samic lexical innovations” to be loans back-and-forth, or in parallel from Indo-European). These changed the topology of the Uralic linguistic family tree enough that it could no longer be fit into the “archeological family tree” in the same location.

And for Samoyedic, we don’t have a clear enough picture of this area of the family tree yet. There’s no consensus model for the branching of Samoyedic, nor for its splitting from Uralic. Those who side with an East Uralic group will be able to find a roughly suitable archeological assignment for it; so will those who side with a Finno-Ugric group; etc.

The fact that language does not have to coincide exactly with culture also helps to create a lot of wiggle space here. For one, linguistic descent can happen also through cultural “contact”, rather than cultural “descent”; for two, linguistic splits can happen invisibly, without any corresponding cultural split (especially if we’re talking about just basic dialect diversification); for three, cultural expansions can pull along multiple linguistic lineages at the same time. The last two in particular combine to form a situation where even if we could match cultural and linguistic lineages accurately, we still cannot use splits in one to date the splits in the other. I believe this is indeed the case in Samoyedic. There is strong archeological evidence to assume that Northern Samoyedic arrived on the Arctic coast only in the ballpark of 1000 years ago; [5] but this does not allow us to conclude that the language spoken at the time was really unified Proto-NSmy. I would think that at minimum a pre-Nganasan dialect and a pre-Nenets-Enets dialect already existed separately at this time, to allow for certain cases where Nenets-Enets shares isoglosses with southern Samoyedic branches like Kamassian or Mator. Perhaps more varieties yet, existing first as clan or family dialects before ballooning into full-blown languages.

I do not believe I am ending up with a radically different approximation for the age of Samoyedic from previous researchers — e.g. Janhunen in his 1998 handbook article guesstimates that “proto-Samoyedic seems to have dissolved as recently as the last centuries BCE”, i.e. in the same millennium as my conservative assumption does (or, for what it’s worth: Blažek’s recent glottochronological calculation comes out at 250 BCE). But as comes to the deeper end, I do make one methodological basic assumption that I do not think other linguists always properly appreciate: a proto-language is by definition unitary, and it is broken up already by the first emerging dialect isogloss. Not upon the emergence of more major division lines such as daughter ethnicities (identities are malleable and can easily also re-coalesce), or “language-type” rather than “dialect-type” boundaries (whatever that may mean), or loss of mutual comprehensibility (not a binary distinction anyway). A proto-language only has its strong methodological value if it is reserved for the truly common ancestor, a stage that precedes the rise of all areal variation; otherwise we lose the ability to reconstruct innovations, and can always appeal to almost any arbitrary modern variation having “already existed in the proto-language” (so, ever since humans first invented speech?). All isoglosses have a finite age, and when we seek to date a family’s break-up, we are seeking to date the oldest isogloss observable within the family — or at least, the oldest theoretically somehow dateable isogloss. And it is these roots that I believe could run quite deep compared to the conservative approximations.

[1] Really I wonder if I should start keeping a list of publications I have been credited on. Eventually this would be pointless I’m sure, but as an early-career researcher, maybe not…
[2] Comparative syntax, especially clause-level, I must admit I know roughly jack shit about (in general, not just re: Samoyedic). This is an intentional omission of effort: my core subfield is comparative phonology, which does not have much overlap with syntax at all. At most there would be third-degree repercussions through morphology / classification / areal linguistics, hardly any more than from fields like paleography or folkloristics.
[3] Examples (far from an exhaustive list): PU *uwa ‘flow’, *kuwakka ‘long’ reflected only in Nganasan; *ekä ‘big; father’ only in Enets; *muja- ‘to smile’, *säńćä- ‘to stop’ only in Nenets; *kajə ‘hair’, *këččə ‘bitter’ only in Selkup; *porə- ‘to eat’, *suwďa ‘finger’ only in Mator. Works the other way too: I’ve a list of the most widespread Uralic vocabulary, and their average distribution across Samoyedic, when present, seems to be clearly lower than across any other branch.
[4] I may do a fuller post on this eventually, but I believe the supposed “Slavic loanwords in Proto-Finnic” like *pappi ‘priest’, *risti ‘cross’ well postdate the breakup of PF. Several other loanwords from essentially the same phase of Slavic already show dialect divisions existing: mainly via differing sound substitutions, such as *netäli ~ *nätäli ‘week’, *värttinä ~ *värttenä ~ *värttänä ‘spindle’, *šauki- ~ *šaukë- ‘pike’. A few cases like ‘priest’, ‘cross’ may appear uniform just due to their phonological simplicity, therefore making up a case of what I call “convergent parallel loans”.
[5] Dated more accurately actually, but I do not have the details on hand.

Recontextualizing Mansi

Currently I’m looking a bit into older research on Mansi. Coverage on the language has not been optimal in the past, mainly due to most of the existing field research materials being rather slow to be released. The main sources have come out with no less than a 100+ year delay: Bernát Munkácsi’s 1880s records came out in dictionary form in 1986, Artturi Kannisto’s 1900s records in 2014, and of Antal Reguly’s 1840s records I’ve not seen any decent edition at all. I think this has left etymological research in particular in a limbo. Mansi specialists with direct access to one or more of these field research corpora (e.g. Steinitz, Liimola, Kálmán, Honti, and of course Munkácsi and Kannisto themselves) have for long been able to dig out comparisons and publish their findings, but us more general Uralicists not so much.

Many of these Mansi specialists have also been working with Khanty, whose primary comparative lexical source, K. F. Karjalainen’s dialect dictionary, likewise built on 1900s field research, came out already in 1948, making the language more accessible for investigation. This has, I believe, led to a kind of an “overlooked middle sibling” status for Mansi, creating a more Khanty-colored picture of the language’s history than is warranted. Comparisons between the two languages are much more readily apparent than more distant cognates. Yet it can be also suspected that many of these are not common Ob-Ugric inheritance, but rather newer loans (Ms → Kh, Kh → Ms, or from some common third source). We also know of a cautionary example from the western end of the Uralic family: untangling Finnic loans from true cognates, with the help of more distant relatives, has been integral to working out the history of Sami. This line of work has by now revealed that just about all especial commonalities between Finnic and Samic are either archaisms, loans, or areal, and that from a proper cladistic point of view, a Finno-Samic subgroup is really no more strongly supported than some different hypotheses such as Finno-Mordvinic would be.

For Mansi and Khanty, this work has so far not been done … but I strongly suspect the results would have a similar lean. Extensive areal sharing of some secondary isoglosses is already well-documented along the Mansi–Khanty contact zone. There are also a number of known Mansi–Hungarian and even Khanty–Hungarian isoglosses, as well as several “Proto-Ob-Ugric innovations” that appear essentially out of the blue.

These considerations suggest some steps for going forward. One that could be done without too much trouble with just the existing materials would be to “re-root” the historical phonology of Mansi in Proto-Uralic. E.g. as has been established at least since Sammallahti (1988) (more debatably already since Steinitz 1944), the regular reflex of Proto-Uralic *ä in Mansi is *ää — a development that surely represents simple qualitative retention, and not a detour through a Proto-Ob-Ugric *ee (as per Honti) or *eä (as per Sammallahti). Corresponding mid *ee in Khanty is most likely an independent innovation (likely even post-Proto-Khanty, as per the reanalysis due to Tálos of Surgut Khanty /ä̆/ as more original than other varieties’ /e/).

But etymology will require work too. A Mansi analogue of Steinitz’ comparative-etymological dictionary of Khanty would be quite desirable, now that the main sources are finally out and available for easy consultation. This would doubtless take an additional long while to assemble though. Also, from the comparative Uralicist’s view, this would involve a lot of work being spent on clearly secondary material: compounds, derivatives, relatively recent Russian and Tatar loans, etc.

I have at this point a shortcut of sorts in mind. The Munkácsi and Kannisto materials have been the main sources for comparative research on Mansi for the last 140 years, and we might assume they have been already reasonably mined through for comparative purposes. They’re far from the only materials on Mansi though. Older collections could be still expected to maybe have some archaisms in them that have been lost in later times. We again know from precedent that this line of research is likely to bear some fruit. On historical phonology, the 1970s-80s “Hungarian school” (L. Honti, K. Rédei, E. Sal) revamp of Proto-Mansi reconstruction has been based on 18th-century records that show some retained word-final vowels, pointing to stem-type contrasts CVCə | CVC and CVCCə | CVCəC (from the 19th century on, collapsed to just CVC and CVCəC). This then can be leveraged for some reanalysis. — On etymology, there is so far at least a small 1991 article by Katz: “Altsüdwogulisches” (FUF 50), [1] which identifies from 18th-century records previously unknown Mansi reflexes for PU *kota ‘hut, house’ and Indo-Iranian → Ugric *täjɜ ‘milk’.

The 18th century materials are, alas, still not well-documented in print. The Hungarians mainly refer to a manuscript Altwogulische Dialekte by J. Gulya, which I believe ended up never being published (though some of the data is briefly covered in his articles in NyK 60 and 62). So I’m casting my hopes into the 19th century instead. There is also at least one smaller primary source that was released in a relatively timely fashion: A. Ahlqvist’s materials, collected from the late 1850s on, a wordlist of which was released in 1891 as the second SUST volume, Wogulisches Wörterverzeichnis (and by now available digitally; also on, IMO in better scan quality than the National Library of Finland version). The usability of this data is limited somewhat by various dialect forms being given without specifics (perhaps Ahlqvist’s original records would have this info?), but with modern Mansi dialectology in hand, the big picture is clear enough. I am not aware of any later reappraisal of this material, and it seems likely that a close look could turn up some new etymological insights.

As a promising initial result, from the A section I have already run into an entry aidentantqtam ‘to vomit’. As Ahlqvist seems to render unstressed schwas varyingly as a, e, i, , [2] as well as coda /ɣ/ often as a vowel i or , we can thus see this as a reflex of PU *oksənta- ‘to vomit’ > PMs *aaɣtəntə- (showing several regular developments: *o-ə > *aa, *s > *t, *kC > *ɣC).
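The three developments named here can be run as an ordered rule list. The following is only a minimal Python sketch: the plain-string transcription, the vowel inventory `V`, and the names `RULES` and `apply_rules` are assumptions made for illustration, and only the three changes listed above are encoded.

```python
import re

# Vowel symbols of this toy transcription (an assumption for the sketch).
V = "aeiouäə"

# Ordered (pattern, replacement) pairs for the three changes named in the text.
RULES = [
    # *o-ə > *aa: first-syllable *o when the second syllable has *ə
    (rf"^([^{V}]*)o(?=[^{V}]+ə)", r"\1aa"),
    # *s > *t
    (r"s", "t"),
    # *kC > *ɣC: *k directly before another consonant
    (rf"k(?=[^{V}])", "ɣ"),
]

def apply_rules(form: str) -> str:
    """Apply each rule in order to a stem written without the final hyphen."""
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form

print(apply_rules("oksənta"))
```

Since the reduction of the stem-final vowel is not among the three listed changes, the sketch returns aaɣtənta, one vowel short of the Proto-Mansi form *aaɣtəntə- cited above.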

In overall phonology it is also interesting to note how, while most of Ahlqvist’s data seems to be Western Mansi, he has also numerous forms showing the Northern Mansi development *ä > /a/ (e.g. mań ~ mäń ‘daughter-in-law’, ńäl ~ ńal ‘handle’; notice also the inconsistent lemmatization), sometimes quite tellingly further combined with also typically Northern *š > /s/ (sam ~ šäm ~ šem ‘eye’). Yet, his examples of the combination *kʷä- show uniformly only küä-. In newer Northern Mansi this has undergone a shift to /o/, starting from Munkácsi’s materials, but no sign of this appears in Ahlqvist’s materials. Perhaps this is then indeed independent from the usual NMs shifts *ä > /a/ and *a > /o/ (it could be otherwise routed through either), and has instead proceeded as something like /kʷä/ > *[kʷɞ] > /kʷo/ > /ko/?
Edit 2019-01-11: nope: one doublet jelpi̮l-küäl ~ jalpi̮l-kol ‘church’ (lit. ‘holy house’), already seems to show the native NMs reflex. There is also plain kol, though given separately, not coordinated into the same entry with the WMs form küäl.

[1] Why specifically “süd” is unclear to me, given that some of his forms are clearly Northern Mansi.
[2] Theoretically some of this variation could represent real vowel contrasts, neutralized in later times, but that will require a more systematic look at the data, maybe with dialect division included.

CIFU 13 announced

The 13th International Congress for Finno-Ugric Studies, to take place in Vienna in August 2020, is now fully announced: symposia have been settled and paper submission is open. Most people who would be interested in participating likely have gotten also the usual email circulars, but perhaps some readers will be reminded by this post; maybe even to just pop by for a visit to listen to some presentations.

I will be participating too of course. Perhaps with more than one presentation this time, even (but no promises just yet).

The treatment of /f/ in Finnic

Loanwords from Germanic and, more recently, Russian have been feeding *f into Finnic for a good while. Today /f/ has been established as a loanword phoneme in most Finnic varieties (including, I think, all of the literary standards), but for most of the last 2000 years, the consonant has been adapted into native Finnic phonology in various shapes.

Five substitutions are usually recognized:

  1. *f → /p/
    Mostly in oldest loans from Proto-Germanic or Proto-Scandinavian. The oldest examples could feasibly even precede Grimm’s Law, and therefore actually involve *pʰ → *p (the likes of *pëlto ‘field’). Others can be dated as slightly later, e.g. *pasto ~ *paasto ‘fast’ ← Gmc. *fastōn-, showing the probably relatively late *ā > *ō. A few examples are found even in much more recent loanwords such as Fi. porstua ‘porch’ ← Sw. förstuga or förstuva; Fi. upseeri ‘officer’ (perhaps since expected **vo and **hs are or were not phonotactically possible).
  2. /f/ → ∅
    In initial consonant clusters, e.g. Fi. läski ‘(pork) fat’, riski ‘strong’ ← Sw. fläsk, frisk.
  3. /f/ → /v/
    This is found initially (Fi. vaari ‘grandfather, old man’ ← Sw. far ‘father’) and after a consonant (Fi. konvehti ‘confectionary’). I suspect the switch from the first substitution pattern to this marks the onset of *w > [ʋ]. This may have been completed only after Proto-Finnic, since several dialects of Finnish have been recorded even in the 20th century with [w] adjacent to rounded vowels: kuva [kuwa] ‘picture’, vuosi [wuosi] ‘year’, vyö [wyø] ‘belt’ etc. Dialectal variants such as kasva- ~ kasua- ‘to grow’, kivi ~ [kiw] ~ [kiu] ‘stone’ could also speak in favor of *kaswa-, *kiwi and not **kasva-, **kivi as the starting points. Likewise the Estonian metathesis *Vuh > /Vhv/, more easily rewritten as *wh > *hw.
  4. /f/ → /h/
    Found initially preceding a labial vowel (Fi. huotra < *hootra ‘scabbard’ ← Gmc *fōdra-) and word-internally preceding a consonant (Fi. luhti ‘loft’, sahrami ‘saffron’, uhri ‘sacrifice, offer’).
  5. /f/ → /hv/
    Found between vowels, e.g. Es. Fi. sohva ‘sofa’, Es. kohv ~ Fi. kahvi ‘coffee’ (contrast though Livonian and dialectal Fi. kaffe, Karelian koffi ~ koofi ~ koufi etc.)

Altogether we have, in the newer layers, /h/-substitutions for preserving voicelessness, /v/-substitutions for preserving labiality and continuancy, and /hv/ for covering both.
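The conditioning of the newer patterns 2–5 can be summed up as a small decision procedure. Below is a rough Python sketch under loud assumptions: the vowel sets, the function name `substitute_f` and the orthographic input are all hypothetical, the oldest /p/ layer is ignored, and contexts the list does not cover (geminate ff, word-final f after a vowel) are not handled properly. The outputs are source forms with only the *f* replaced, not finished Finnish words.

```python
# Vowel letters for the toy input forms (an assumption for this sketch).
VOWELS = set("aeiouyäöåāēīōū")
LABIAL_VOWELS = set("ouyöåōū")

def substitute_f(word: str) -> str:
    """Replace each 'f' following substitution patterns 2-5 in the list above."""
    out = []
    for i, ch in enumerate(word):
        if ch != "f":
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if i == 0:
            if nxt and nxt not in VOWELS:
                out.append("")        # pattern 2: lost in an initial cluster
            elif nxt in LABIAL_VOWELS:
                out.append("h")       # pattern 4: initial, before a labial vowel
            else:
                out.append("v")       # pattern 3: initial, elsewhere
        elif nxt and nxt not in VOWELS:
            out.append("h")           # pattern 4: word-internal, before a consonant
        elif prev in VOWELS and nxt in VOWELS:
            out.append("hv")          # pattern 5: between vowels
        elif prev not in VOWELS:
            out.append("v")           # pattern 3: after a consonant
        else:
            out.append("f")           # word-final after a vowel: not covered by the list
    return "".join(out)

for source in ["fläsk", "far", "fōdra", "luft", "sofa", "konfekt"]:
    print(source, "->", substitute_f(source))
```

The branch order matters only word-internally: the pre-consonantal /h/ case is tested before the post-consonantal /v/ case, which is a choice of this sketch rather than something the examples above decide.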

There’s however also a sixth that I have usually not seen mentioned: substitution as /uh/ (~ /yh/), unpacking the consonant in the opposite order from the kahvi type. At least two examples appear to be known. One is the Russian loan ‘kaftan’: Fi. Krl. Izh. Vot. kauhtana, Ludian–Veps kauhtan. (Estonian has rather kahvtan.) The other is Fi. Krl. Izh. tiuhta ‘reed; awn’ ← Gmc. *stifta- (> Sw. stift), with the sound development remarked on in LÄGLOS in the word’s entry (in the 3rd volume), but not in the foreword overview of sound substitutions (in the 1st volume). I think a few additional examples could be adducible too:

  • Fi. Izh. vyyhti ‘weft’ (← Gmc. *wifti-), whose vowel length is usually attributed to sporadic lengthening before coda /h/ and labialization to irregular influence from /v/. But Karelian shows viyhti; I think this is likely to be more original. In Finnish and Ingrian, evidently *iü > yy. Finnish and Karelian dialects plus Ludian show also viihti ~ viihť, which could be instead the real example of secondary lengthening (but also maybe a parallel development of *iü).
  • *riuhto-, *riuhtat- ‘to rip, tug’ (Fi Krl Izh Lu), with a variant reuhto- in Finnish. Maybe a derivative from Germanic *rīfan- (> Sw. riva) or *reufan (> Eng. reave) ‘to tear’? For *t-suffixed forms, I only know of the noun rift though.
  • Fi. töyhtö ‘tuft, crest’, Krl. töyhäkkä ‘fluffy’, [1] töyhistyö ‘to puff, bristle up’ (probably ← Fi, per öy). Has immediate resemblance with the English, though Scandinavian only seems to have an s-affixed variant tofs (→ Fi. tupsu ‘tuft, tassel’). Looking at Low German could maybe turn up a suitable loan original?

It can be noted that all of these examples occur in the context *-Vft-. This is probably not an accident: **-fk- does not occur in Germanic (possible enough in Russian though, and giving e.g. colloq. Fi. lafka ‘store’ ← Ru. лaвка), while **-VUhR- does not seem to occur in Finnic. A few rare examples of -VihR- can be found, but usually with simplified variants alongside: in Finnish e.g. (standard form first) kaisla ~ kaihla ~ kahila ‘reed’, laina ~ laihna ‘loan’, raihnas ~ raina ‘decrepit, geriatric’, saiho ~ saihvo ‘corral, pen’.
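As a toy illustration of the proposed sixth pattern, the rewrite *-Vft- → -Vuht- fits in a single regular expression. This is a sketch only: the function name `unpack_ft` and the vowel class are assumptions, the front-harmonic /yh/ variant is not modeled, and nothing else in the word (such as an initial cluster) is adapted.

```python
import re

def unpack_ft(word: str) -> str:
    """Rewrite a vowel + 'ft' sequence as vowel + 'uht' (back-harmonic variant only)."""
    return re.sub(r"([aeiouyäö])ft", r"\1uht", word)

print(unpack_ft("kaftan"))  # compare Ludian–Veps kauhtan
print(unpack_ft("stifta"))  # compare tiuhta; the initial st- is left untouched here
```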

[1] Krl. töyhäkkä has also a 2nd sense ‘haughty’, which together with töyhteä ‘to fuss about’ are probably better compared with Fi. touhuta ‘id.’, touhu ‘fuss, bustle’ (with typical affective/diminutive fronting).

Dravidian etymostatistics: a rough look

Burrow & Emeneau’s classic Dravidian Etymological Dictionary (DED) has been conveniently available online for a while.

I find the online version a bit too spartan though, at least for browsing purposes: when a dictionary has 500+ pages and 5500+ etyma, one would want to be able to find things a bit more effectively than just leafing through at random. The preface has page numbers for the sections by initial letter, but these are, unfortunately, unlinked. The print version has by-language indices, but they have been forgone in favor of a search function in the online version. A search function however only works for finding things one already knows of. Also, the “real” lemma of the entries, according to which they are alphabetized, is actually not even printed anywhere! This is a virtual Proto-Dravidian form (I suspect not necessarily valid for the relatively poorly known Northern and Central Dravidian, so maybe more like Proto-Southern Dravidian). If there is a Tamil descendant, it is usually reasonably close to the virtual PD forms used, but often enough there isn’t.

For some added convenience, I’ve thus put together a page-by-page index of my own, recording the first lemma form occurring on each page, and the first few phonemes of what the underlying reconstruction appears to be (though some of these might well be incorrect).

These 503 page-leading entries (I’m ignoring the Appendix in the analysis below) also work as a random sample of sorts of Dravidian reconstructions, and they allow a rough look at the statistical properties of the data.

For strong results on Proto-Dravidian, full by-language stats on the reflexes would be needed, e.g. to filter out data restricted to particular subgroups. This would be quite a bit of work however. But I have recorded the “lemma language” — the language from which the first reflex is given. DED uses a stable order of languages, and the lemma forms run through this list in preferential order: Tamil if available, if not then Kolami, if not that either then Malayalam, etc. This means it’s possible to get accurate reflexation rates for Tamil from just this single sample.
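Why the lemma language gives an exact Tamil rate can be shown with a short Python sketch. The names `PRIORITY` and `lemma_language` are hypothetical, and the priority list is cut off after the three ranks named above; the rest of DED’s ordering is not reproduced here.

```python
from typing import Iterable, Optional, Sequence

# Only the first three ranks, as named in the text; the full order is left out.
PRIORITY = ["Tamil", "Kolami", "Malayalam"]

def lemma_language(reflex_languages: Iterable[str],
                   priority: Sequence[str] = PRIORITY) -> Optional[str]:
    """Return the highest-priority language that has a reflex in the entry."""
    present = set(reflex_languages)
    for lang in priority:
        if lang in present:
            return lang
    return None  # none of the listed languages has a reflex

# Made-up entries: Tamil heads the lemma exactly when Tamil has a reflex at all.
entries = [{"Tamil", "Telugu"}, {"Kolami", "Malayalam"}, {"Malayalam"}]
print([lemma_language(e) for e in entries])
```

Because Tamil is ranked first, the lemma language is Tamil if and only if the entry has a Tamil reflex, so counting Tamil lemmas is the same as counting Tamil reflexes. For any lower-ranked language the lemma count only gives a lower bound.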

We can also take a look at the distribution of the lemma languages:

  • Tamil: 353 (≈ 70.2%)
  • Kannada: 49
  • Malayalam, Telugu: both 13
  • Kota: 11
  • Kolami, Kuṛux: both 10
  • Kui: 8
  • Koṇḍa, Parji: both 7
  • Gondi: 6
  • Pengo, Tulu: both 4
  • Toda: 2
  • Ālu Kuṟumba, Iṛula, Koḍagu, Kuwi, Maṇḍa, Naiki: 1 each

The three other big literary languages unsurprisingly come out on top next to Tamil. Otherwise the order is probably due to factors other than the size and degree of documentation, though. Kolami, as mentioned, is the #2 go-to variety after Tamil, and indeed scores 10 lemma appearances, not far from the larger Malayalam. However Kuṛux, quite far down the priority list, also reaches the same! This is probably because Kuṛux is one half of the distinctive Northeastern Dravidian group (together with Malto, which does not appear here), which seems to have a decent amount of unique vocabulary, without parallels elsewhere in Dravidian. Similar cases are Kolami, Kui and Koṇḍa; the first as the largest Central Dravidian language, the latter two in their own distinctive sub-branches of South-Central (or “South II”) Dravidian.
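The shares can be recomputed from the counts in the list above. A minimal Python sketch (the variable names are just illustrative):

```python
# Lemma-language counts as given in the list above.
LEMMA_COUNTS = {
    "Tamil": 353, "Kannada": 49, "Malayalam": 13, "Telugu": 13, "Kota": 11,
    "Kolami": 10, "Kuṛux": 10, "Kui": 8, "Koṇḍa": 7, "Parji": 7, "Gondi": 6,
    "Pengo": 4, "Tulu": 4, "Toda": 2, "Ālu Kuṟumba": 1, "Iṛula": 1,
    "Koḍagu": 1, "Kuwi": 1, "Maṇḍa": 1, "Naiki": 1,
}

total = sum(LEMMA_COUNTS.values())  # should equal the number of page-leading entries
shares = {lang: 100 * n / total for lang, n in LEMMA_COUNTS.items()}

for lang, share in sorted(shares.items(), key=lambda item: -item[1])[:4]:
    print(f"{lang}: {share:.1f}%")
```

The counts add up to the 503 page-leading entries, which is a quick sanity check on the tally.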

Initial consonants number as follows:

  • k: 100 (Tamil: 72 = 72%)
  • ∅: 99 (Tamil: 73 ≈ 74%)
  • p: 67 (Tamil: 45 ≈ 67%)
  • m: 56 (Tamil: 40 ≈ 71%)
  • t: 54 (Tamil: 35 ≈ 65%)
  • c: 50 (Tamil: 33 = 66%)
  • v: 37 (Tamil: 26 ≈ 70%)
  • n: 27 (Tamil: 20 ≈ 74%)
  • ñ: 5 (Tamil: 5)
  • y: 3 (Tamil: 3)
  • : 3 (Tamil: 0)
  • l: 1 (Tamil: 1)
  • r: 1 (Tamil: 0)

We see here fairly even representation in Tamil, hovering around 70% as could be expected, as well as the phenomenon where only a rather limited selection of consonants have been originally possible word-initially in Dravidian.

For vowels I have counted not just word-initial cases, but rather all first-syllable vowels (so a includes a-, ka-, ca- etc.):

  • a: 152 (Tamil: 107 ≈ 70%)
  • u: 83 (Tamil: 64 ≈ 77%)
  • i: 61 (Tamil: 47 ≈ 77%)
  • e: 46 (Tamil: 24 ≈ 52%)
  • o: 43 (Tamil: 28 ≈ 65%)
  • ā: 44 (Tamil: 33 = 75%)
  • ō: 20 (Tamil: 11 = 55%)
  • ū: 20 (Tamil: 14 = 70%)
  • ē: 19 (Tamil: 13 ≈ 68%)
  • ī: 15 (Tamil: 8 ≈ 53%)
  • (ai: 3, au: 0 — from Tamil only, included here in the a counts)

These now show a fairly different distribution. The cardinal vowels a ā i u ū are represented at about 74% altogether, slightly above the counts from the previous section. The mid vowels as well as ī are by contrast left at only 59% altogether. This difference probably indicates some real historical development. Several possibilities come to mind:

  • maybe Tamil is more archaic, and in the other Dravidian languages, several instances of mid vowels are secondary;
  • maybe the other languages are more archaic, and in (some subgroup including) Tamil, there has been a partial shift from mid to non-mid vowels;
  • maybe the disparity results from differences in post-PD vocabulary that has spread with contacts;
  • maybe the disparity results from the Tamil lexicon being more thoroughly documented, so that e.g. Indo-Aryan “technical” loanwords (likely to have a higher percentage of the cardinal vowels) are better represented.

More detailed comparison would however be required to figure out which of these, if any, is the case.
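The two pooled figures (about 74% and 59%) follow from the per-vowel counts in the list above. A small Python sketch of the arithmetic; the names `VOWEL_COUNTS` and `pooled_rate` are illustrative only:

```python
# vowel: (all page-leading entries, entries with a Tamil lemma), from the list above
VOWEL_COUNTS = {
    "a": (152, 107), "u": (83, 64), "i": (61, 47), "e": (46, 24), "o": (43, 28),
    "ā": (44, 33), "ō": (20, 11), "ū": (20, 14), "ē": (19, 13), "ī": (15, 8),
}

def pooled_rate(vowels):
    """Tamil share over a group of vowels, pooling counts rather than averaging percentages."""
    total = sum(VOWEL_COUNTS[v][0] for v in vowels)
    tamil = sum(VOWEL_COUNTS[v][1] for v in vowels)
    return tamil / total

cardinal = pooled_rate(["a", "ā", "i", "u", "ū"])
mid_and_ii = pooled_rate(["e", "ē", "o", "ō", "ī"])
print(f"cardinal: {cardinal:.1%}, mid + ī: {mid_and_ii:.1%}")
```

Pooling the raw counts weights each vowel by its frequency, which is why a dominates the first group.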

Medial consonants are more varied still. I’ve included sub-counts for the nasal+stop clusters and geminates (no other clusters seem to have occurred in Proto-Dravidian; “extended” nasal+geminate cluster series are proposed for PP ~ NP correspondences between languages, but these would take more than just casual eyeballing of lemma forms to identify):

  • ṭ: 71 (Tamil: 53 ≈ 75%) (ṭṭ: 27)
  • r: 71 (Tamil: 48 ≈ 68%)
  • k: 47 (Tamil: 32 ≈ 68%) (kk: 18)
  • l: 38 (Tamil: 24 ≈ 63%) (ll: 6)
  • ṯ: 35 (Tamil: 24 ≈ 69%) (ṯṯ: 2)
  • ḷ: 31 (Tamil: 26 ≈ 84%) (ḷḷ: 7)
  • r̤: 31 (Tamil: 25 ≈ 81%)
  • t: 27 (Tamil: 19 ≈ 70%) (tt: 6)
  • c: 22 (Tamil: 12 ≈ 55%) (cc: 5)
  • ṇ: 23 (Tamil: 16 ≈ 70%) (ṇṭ: 13)
  • m: 20 (Tamil: 13 = 65%) (mp: 7)
  • ṉ: 19 (Tamil: 18 ≈ 95%) (ṉṯ: 0)
  • y: 17 (Tamil: 13 ≈ 76%)
  • p: 12 (Tamil: 7 ≈ 58%) (pp: 9)
  • v: 11 (Tamil: 8 ≈ 73%)
  • n: 7 (Tamil: 4 ≈ 57%) (nt: 5, including all of the Tamil cases)
  • ñ: 8 (Tamil: 4 = 50%) (only in ñc)
  • ∅: 7 (Tamil: 6 ≈ 86%)
  • ṅ: 6 (Tamil: 3 = 50%) (only in ṅk)

There are now also some new top scores in representation. The 6/7 count for zero “medials” (in fact mostly monosyllabic roots) is probably just due to pronoun roots & similar grammatical elements often being shorter than proper lexical roots, and being likely to survive more widely.

The coronal nasals seem to indicate a real sound change. Alveolar ṉ is highly common by itself, while dental n is absent entirely, aside from the cluster nt (in this sample at least). Cf. that word-initially however only n occurs, not ṉ. But it seems likely already from this data that these were allophones of each other at some point. And, quite obviously, no palatal **ñ or velar **ṅ should be recognized as distinct either.

The retroflexes ṭ ṇ ḷ r̤ [ɻ] are fairly strongly represented, while palatals c ñ fairly weakly, but again this could have many explanations.

I’ve taken a brief look at the co-occurrence of these three root phoneme positions too. Nothing really extraordinary turns up, though. *v- plus labial vowel is disallowed, a few phonetically awkward or unstable combinations like *ki- *ke- *-iṭ- *-eṭ- are somewhat rare, Similar Place Avoidance turns up among the C…C combinations. The weirdest-looking total gap is **c…r̤; and on a closer look even this is accidental (the full DED does include a few cases, none just happen to be the first on their page).

