Etymology squib: *puj- ‘back end, point’

In the UEW we find a rough Proto-Ugric reconstruction *pukkɜ ‘blunt end of a tool’, with divergent later semantic development: ‘eye of needle’ in Ob-Ugric, ‘back of hammer/ax/knife/…’ in Hungarian fok. There is reason to suspect, though, that if related, these words do not go back to a simple bisyllabic root. Mansi *pup could perhaps in principle be derived from *puK-p. However, *ɣ in Khanty *poɣ does not correspond to Hungarian k! The normal Khanty reflex of *kk is unlenited *k. [1] This discrepancy clearly shows UEW’s reconstruction to be overly impressionistic. Still, the comparison as such does not have to be abandoned: it can be instead approached as a family of three different derivatives, *puCV-ka, *puCV-pV, *puCV-kka with some lost weak medial consonant.

The identity of this lost consonant has been discovered by now, too. While rooting around for references on Samoyedic etymology, I have found that Helimski in an apparently little-known 2001 paper, “PU *i̮ś- ‘to cause to be, to be’ and some other core vocabulary items in Proto-Uralic”, [2] in passing connects the Ugric words with a newly set-up Samoyedic *puj ‘blunt end of a tool (eye of needle, back end of sled runner)’. Loss of *-jə- in derivatives is regular in Ob-Ugric; in Hungarian the conditioning might be rather *uj > *u or *jkk > *kk > *k. The meaning ‘eye of needle’ as a Siberian Uralic semantic isogloss is interesting; maybe it is not a common innovation, but rather an archaism that has not survived in Hungarian.

Helimski however does not seem to have noticed that this new reconstruction allows a few further etymological connections too. At least Mansi *puj, Khanty *puuj ‘back part’ is obviously connectable as continuing the underived basic root of all the ‘blunt end’ reflexes. (The Khanty vocalism is, once again, difficult to explain though; maybe it’s a loan from Mansi.) UEW (s.v. *pujɜ) also gives several other generic spatial reflexes from Samoyedic that go back to *puə. Northern Finnic *poo ‘butt’ is often also considered a reflex, but this runs into phonological problems. My expectation would be for *pujə to yield instead *pui or at most *puu.

Given the new evidence of reflexes referring to tools, I would suggest that better Finnic cognates can be found, showing phonological development as expected. For one, there’s the word family including Finnish puikko ‘(narrow) stick, rod’, puikkari ‘net needle’, puikkaa- ‘to stick (in)’, earlier probably something like *’to poke with a blunt tool’. These can be taken as derivatives from a (pre-)PF stem *puikka. It would seem to be an exact equivalent of Hu. fok < *pujə-kka, but different meanings suggest they more likely have been formed independently. A different direction of derivation appears in the adjectives puikea ~ pujea ‘oblong = having a defined end’ < *puj-(k)əta. Furthermore pujo ‘narrow; narrow object’ could belong here (probably as a late derivative within Finnish, something like ancient *pujə-w I’d expect to give **puju). — SSA mentions a different etymology for the *puikka family: derivation from puu ‘tree, wood’ (or rather, from the plural stem pui-), but the supposed semantics in this seem too vague.

Now that there is much more backing across Uralic available, even the old comparison by Setälä of pujo with Samoyedic *pujå ‘nose’ could be rehabilitated, now on the root level: the latter seems to be analyzable as continuing something like *pujə-ja or *pujə-la, roughly ‘pointed thing’ (a relatively typical origin for terms for ‘nose’). The exact details of morphology will have to remain a bit up in the air for now though… normally *-ja derives agent nouns, *-la local nouns.

Some phonological considerations on the development of *-Vjə rimes in Samoyedic will be also required, since the divergence between *puj ‘eye of needle, etc.’ and *puə ‘back part’ should be explained somehow. For now I will leave this to a few notes. On one hand, actually many reflexes such as Tundra Nenets /pū/, Mator hu-na- could still derive from either proto-form. It’s maybe conceivable that the two PSmy reconstructions could be just positional variants of each other; e.g. *puj as a self-standing noun vs. *puə- before further suffixes? But on the other hand, at least Nganasan /hüj/ ‘eye of needle, back end’ vs. /huə/ ‘back part’ are difficult to treat in this way. For now it seems more feasible to suggest that, despite looking like an underived basic root noun, the semantically derived *puj actually also goes back to some kind of a derived pre-form; options that would work without too many new assumptions could include e.g. *pujə-ka (akin to Khanty), *pujə-k, *pujə-j.

[1] The longest-known example is probably Kh. *ɭökəmə- ‘to push’ ~ Hu. lök-, Fi. lykkää-.
[2] Published in the workshop proceedings collection Budapesti Uráli Műhely II. This is based on (and covers some, though not all, of the same ground as) the unpublished presentation “Basic Vocabulary in PU and PFU: Remarks to Etymology and Reconstruction” that I’ve seen cited in a few places.

Two steps towards re-rooting Ludian phonology

Historical/comparative phonology of the Finnic languages has reached remarkably thorough coverage already in the mid-20th century. Nearly all major varieties and numerous smaller dialect groups (particularly but not only of Finnish) have had their specific history covered by at least a large article-sized special study, often indeed a monograph. Where there remains more to do, the issue is mostly of patches such as working out relative chronologies, pathways and areal patterns of change, or proto-forms of specific items.

There is however one case where a full rewrite would be warranted: Ludian, treated for ages as a “mixed Karelian–Veps variety”, but recently finally argued in detail by Miikul Pahomov (2017): Lyydiläiskysymys to be in essence instead a more conservative sibling of Veps. Or more strictly speaking, a cluster of such dialects: there seem to be no exclusively Ludian innovations that could be used to define this as a single language to the exclusion of Veps! The definition of “Ludian” has always been by a specific combination of retentions. E.g. going by the most immediately obvious phonological traits, Ludian retains Proto-Finnic *b *d *g as such (per older views: fortites *β *ð *ɣ to stops) (shared with Veps), but also retains long close vowels (shared with Northern Veps and all the rest of Finnic) and diphthongizes rather than shortens long non-close vowels (ie üö uo < *ee *öö *oo shared widely, ua < *ää *aa shared with Karelian and Eastern Finnish). Diphthongization per se remains an innovation here, but it’s too trivially areal to be worth anything for subgrouping. In fact a few sub-dialects even retain ää aa (so do again a few dialects of Karelian and Eastern Finnish), and one of the more poorly documented varieties appears to shorten them, as Veps does. Worth mentioning is also that even the speakers of what is usually called “Northern Veps”, and some of central Veps, in fact call themselves Ludians rather than Vepsians.

So the old two-part monograph on the historical phonology of Ludian by Aimo Turunen — Lyydiläismurteiden äännehistoria I (1946) on consonantism, II (1950) on vocalism — seems to now need an almost complete recontextualization. Perhaps E. Tunkelo’s Vepsän kielen äännehistoria (1946) could use some related updates too. What Pahomov’s work shows is that the Ludian and Veps varieties should be analyzed on one hand together, not separately; and that we should probably attempt a reconstruction of their last common ancestor as well. I will follow Pahomov in using the name “Old Ludian” for this.

A reconstruction of Old Ludian would probably be particularly interesting from a lexical point of view: e.g. how many Germanic loanwords have definitely made it this far by direct inheritance and cannot be treated as mediated by Karelian later on? How many exclusive shared Slavic loans are found? What unique derivatives or semantic shifts are there around? Questions of this sort will be somewhat hard to answer in detail as long as there is no dialect dictionary of Veps though. For Ludian there exists a sizable dialect dictionary Lyydiläismurteiden sanakirja (Juho Kujola, 1944), but on a closer look it is actually heavy only on the northern and central varieties that earlier research calls “Ludian proper” (varsinaislyydi), versus fairly light in the coverage of the southern, more transitional-to-Veps varieties that feature strongly in Pahomov’s argumentation. (He does list several example lexical isoglosses around pp. 163–166, but without clearly distinguishing innovations from retentions.) We can hope there to eventually be a dictionary of at least Kuujärv Ludian, the southernmost and today the most viable variety. It is still at mere triple digits of speakers, but these luckily include well-educated language activists like Pahomov.

But I think the new perspective on Ludian would likely force a few phonological reanalyses as well, especially if also keeping an eye back all the way to Proto-Finnic. I cover in the rest of this post two candidates.

1. Final vowels

Apocope is one basic feature that demonstrates well the heterogeneity of Ludian. Generally, all original word-final vowels are lost in northern Ludian, as also in all of Veps; in central and southern Ludian, non-open vowels are partly preserved, while final *a *ä are reduced and surface either as /ə/ (the probable intermediate stage before loss) or as /u/ ~ /ü/ (as in Olonetsian; this has been explained as a fortition from *ə due to the influence of an Old Karelian superstrate).

Original preconsonantal vowels are however uniformly preserved, including vowels followed by Proto-Finnic final *k. Yet, *-k has itself been lost everywhere in eastern Finnic. The fact that a contrast regardless remains in northern Ludian would at first look seem to demand reconstructing preserved *-k for Old Ludian. A few near-minimal pairs from the Sununsuu dialect:

  • *-ak > -a: PF *polkëdak > poɫgeda ‘to tread’
  • *-a > ∅: PF *valkëda > vaɫged ‘light, lit’
  • *-äk > -ä: PF *imedäk > imedä ‘to suck’
  • *-ä > ∅: PF *pimedä > pimed ‘dark’
  • *-Ek > -e: PF *lähtek > lähte ‘spring’
  • *-i > ∅: PF *tähti > ťiähť ‘star’

But this is not the only option, and does not actually seem like the most parsimonious approach.

I would suggest that rather than forms like *polkedak, *lähtek, a good starting point would be a contrast between lax and tense vowels: *polkedà, *lähtè versus *valgedă, *täähtĭ. This finds some degree of evidence already from the central and southern varieties of Ludian, where some records do show reduced final vowels in bisyllabic stems. E.g. Kortaš akkᴇ̆ ‘married woman’, Nuomoiľ & Teru Priäžä ehtᴇ ‘evening’, Viidan & Kuujärv buťkɪ ‘umbellifer plant’, with devoiced ɪ (= IPA [e̥ i̥]), and explicitly marked as short in the first case; but e.g. Kš TP lähte, V Kj lähtə with no devoicing.
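The proposed two-stage path can be put as a minimal Python sketch. It assumes, as a simplification, that the only relevant changes are (1) loss of final *-k leaving a tense vowel, and (2) apocope of lax final vowels; the function names are hypothetical, and no other sound changes (voicing, diphthongization, palatalization) are modelled, so only some of the Sununsuu forms above come out in their attested shape.

```python
import unicodedata

# Vowel letters used in the Proto-Finnic forms cited above (precomposed/NFC).
VOWELS = set("aeiouäöüë")

def old_ludian_stage(pf_form):
    """Stage 1 (hypothetical): drop final *-k and label the final vowel.

    Returns (form, tense): tense is True for a vowel that stood before *-k,
    False for an original word-final vowel, None for consonant-final words.
    """
    form = unicodedata.normalize("NFC", pf_form).lstrip("*")
    if len(form) >= 2 and form[-1] == "k" and form[-2] in VOWELS:
        return form[:-1], True
    if form and form[-1] in VOWELS:
        return form, False
    return form, None

def northern_ludian(pf_form):
    """Stage 2 (hypothetical): apocope removes lax final vowels only."""
    form, tense = old_ludian_stage(pf_form)
    if tense is False:
        return form[:-1]
    return form

if __name__ == "__main__":
    for pf in ("*imedäk", "*pimedä", "*lähtek"):
        print(pf, ">", northern_ludian(pf))
```

The point of the intermediate stage is that the contrast survives the loss of *-k without *-k itself having to be reconstructed for Old Ludian: by the time apocope applies, it only needs to see vowel quality.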

An IMO still more convincing parallel is provided by Natalia Kuznetsova’s recent research on Ingrian, most prominently in “Evolution of the Non-Initial Vocalic Length Contrast across the Finnic Varieties of Ingria and Adjacent Areas” (2016, Linguistica Uralica 52(1)), where she demonstrates that it is exactly through this path that some Ingrian dialects end up with the apocope of some but not all final vowels, and where reduced vowels can be observed to be devoiced next to voiceless consonants.

Devoiced reduced vowels have been reported from a third Finnic area showing apocope, too: Southwestern Finnish, cf. Ojansuu (1901: 24–25, 195–197). He however suggests a different mechanism for explaining the retention of final vowels in words that had PF *-k (and also *-h) — presence of a closed syllable “in some cases”. I presume this means mainly sandhi, so e.g. *läktek_meccässä > *lähðem_meθθäsä̆ > lähre mettäs [-mm–] ‘a spring in the woods’. This seems generally possible too, and may be to some degree complementary with the reconstruction of reduced vowels, but this would require an additional general survey of sandhi effects in the involved Finnic varieties.

I would propose reduced vowels can be furthermore connected with the fact that even standard Finnish shows allophonic V2 length difference between words of the shape CVCV and CVXCV. This allophonic pattern has surely been present already in Proto-Finnic, explaining why apocope with an identical counterintuitive restriction to CVXCV wordforms arises in all of Estonian, Southwestern Finnish, Ludian–Veps (ancient in these three), at least one variety of Tver Karelian (more recently), and in some dialects of Ingrian (no earlier than 19th century). Of course some of these may be areally connected, to each other or to common contact languages, but if given a “preadaptation” within the common prosodic system inherited from Proto-Finnic, they do not all need to be.

2. Consonant-stem infinitives

Most of Ludian differs from Veps in not undergoing much syncope at all. Pahomov suggests a few exceptions however, including the infinitives of some d-stem verbs: anda- : ant(t)a ‘to give’, kanda- : kant(t)a ‘to carry’, ruada- : ruat(t)a ‘to toil’; though also andada, kandada occur in central Ludian.

This is syncope alright; but it does not seem to be a specifically Ludian phenomenon. As is known, though perhaps not widely, forms such as *antadak must have had already in Proto-Finnic syncopated byforms such as *attak. [1] This is shown e.g. by Old Finnish infinitives such as lentä- : letä ‘to fly’ < PF *lettäk < *lent-täk < *lentä-täk, lähte- : lätä ‘to leave’ < PF *lättäk < *läkt-täk, tietä- : tietä ‘to know’ < PF *tiettäk < *tietä-täk, not derivable within the specific phonological development of Finnish. In particular the simplification of *ntt to *tt (as seen in the first) is clearly no longer productive. The restriction of consonant-stem infinitives to A- and e-stem verbs but not i-, O-, U-stem verbs moreover clearly suggests that it has arisen earlier than the pan-Finnic development of unstressed *i, *o, *U from *əj, *Aw, *əw. While no **ata, **kata are attested in Finnish (only the regular vowel-stem antaa, kantaa) or anywhere else that I know of, this is likely to be simply due to these forms falling away to oblivion: after all modern Finnish today only knows one formation of this type, tuta ‘to know’, and then only as a fossilized relic in expressions, while the productive infinitive is exclusively tuntea.
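As a rough illustration of the derivations just cited, here is a minimal Python sketch. It assumes only what is stated above: stems in -tA- or -te- drop their final vowel before the infinitive ending *-tAk, and the resulting clusters *ntt, *ktt simplify to *tt. The function name and the crude vowel-harmony check are my own hypothetical simplifications, not an established formalization.

```python
import unicodedata

BACK_VOWELS = set("aou")

def consonant_stem_infinitive(stem):
    """Build a hypothetical PF consonant-stem infinitive from a verb stem.

    Returns None for stems that do not end in t + A/e, mirroring the
    restriction to A- and e-stem verbs.
    """
    stem = unicodedata.normalize("NFC", stem).lstrip("*").rstrip("-")
    if len(stem) < 3 or stem[-1] not in "aäe" or stem[-2] != "t":
        return None
    # Crude vowel harmony: any back vowel in the stem selects *-tak.
    suffix = "tak" if any(ch in BACK_VOWELS for ch in stem) else "täk"
    form = stem[:-1] + suffix          # syncope: *lentä-täk > *lent-täk
    for cluster in ("ntt", "ktt"):     # *ntt, *ktt > *tt
        form = form.replace(cluster, "tt")
    return form

if __name__ == "__main__":
    for stem in ("lentä", "läkte", "tietä", "anta", "sano"):
        print(stem, "->", consonant_stem_infinitive(stem))
```

Later changes (loss of *-k, degemination to Old Finnish letä, lätä) are deliberately left out; the sketch only covers the Proto-Finnic stage.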

Directly inherited infinitives of this type are actually widely found in Ludian. This is not a large surprise, since across the eastern Finnic area they have been reported sporadically from Olonetsian and productively from Veps already since Setälä. Besides ruat(t)a, perusing LMS turns up among bisyllabic d-stem verb roots also at least the following:

  • kieldä- : kielt(t)ä ‘to deny’ (< PF *keelt-täk)
  • kiändä- : kiät(t)ä ‘to twist, turn (tr.)’ (< PF *käättäk < *käänt-täk)
  • kuada- : kuat(t)a ‘to pour’ (< PF *kaat-tak)
  • lendä- : let(t)ä ‘to fly’ (< PF *lettäk < *lent-täk, cf. above)
  • löudä- : löut(t)ä ‘to find’ (< PF *leüt-täk)
  • (? nouda-) : nouta ‘to follow’ (< PF *nout-tak)
  • püuda- : püut(t)a ~ püudada ‘to hunt’ (< PF *püüt-täk)
  • siädä- : siät(t)ä ‘to do’ (< PF *säät-täk)
  • sorda- : sort(t)a ‘to fell’ (< PF *sort-tak)
  • souda- : sout(t)a ‘to row’ (< PF *sout-tak)
  • tiedä- : tiet(t)ä ‘to know’ (< PF *teet-täk)
  • tunde- : tut(t)a ‘to know, feel’ (< PF *tunt-tak)
  • vierdä- : viert(t)ä ‘to burn extra wood at a slash-and-burn field’ (< PF *veert-täk)
  • viändä- : viät(t)ä ‘to dance; to bend’ (< PF *väättäk < *väänt-täk)
  • uurda- : uurtta ‘to carve’ (< PF *uurt-tak)

In this light, also ant(t)a and kant(t)a are unlikely to represent irregular late syncope from andada, kandada: they should be instead considered analogical reshapings of inherited *atà, *katà, with simple reintroduction of -n- from the vowel stem. The same analogical reintroduction of -n- is found also in ‘to plow’ (kündä- : künt(t)ä ~ kündädä), ‘to push, send’ (tüöndä- : tüönt(t)ä); and without gemination, a similar analogical reintroduction of -h- is found in ‘to leave’ (lähte- : lähtä ~ lähtedä).

In tA-stem verbs though (i.e. in those original *tA-stem verbs where the preceding consonant was voiceless, preventing voicing), only vowel-stem infinitives seem to occur: (-ht-) ahtada ‘to set up’, kiehtädä ‘to bother’, puahtada ‘to roast’, puohtada ‘to clean grain’; (-st-) kastada ‘to dip’, kestädä ‘to stand’, nostada ‘to lift’, püštädä ‘to stick’; (-tt-) ďiättädä ‘to leave’, keittädä ‘to cook’, ottada ‘to take’, suattada ‘to accompany, transport’; (-ɫtt-) poɫttada ‘to burn’. [2] This probably means that the origin of OFi. lätä, Lud. lähtä is to be dated to an even earlier stratum than the Proto-Finnic reduction of *A in the context *t_t. As is quite probable: Proto-Uralic *läkt(ə)- is also e.g. the only consonant-stem verb in Mari that ends in a cluster of two voiceless consonants, and the only consonant-stem verb known in Udmurt at all.

Lastly a second regular vowel-stem infinitive group consists of iďädä ‘to germinate’, pidädä ‘to keep’, vedädä ‘to pull, draw’: reduction and loss of unstressed *-A- following a light stressed syllable is not expected/precedented in any morphological context at all.

Neither of these examples really ends up changing much about the reconstruction of Proto-Finnic itself. The first is on the phonological level indeed simply a “patch” for the development path of apocope, though it is also one piece of evidence for the reconstruction of allophonic vowel length. The second seems to provide the first attestations pointing even indirectly to the infinitive forms *attak, *kattak, though examples like letä, tuta also already allow hypothesizing such forms anyway. But the take-home seems to be that while the segmental phonology and morphology of PF are well-known by themselves, two areas suitable for further work would be prosody and morphophonology. Both of these incidentally also become much less charted territories when looking further back towards Proto-Uralic.

[1] At minimum; it is conceivable that the vowel-stem infinitives are altogether later analogies.
[2] Old Finnish too seems to show no examples of consonant stems for verbs in -htA- or -sta-. The few examples for verbs in -ttA- could be themselves analogical, sometimes even misinterpretations.

Phonology squib: raate

The standard Finnish word for the buckbean (Menyanthes trifoliata) is raate. This word often appears in overviews of Finnish historical phonology as a supposed example of irregular development of early Finnish *ð. Sure enough, dialect forms like Satakunta rarake, Tavastian ralake definitely point to *raðakeK (where *K ∈ {*k, *h}), while transitional and eastern dialects’ raate ~ roate ~ ruate would be regular from *raðateK. Same goes for Karelian roateh, which appears to identify the word-final consonant as *h. Northern Ostrobothnia also shows a “bridging” form raake, seemingly from *raðakeh.
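The dialect correspondences just described can be summarized in a small Python sketch. It assumes only the regular reflexes named above (Satakunta r, Tavastian l, eastern loss of *ð) plus loss of the word-final consonant in the written form; the table and function names are hypothetical, and nothing else about the dialects is modelled.

```python
# Regular reflexes of early Finnish *ð, as described in the text.
REFLEX_OF_ETH = {
    "satakunta": "r",
    "tavastian": "l",
    "eastern": "",   # loss
}

def finnish_reflex(proto, dialect):
    """Apply the dialect's *ð reflex and drop the final *k/*h (or cover symbol K)."""
    form = proto.lstrip("*")
    if form and form[-1] in "hkK":
        form = form[:-1]
    return form.replace("ð", REFLEX_OF_ETH[dialect])

if __name__ == "__main__":
    print(finnish_reflex("*raðakeK", "satakunta"))
    print(finnish_reflex("*raðakeK", "tavastian"))
    print(finnish_reflex("*raðateK", "eastern"))
```

Note that the sketch needs two different inputs, *raðakeK and *raðateK, to produce the attested forms, which is exactly the unexplained variation that the rest of this squib is about.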

However, there are a few general problems with this:

  • for single *-ð-, more often it is western forms with -r- or -l- that spread beyond their expected borders, not eastern forms with loss; [1]
  • there seems to be no substantial eastern dialect evidence for the form *raðakeh;
  • the variation between *-keh and *-teh remains unexplained.

I propose that the forms with ⁽*⁾aa actually do not result from loss of *ð, irregularly in the west; they result from an early POA dissimilation of *raðak̆keh to *raɣak̆keh. This, then, would’ve set off a further “suffix dissimilation” to *raɣat̆teh in Eastern Finnish ~ Karelian (and we now require also distinguishing *-P̆P- from *-P-, given their distinct reflexes in southern Karelian).

As long as the origin of this word remains unknown beyond “northwestern Finnic” (Finnish–Karelian [2]), in principle *raɣakeh could even be the oldest form, with *raðakeh being due to regressive dissimilation from *ɣ-k, rather than progressive dissimilation of *r-ð. This combination is indeed otherwise generally tolerated: rata : radan ‘track’, retu : redun ‘dirt’, riita : riidan ‘quarrel’, rita : ridan ‘trap’, rotu : rodun ‘race’, ruoto : ruodon ‘fishbone’ do not show any similar irregularities. Of course though, in cases like these strong grade -t- would have provided analogical support; in principle we could assume even regular dissimilation once upon a time, with all the evidence other than raate then analogically reverted. [3] It also seems quite likely to me that the dissolution of western Finnish has begun in the southwest, and that therefore we should not expect to find major common innovations across the area … which we indeed don’t, aside from the general areal features that are normally used to define the Southwestern dialect zone (lack of Tavastian *ð > l and *CɣV > CVV, heavy syncope and apocope) and some commonalities that are analyzable as shared retentions from an older era (e.g. plural genitive in *-den, *-ten rather than *-iden; *suvi ‘summer’ vs. *kesä ‘fallow’; numerous vocabulary items shared with Estonian).

[1] On the other hand, with *-hð- there are a number of examples tending towards loss even in the west; known examples appearing in standard Finnish include ehättää ~ dial. ehrättää ‘to reach in time’ < *ehðättää ← ehtiä ‘to be on time’; lähettää ~ dial. lährettää ‘to send’ < *lähðettää ← lähteä ‘to depart’; kohentaa ~ dial. kohrentaa ‘to improve, to fix an object’s placement’ < *kohðentaa < kohta ‘place’; and the derivative suffix -auttaa ~ -ahduttaa ← -ahtaa.
[2] IMO a more likely grouping than a primary division into Western Finnish vs. Eastern Finnic including Ludian–Veps. Features that Karelian shares exclusively with Ludian or Veps can be mostly attributed either to late Russian influence, or to contact with Old Veps (maybe better called Old Ludian) and the later Ludian varieties. In turn, Eastern Finnish and Ingrian, whose close affinity with Karelian is very obvious, have almost nothing shared with Ludian–Veps that would require setting up an Eastern Finnic proto-language. All the “Karelid” varieties however show several features that are absent entirely from “Far Eastern Finnic”, yet shared with western Finnish — phonemic consonant gradation being maybe the most conspicuous feature.
[3] In principle reuhka ‘poor hat’ could be maybe derived as *reðu-hka > *reɣuhka; but this would be much too inexact semantically compared to the perfectly fine loan etymology from Russian треух.

On Out of Eurasia and linguistic time depth

So here’s the hypothetical (as developed previously). Suppose modern humans have been hanging out at least somewhere around Eurasia already for 100, perhaps 200, maybe as much as 300 millennia, instead of merely 50–70. Should any of our views on the history of language(s) be affected?

A basic immediate result is that this substantially increases the time depth available for the language families around. This includes not just the known and proposed ones, but also the undetected ones that anthropology and genetics tell us will have to exist. As established before, an Out of Africa theory of modern human origins demands that ~all languages outside Africa ought to go back to a single common ancestor about 70,000 years ago, since the Near East creates a natural bottleneck for early, pre-naval migrations. An alternative Out of Eurasia however does no such thing. It does suggest the existence of a few linguistically unconfirmed macrofamilies like African, Amerind or Australian, but these do not need to go back to any especially closely related Eurasian ancestors. These in turn do not have to be especially closely related to any modern Eurasian language families either.

If so, a failure to detect any relationship between even geographically relatively close-by families such as Sino-Tibetan and Indo-European, or Semitic/Afrasian and Sumerian, does not have to mean that the Comparative Method is therefore likely to run out of steam after 10 or even 20 millennia. Maybe the effective limit is much deeper, but it is also the case that these kinds of “patently unrelated” languages really have been separate from one another ever since the Lower Paleolithic.

Eurasia houses also the great majority of the world’s well-studied language families. (In this context I would count also Austronesian as a “Eurasian family”, given its homeland in Taiwan, or maybe adjacent continental China. [4]) Those elsewhere have been documented and reconstructed on average much more scantily. A few blazing successes such as Bantu or Algonquian also suggest that there are many more results left to be claimed. It is therefore only in Eurasia that we can really with decent confidence claim that plenty of the language families appear to be unrelated or only vaguely related. Macrofamilies elsewhere in the world, such as Amerind or Australian, cannot be decreed invalid just on the basis of the so far poor results for macrofamilies across Eurasia!

It is every so often also claimed that linguists working with African languages in particular would tend more towards “lumping”, while linguists working with Eurasian languages would tend more towards “splitting”. I don’t think this is fair, except perhaps if we define “lumping” and “splitting” purely by the size of the language families involved. If African macrofamilies appear to have about as much evidence for them as Eurasian macrofamilies do now, when the languages of Africa are far less documented and researched, then I think we can expect the evidence base to keep growing. Over the 21st century I expect further solid reconstructions and new perimeters to be reached.

On a historiographical note, I would also like to briefly note that while “lumping” is often blamed on Joseph Greenberg and his alleged uncritical followers, almost all of his macrofamilies had been proposed or at least explored already earlier on. He may have brought in some new annexations and new unwarranted confidence, but the concept of “macrofamilies” has in principle nothing to do with Greenberg’s barely-method of mass comparison. [5]

I expect also one cross-linguistically important result to eventually emerge from ongoing research on the major African language families in particular. Besides big, these are likely to be fairly old… Hence, if we have one day a detailed reconstruction of Proto-Niger-Congo or Proto-Macro-Sudanic (“Nilo-Saharan” at its widest I do think is a wastebasket taxon), and their development to a few fairly distant languages, this will be able to show us what a linguistic relationship that’s 10,000+ years of age really looks like! I am not confident that this would have to turn out as minimal as is often claimed.

It is an unfortunately common notion that linguistic relationships of 10,000+ years of age would have to be undetectable. This seems to come from two main sources, both of them IMO fallacious. The first is naive extrapolation from examining relationships maybe half or a third as old. Proto-languages of this age we can reconstruct, but only partially. E.g. in terms of lexicon: in any language that is still spoken, tens of thousands of words can be attested, while in multi-millennia-old proto-languages only some hundreds can be reconstructed. So should we not assume that further two, three or four time periods equally long should squeeze the available evidence down to definitely nothing? Well, maybe, if you’re willing to bite the two bullets that (1) glottochronology works at least on long enough time periods, but (2) there is no core lexicon or grammar, and everything is equally likely to change. Without the first, punctuated equilibrium models will allow for the possibility that some languages may have remained, at times, mostly stable for millennia. Without the second, you will have to admit that languages’ core features actually remain stable much longer than their peripheral features, [6] and they will likely allow reconstruction efforts well beyond what naive extrapolation suggests.
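To make the arithmetic of this extrapolation concrete, here is a minimal Python sketch comparing a single-rate decay with a two-class (core vs. periphery) decay. Every figure in it is invented for illustration: 20,000 attested words, about 500 reconstructible after 6,000 years, a "core" of 300 items with a 10,000-year half-life and a periphery with a 1,000-year half-life. None of these numbers come from any measurement, and the function names are hypothetical.

```python
def surviving(n0, half_life, years):
    """Items left after `years`, given exponential decay with `half_life`."""
    return n0 * 0.5 ** (years / half_life)

def single_rate(years, n0=20000, n_ref=500, t_ref=6000):
    """Naive model: one decay rate, calibrated so n0 shrinks to n_ref in t_ref years."""
    return n0 * (n_ref / n0) ** (years / t_ref)

def two_class(years, core=300, core_half_life=10000,
              periphery=19700, periphery_half_life=1000):
    """Core/periphery model: a small stable core plus a fast-changing periphery."""
    return (surviving(core, core_half_life, years)
            + surviving(periphery, periphery_half_life, years))

if __name__ == "__main__":
    for years in (6000, 12000, 24000):
        print(years, round(single_rate(years), 3), round(two_class(years), 1))
```

Both models are tuned to leave on the order of 500 items at the 6,000-year mark, so they are indistinguishable from the vantage point of a single proto-language of that age. They diverge sharply further back: the single-rate model drops below one surviving item within a few more such periods, while the two-class model still leaves a usable remnant of the core.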

The second error is based on the essentially winged age estimates for Eurasian macrofamilies. This is already internally incoherent, though. If a maybe-family like Nostratic is proposed to be in the age range of 10,000 years, and if the Nostratic proposal is too weak to be accepted… then it does not follow that all language families in this age range will be too weak to be accepted: rather, this means that the age proposal for Nostratic is, together with the family itself, also too weak to be accepted. What does not exist, cannot be dated either. (Did Cthulhu invent cephalopody before or after the molluscs did?)

One possible compromise for this would be to treat (again, e.g.) Nostratic as a language family that can be accepted through archeological / cultural-anthropological / genetic arguments, but not linguistic ones. I don’t think anyone really believes in this exactly, though. I only ever see arguments to the effect that, if Nostratic exists, based on cultural & genetic distance, then it will have to be at least 10,000 years old (and this much is easy to agree with). But this is only a lower bound! It gives us no evidence about an upper bound. Since language shift and cultural convergence exist, there is no linear or even monotonic relationship between linguistic distance and cultural-genetic distance. After a few language shifts and centuries of convergence, the modern citizens of Haparanda and Tornio are just about indistinguishable by cultural and genetic distance. Yet by linguistic distance, one side still remains Indo-European, the other Uralic.

Finally, most of the various ideas above can be pooled together as a provocative new hypothesis for thinking about the Eurasian “maybe-families” and their place in the general context of “macro-comparison”: the vague resemblances we see in e.g. Nostratic could be indicative of what language relationship after 100,000 years looks like. That is, it might turn out that the reason macro-comparison has been thought to be mostly fruitless is that, the entire time, most linguists have been actually beating their heads against the hardest such problems around! Once we have Indo-European-level (Uralic-level? [7]) documentation of most languages involved, maybe not just units like Atlantic-Congo or Trans-New-Guinea, currently established on fairly meagre evidence, but also even much older and larger units, say Niger-Saharan or Southern Amerind, will turn out to be relatively feasible to establish and reconstruct. Only future work will tell for sure. [8]

(To be continued still…)

[4] As suggested by Blench in various draft papers. Also the today quite well-emerging Austro-Tai theory would fit in with this: given probable continental relatives as well, Taiwan may simply constitute a residual zone populated by Austronesian groups driven off of or extirpated on the mainland by the Chinese.
[5] Even mass comparison can be probably steelmanned, but that’d be a topic for another time.
[6] Cf. the fact that radioactive isotopes’ half-lives come as anything between billions of years and some fractions of picoseconds.
[7] It seems possible that Uralic is actually the best-documented large language family out there, if “large” is defined to cover both diversity and time depth. There’s a grammar for everything (even if from the 1800s for some languages), and at least one extensive multidialectal dictionary for almost everything (usually more; oldest unsuperseded ones are from around 1940; biggest omission is probably Veps); multiple etymological dictionaries and historical grammars for every language with a longer written tradition (well, all three of them), and a bunch of them even for minority languages. Indo-European definitely sports the best-documented individual languages out there, but the family-wide average is killed hard by modern Indo-Iranian languages, which AFAICT have been essentially deemed “comparatively useless” to document due to Sanskrit and Avestan being available. Semitic fares somewhat similarly (weak points: Ethiosemitic, Modern South Arabian). On comprehensiveness + diversity, several other (once again usually Eurasian) families like Japonic and Turkic are doing well or getting there, but all of these are younger. On comprehensiveness + time depth the only contender I can think of is perhaps Kartvelian, which does poorly on internal diversity.
[8] Provided that people will not write off even the prospect of such future work by unsubstantiated assertions about the Comparative Method “stopping working” after some random number of millennia. But, I also think it doesn’t really matter if this belief is being held by some people currently… It will be a good idea anyway to get a decent reconstruction of Benue-Congo before trying to reconstruct Atlantic-Congo or the whole Niger-Congo, and also to get a decent reconstruction of Bantoid or Edoid before trying to reconstruct Benue-Congo. And if I am right about families like NC being able to eventually provide us with explicit examples of languages being demonstrably related over 10,000+ years, then the skeptics are not going to be denying any of the results happening along the way; they’re going to be gradually retracting their supposed time limit, until it turns out to be deep enough that it can no longer be used to flatly deny other similarly far-away but less obvious results like Dene-Yeniseian either.

Excursion: On Out of Africa

Out of Africa (OOA) has been the main theory of the origin of modern humans since the mid-20th century. Strictly speaking this is only a theory of anthropology. Since language is a human phenomenon, [1] it has however also sprouted a “linguistic Out of Africa” theory alongside.

According to what could be called “the evolutionary theory of language”, we observe that new languages only come about by the spreading and splintering of earlier languages. (Or, perhaps a better biological analogue still is the third tenet of cell theory: that cells only come about from other cells.) This alone already suffices to imply that there exists a family tree of languages, tracing back to the ancient era of glottogenesis. Connected with a relatively late (“recent”) expansion of modern humans out of Africa, we can then in particular infer the highly likely existence of a language that could be called “Proto-Exo-African” (PEA): the language of the humans who first set out on this exodus, which must also have been a common ancestor of all languages spoken in Eurasia, Oceania and the Americas. This is an idea that is in principle sound, even if, in my impression, underappreciated among historical linguists. The smaller number of so-called evolutionary linguists out there do understand it well, at least.

This argument though says nothing about whether this common descent would be in any way identifiable from the linguistic data itself. Language does not have a strict analogue of DNA (or any other similarly transferred major biochemical machinery), and is not strictly speaking “transmitted” as much as “constructed” over and over again every generation. No child is born knowing a language, only with the ability to acquire a language. This could add up to the result that every linguistic feature of PEA has been by now either lost or diluted to undetectability. And it happens to be the case that all language families with general approval so far are still at least an order of magnitude younger than the assumed recent OOA spill-out starting some 70,000 years ago. Even the more ambitious proposals like Amerind or Nostratic (that actually have some legitimate comparative evidence backing them, unlike attempts to scrape together things like Proto-World) are only proposed to reach at most some 20-30 millennia of age, i.e. barely a third of the way back.

If PEA might be undetectable by direct means, how error-proof is the indirect demonstration of its existence then? As long as we do not question the underlying OOA theory, there are really only four possibilities under which the assumption of PEA might fail:

  1. humans leaving Africa did not yet have language, and it has come about only later, possibly several times independently;
  2. at some point in prehistory, new languages were created from scratch to replace earlier natural languages, and some or all modern languages rather descend from these “new” languages;
  3. the African exodus population spoke more than one language;
  4. at some point in prehistory, other (possibly entirely unrelated) language families have secondarily spread out of Africa, to replace some or all descendants of PEA.

Of these, #1 is difficult to directly refute. Spoken language does not fossilize, and hence the study of the biological evolution of language is to a large extent an issue of speculation. The most common opinion around however is that language would have existed at least by the transition to anatomically modern humans (AMHs), as distinct from Neanderthals and the newly found Denisovans, so at least a couple hundred millennia ago. In this post series I will continue to follow this assumption as well.

All the others, however, would not make major dents in the hypothesis of monogenesis of non-African languages.

#2 is a priori improbable, and hence not actually a major objection. If we take seriously the rarity of languages being freshly invented (i.e. stick to the principle of uniformitarianism), then even recent glottogenesis events will actually only leave a slightly weakened OOA theory of language, one where we can allow for e.g. Basque or Turkic to have been created from scratch, but all other non-African languages can be still assumed to be descendants of PEA. Similarly #3 would only split the family tree into a small copse of unrelated-at-OOA-time trees, probably at most no more than 3-4. These would have good chances of being still related to one another at some pre-OOA date, so that there is a PEA in the last common ancestor sense, even if confusingly enough it was not itself exo-African (and could be therefore also the ancestor of various African languages).

#4 actually has good chances of being true: maybe the best contender for non-PEA languages spoken in Eurasia today are the Semitic languages in the Levant and Mesopotamia, grouped in the larger Afrasian family, whose homeland is often (though not always) placed somewhere in northeastern Africa. But we can also see that this too would only very slightly push back the boundary of African vs. exo-African languages. Languages spread only step by step. Perhaps some other lineages in the vicinity of Africa, say Sumerian or Dravidian, could be also of yet more recent African origin, but once modern humans had first colonized places like Siberia and Southeast Asia, new intrusions all the way from Africa are unlikely to happen.

Altogether even a buffer zone of maybes doesn’t seem to shake the conclusion that the 100+ exo-African language families known today to linguistics (including isolates) must be only a few top boughs of at most a handful of much larger underlying language families, dating back to the time of the OOA expansion.

But has there really been a recent Out-of-Africa expansion?

There definitely has been at least one OOA event, since also the Neanderthalians, Denisovans and Homo erectus are thought to descend from African hominins. For recent modern human OOA in particular though, the main line of hard evidence has originally rather come from the fossil record. 200k-year-old Homo sapiens remains from Omo Kibish in Ethiopia, the oldest known throughout the late 20th century, have been some of the best supporting evidence. This is however but a single archeological datapoint, and one that has even been overturned. Since last year, the oldest known remains of AMHs now come instead from Jebel Irhoud in Morocco, dated around 300k years old. Note that this is quite a gap, both chronologically and geographically! The data doesn’t get especially dense going forward either. Only a handful of any modern human remains under 100k years of age are still known. More importantly this slightly extended selection already includes locations also outside Africa, in modern Israel and Oman (discovered not too long ago as well). These have been suggested to represent “failed migrations that died out”, but this strikes me as special pleading. I doubt that anyone looking at this scattered early record without the weight of research history (and with understanding of the Signor–Lipps effect) would place the origin of anatomically modern humans within Africa with great confidence. At minimum a Near Eastern origin seems to be entirely within the realm of possibility as well. Paleoecology might be able to suggest other likely locations still.

Several posts by anthropology blogger Dienekes have moreover drawn my attention to a few interesting additional arguments to consider recent OOA on very shaky ground by now. For one, recall that modern humans’ closest known relatives are the Neanderthalians and the Denisovans — two Eurasian species, with Denisovans branching off first, which would suggest that the common ancestor of the three, and even the Neanderthal-AMH last common ancestor, lived in Eurasia as well. (This does not need to coincide with the LCA of crown group AMHs, however.)

For two, genetics has long pointed out that the modern human populations of sub-Saharan Africa [2] show altogether greater genetic diversity than those of Eurasia (with even further rarefaction in Oceania and America). However with the rapid development of archaeogenetics in the last few years, we have by now the first lines of evidence that this could be due to admixture with archaic Homo sapiens groups.

Maybe the Neanderthal and Denisovan OOA event was then also the main modern human OOA event after all?

This would also imply at least two inverse “into Africa” expansions (one leading to archaic African substratal groups, the other for crown group AMHs). This does not seem to be a very costly assumption though, since the distribution of several archaic haplogroups already demands multiple major population movements across the continent. E.g. the archaic mtDNA haplogroup L0 of South Africa is both first-to-branch-off and present in only fractional proportion in the populations that do carry it, clearly requiring multiple admixture events along the way (instead of, say, a Great San Migration that starts 100k years ago followed by them hanging out in South Africa mostly intact after that). There are also haplogroups with primarily Eurasian distribution but some inroads even into sub-Saharan Africa, requiring their own more recent but still prehistoric into Africa or out of Africa movements or gene flows (e.g. Y-haplogroup T, mtDNA haplogroup U).

An option to be also kept in mind is haplogroup extirpation: various today-African haplogroups may have once existed in Eurasia too, but eventually died out, e.g. under later population movements. The same could be the case anywhere of course, but to me it seems that Eurasia is a priori the most likely location for this. For one already due to size, for two due to the historically well-documented extensive population movements, particularly across and around the major “crossroads” that is the Near East. [3] A third and maybe the most powerful candidate for wiping out genetic diversity from Eurasia would be the latest glaciation period. Still, the human Y-chromosomal and mtDNA haplogroup trees both start off with a remarkably large series of exclusively African early groups. Any theory of AMH origins outside of Africa would have to explain most of these through archaic admixture, with haplogroup extirpation probably only granting some wiggle room around those points in the two family trees where the branches start to turn Eurasian-centric instead.

I’m aware I’m outside my zone of expertise here, so in case this sounds like I am suddenly a few moomins short of a valley, I do want to note that overturning recent OOA is still not looking exactly cut and dried (and if any of the arguments above have big glaring holes in them, I would appreciate readers pointing them out). If you bear with me for now, though, I will develop in the next post some potentially very interesting corollaries this possibility would have.

[1] It may be at times useful to think of linguistics, like most other humanities subjects, as a subdiscipline of anthropology (in a somewhat similar way as how biology could be considered a subdiscipline of chemistry).
[2] Important as a human genetic and cultural area, and to some extent also as a more general biogeographical region. North Africa by contrast on many marks aligns with Eurasia instead. I have wondered if sub-Saharan Africa would deserve its own underived term, along the lines of “Maghreb” for North Africa. There are a few historical candidates, but nothing that really stands out as immediately usable: Ethiopia and Sudan have been already claimed by states (much as also Libya), and Zanj also has been kind of claimed by Tanzania. The analogy of Australia could suggest e.g. “Meridionalia” or “Equatoria”, but outright coinages have quite a few degrees of freedom to them.
[3] Later also the “highway” that are the steppes, but my impression is that pseudo-periodic nomad invasions of Eastern Europe / Persia / China have only really been a thing after the domestication of the horse, and related inventions such as chariots, saddles, stirrups etc., all fairly recent in the big scheme of things.

A Problem Statement for Uralic vocalism

As noted in my previous post, I have by now nailed down as my next professional milestone a hunt for previously unnoticed innovative features within the Finnic vowel system.

Besides individual surface questions about what the vowel system of Proto-Uralic may have looked like (harmony this, stem vowels that, long vowels yay or nay…), there is also a second, more methodological theme involved that may be less apparent. This is the question of how to reconstruct a proto-language when faced with extensively overlapping correspondences.

Uralic vowel reconstruction is not really constrained by data. The etymological pool has been sitting at around 1000 items already since the late 19th century. Etymology has kept progressing over time, but almost as much of this has involved discarding old poor comparisons as adding new better ones, and hence the quantitative impact has been surprisingly small (an optimistic count would put us as having gone from about 900 to about 1200). Yet it still remains the case that effectively no two etyma fully agree in their correspondences! Even seemingly perfect rhyme series found in just about all languages show divergence in one or two languages. E.g. for PU *kala ‘fish’, *pala(-) ‘bit, to bite’ and *sala- ‘to steal’ it is Khanty and Selkup that diverge, both in different items even: *kuuL, *puuɭ, *ɬaaL-; *qwëlɨ, *poolɨ-, *twëlɨ-. [1] More often, numerous gaps in data prevent assigning correspondences definitely together as series: a given sparse correspondence set may be simultaneously compatible with three other sets, which however all disagree with one another. In other words we are saddled with too many correspondences to straightforwardly tackle. This all has already been noted before too, e.g. by Kaisa Häkkinen in 1983, in her PhD thesis Suomen kielen vanhimmasta sanastosta ja sen tutkimisesta, pp. 120–151.
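The overlap problem can be made concrete with a minimal sketch. It uses only the first-syllable vowel reflexes quoted above for Khanty and Selkup, treats a correspondence set as a mapping from language to reflex, and counts two sets as compatible when they agree wherever both are attested. The function names and the sparse test set are illustrative assumptions, not anything taken from the literature.

```python
# First-syllable vowel reflexes read off the forms cited in the text
# (*kuuL, *puuɭ, *ɬaaL-; *qwëlɨ, *poolɨ-, *twëlɨ-).
SETS = {
    "kala": {"Khanty": "uu", "Selkup": "wë"},
    "pala": {"Khanty": "uu", "Selkup": "oo"},
    "sala": {"Khanty": "aa", "Selkup": "wë"},
}


def compatible(a, b):
    """True if two correspondence sets agree in every language attested in both."""
    shared = a.keys() & b.keys()
    return all(a[lang] == b[lang] for lang in shared)


def candidates(sparse, sets=SETS):
    """Names of all full sets that a (possibly gappy) set could belong with."""
    return sorted(name for name, full in sets.items() if compatible(sparse, full))


# Hypothetical etymon with a Khanty reflex but no Selkup cognate.
sparse = {"Khanty": "uu"}

print(candidates(sparse))                      # ['kala', 'pala']
print(compatible(SETS["kala"], SETS["pala"]))  # False
```

The sparse set fits two full sets that do not fit each other, so it cannot be assigned to either series on its own evidence. With more languages and more gaps, this kind of ambiguity multiplies quickly.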

My reading on the research history seems to moreover reveal that it’s tackling this issue that has been driving all the major debates on Uralic vowel reconstruction over the years. Roughly four approaches have been considered, all of them in principle admissible per known processes of language change:

1. Reconstruct different vowels for each correspondence (the “trivial reconstruction” approach). This was briefly attempted in the late 1890s, within the West Uralic (Samic–Finnic–Mordvinic) group. E. N. Setälä proposed at this time (see p. 839– here) reconstructions such as the following:

  • *ȧ > S *ā ~ F *a ~ Mo *a
  • *å > S *uo ~ F *a ~ Mo *a
  • *ɔ > S *oa ~ F *o ~ Mo *u
  • *o > S *uo ~ F *o ~ Mo *o, *u
  • *ɔ̄ > S *oa ~ F *oo ~ Mo *u
  • *ō > S *uo ~ F *oo ~ Mo *a

However, any attempt to extend this method wider out will turn out to require further and further splintering, and by the time we end up with a triple-digit number of different proto-vowels, this idea will be clearly untenable.

2. Assume original vowel alternations, with levelling in each descendant. This idea was also initiated by Setälä very shortly afterwards, indeed already explored in the same article I linked, and reached perhaps its purest form in the work of T. Lehtisalo in the 1930s. [2] In his work e.g. what Setälä above reconstructs as *ȧ becomes *ā; but *å is transformed into *ā ~ *ò, *ɔ into *ò ~ *ù, *o into *ò ~ *ō ~ *ū, *ɔ̄ into *ō ~ *ū, and *ō into *ō ~ *ā. Most of his various proto-vowels actually never exist outside such pseudo-ablaut patterns.

After WW2 the “locus” of this line of reconstruction moved from Finland to Germany, with W. Steinitz defending his own variant of the idea extensively from the 1940s through the 1960s. No real research on the topic has occurred since his death however. (Amazingly enough, it still lingers though in some overviews of Uralic penned by people who evidently ignore all research published outside of the German-speaking world.) I see this as unlikely to be effectively revived either: the only Uralic language showing somehow productive evidence for “ablaut” is Khanty, while everywhere else alleged evidence for vowel alternation is either due to transparently secondary changes, or is really based on sound correspondences rather than language-internal evidence.

3. Assume sporadic vowel changes, per ad hoc influence of varying surrounding phonetic environments: anything adjacent to labials might be labialized or delabialized, anything adjacent to velars might be backed or labialized, anything adjacent to /r/ might be lowered or backed, etc. No one has treated this as the sole explanation for various vowel correspondences across Uralic, but this was considered a major mechanism first by E. Itkonen, whose work ended up repealing the Setälä school “gradation” model in Finland, yet ended up enshrining a very Finnocentric image of Uralic vocalism (cf. before).

Today this approach most strongly still persists in Hungary. One reason surely is that this is the model that has been adopted in the UEW, often treated as the crown jewel of Hungarian Uralistics, and whose tentative reconstructions are then sadly often treated as ex cathedra truth. I suspect a second reason is moreover found in language-internal history: the Modern Hungarian vowel system cannot be derived from that of Old Hungarian by regular sound changes — if taken at face value. However the very limited inventory of Old Hungarian vowel graphemes (in first sources just ‹a e i o u›, slightly later expanded to ‹a e i o u ü›, etc.) very likely hides unwritten distinctions. [3]

4. Attempt to reconstruct conditional vowel shifts. First explored already by A. Genetz contemporarily with Setälä, and by now universally adopted e.g. for much of the West Uralic data: Setälä’s *ɔ, *o turn out to be in complementary distribution with respect to stem type (*o-a versus *o-ə, split only in Samic). Recent wisdom shows this to be mainly the case for his *å versus *ō likewise (*a-a, *aCCə, *aTə versus *aRə, split only in Finnic). This then also explains extremely naturally the identical reflexes in Finnic and Mordvinic for the former, in Samic and Mordvinic for the latter: they don’t just coincide, they’ve always had the same vocalism.

More generally, this approach is adopted to some extent by everyone more recent than Lehtisalo (including Steinitz and Itkonen), but often only partially. I believe it can and should be still pushed further to reach new results.

It must be also noted that these are not methodologically equal approaches.

The first approach does make exact predictions, and is to an extent obligatory: we do need to assume some number of vowel phonemes in Proto-Uralic, and some unconditional/elsewhere reflexes for them in the daughter languages. [4] But the vast number of correspondences demands also some other mechanisms to account for the large number of non-core cases (not really “edge cases” when they may be the majority altogether). While a few of them could be in principle again accounted for by setting up new Proto-Uralic vowel phonemes, this method ends up as awfully arbitrary: we have no clear grounds to prioritize any single case of variance in reflexes as inherited from Proto-Uralic, while leaving other cases of variance to be explained by other methods. In fact I think by now that reconstructing any proto-language contrasts at all from only a single branch among several (i.e. at least in largely polytomous-looking dialect-continuum/linkage situations such as Uralic) is methodologically illegitimate — while such cases can obviously happen in principle, only when a contrast is continued by more than one line of evidence is it possible to securely privilege a particular reconstruction.

The second and third mechanisms however are poor patches to the problem: they end up as unfalsifiable “just-so phonology”. Both irregular sound change and paradigmatic levelling are singular events that can be only assumed, never defended in detail, and never clearly shown to be incorrect by additional data. Usually it also becomes nearly impossible to then establish the real proto-language starting point. For the former the main issue is one of directionality, especially for supposedly irregular correspondences widely across a family, but also since local archaisms are in principle possible. For the latter the typical problem has been treating “alternation” merely as a free-floating excuse to mix and match vowel reflexes, without giving it any original morphological or phonological distribution. Sometimes we may fall back to these, but they’re no more than band-aids for etymologies that otherwise seem to work and which we don’t feel like discarding for but a single irregular feature. (There are further similar mechanisms available to the historical linguist too, but they start to get outside phonology entirely. [5])

It is only the fourth approach that has real explanatory power for exception cases. Reliably established conditional sound changes allow accounting for the development of multiple words by a single explanation, reaching a more parsimonious historical scenario than anything built of one-off changes. Conditional sound changes make fairly exact predictions as well about what correspondences future etymological research may find. Though this should not be overstated: etymology is not a black box that feeds us experimental data, it’s made of scientists who are able to read work on historical phonology and might use it to hunt for new etymologies, in principle risking confirmation bias. New data rarely outright falsifies conditional sound changes either: more common responses in my impression are to either narrow down the conditioning further yet, or to seek explanations through relative chronology, so that apparent exceptions may turn out to be accountable as being due to counterfeeding sound changes.
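As a minimal sketch of what “a single explanation for multiple words” looks like, the Selkup conditioning mentioned in note [1] (following Terentyev) can be written as one small rule. The segment classes below are simplifying assumptions covering only the cited examples, the function name is hypothetical, and `kål` is a schematic stand-in for a non-labial onset rather than a quoted reconstruction. Vowel length (as in *poolɨ-) and the cases of *å > *o/u without a preceding labial are not modelled.

```python
import unicodedata

A_RING = "\u00e5"                      # precomposed å
LABIALS = {"p", "m"}                   # assumption: only the labials in the cited forms
RESONANTS = {"m", "n", "l", "r", "j"}  # assumption: simplified resonant class


def selkup_first_vowel(psmy_form):
    """Proto-Selkup reflex of first-syllable *å in a form shaped C-å-C..."""
    form = unicodedata.normalize("NFC", psmy_form)
    onset, vowel, following = form[0], form[1], form[2]
    if vowel != A_RING:
        raise ValueError("this sketch only covers first-syllable *å")
    if onset not in LABIALS:
        return "w\u00eb"               # default diphthongization *å > *wë
    # After a labial the diphthongization is blocked; the outcome depends
    # on the following consonant.
    return "o" if following in RESONANTS else "u"


for form in ["pånčə", "pårkå", "måsə", "påtə", "kål"]:
    print(form, "->", selkup_first_vowel(form))
# pånčə -> o
# pårkå -> o
# måsə -> u
# påtə -> u
# kål -> wë
```

One rule with three branches covers all four cited labial-initial items as well as the default case, where the “sporadic change” approach would need a separate stipulation per word.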

As I’ve stated already in the intro slides to my CIFU 12 presentation: “one who seeks, shall find”. For several years now, a large proportion of new discoveries in Uralic historical phonology have precisely been conditional sound changes, either entirely new ones, or new and improved conditions for known sound correspondences. This includes also almost all results I am “sitting on”. Hence it seems evident that this represents a major underresearched area.

This is all the more surprising since ample preliminary work has regardless already been done! With just a bit more rigor, many minor “sporadic” sound changes assumed by mid-20th-century researchers like Itkonen, Collinder or Rédei (to an extent also even 19th-century pioneers) could probably be transformed into more regular shape. This goes beyond the big names too: many minor articles may yet turn out to have the seeds of important insights, as maybe best exemplified by Lehtinen 1967; but also (staying still within Finnic) e.g. Bergsland 1968 as the inspiration for my idea of more general palatal unpacking *AĆ > *AjC, or the various loanword studies to have first discussed the idea of a sound change *ej > *ii. I already have enough material to put together a PhD from ideas I’ve uncovered or developed on my own, but going onward from there, compiling and reassessing proposed sound changes from earlier research seems to me like an important desideratum for Uralic studies in the early 21st century.

[1] At least the Selkup development is perfectly explainable: there is no **pwë- in Proto-Selkup, and evidently diphthongization of Proto-Samoyedic *å to *wë was blocked after labials. Terentyev in СФУ 16 suggests *å > *o between a labial and a resonant (thus also e.g. PU *pončə ‘tail, back’ > PSmy *pånčə > PSk *ponč-ar ‘hem’, (? *parka >) *pårkå > *porqɨ ‘coat, parka’), *å > *u between a labial and an obstruent (thus also e.g. *mośkə- > *måsə- > *musɨ- ‘to wash’, *poskə > *påtə > *putɨ-la ‘cheek’). There are also cases of *å > *o/u not preceded by a labial though. I wonder whether syllable closure, and/or whether PSmy *å goes back to PU *a or *o, should also be taken into account.
[2] Most extensively in: Lehtisalo, T. 1933. “Zur geschichte des vokalismus der ersten silbe im uralischen vom qualitative standpunkt aus” [sic: no caps]. Finnisch-Ugrische Forschungen 21: 5–55.
[3] E.g. ‹i› when giving modern Hu. ë/ö is likely to have been a shorter/laxer *ɪ, while ‹i› when giving modern Hu. i/í is likely to have been longer/tenser *i ~ *iː, as can be confirmed by different Uralic sources for the two — and hence these correspondences do not involve “sporadic” lowering of †i, but rather quite regular lowering of */ɪ/.
[4] It is in theory however possible, given long enough phonological development, that many conditional sound changes bleed a proto-phoneme such as *a on its way to some default reflex *A in a sub-branch, and that this is then bled by additional conditional sound changes in several environments (including all the retention ones) on its way to some modern reflex like /a/, so that there aren’t actually any cases left at all where *a > /a/. In such a case all reflexes of original *a in this modern variety would be conditional one way or the other. An almost-example is the fate of PU *k in Tundra Nenets: as a singleton it is palatalized to /sʲ/ before front vowels and (? backed-then-)lenited to /x/ before back vowels; in coda position it is debuccalized to /ʔ/ as the first member of a cluster; and it is almost always lost as the 2nd member of a cluster. The “default” development *k > /k/ is thus only really found in the original cluster *kk. On average Uralic is still phonologically compact enough though that anything like this does not usually happen.
[5] One other common option is “find a root that does work phonologically, then go hog wild with semantics”. This has given us many such great etymologies as Kari Liukkonen’s infamous derivation of Finnic *noki ‘soot’ from Baltic *nagis ‘nail’, allegedly through an unattested sense ‘dirt under fingernails’ (I wish I were kidding). — When in need of a patch, I seem to tend towards phono-semantic contamination the most for some reason. Arguably this is also an underresearched area, but again, semantic change is singular and cannot be actually usefully reconstructed all by itself. At most it seems that we could collect examples and try to look for typological generalizations, hardly a project to have lasting impact very soon.

Not So Freelance Soon

The countdown has begun: my PhD applications are in, to be followed by grant applications later this year (right now I am employed enough with other work). This will surely call for the long-imminent rebranding of this blog away from the current title Freelance reconstruction. New title ideas are being brainstormed already.

The project pitch has been simple yet ambitious enough: I’m going to be grappling more with Historical Phonology Hard Mode, i.e. the reconstruction of Uralic vocalism. I don’t think I will be revolutionizing the field entirely though (a single PhD is too short for that!). Primarily I will be putting together topics to move the reconstruction of Proto-Uralic further away from Finnic, to be tied up in an article thesis; preliminary title: “Innovations in Finnic vocalism”. Blog readers actually will be able to recognize much material along these lines having already been featured on this blog in some incipient form.

There will be also entirely unpublished but already charted-out research coming up, e.g.

  • Some thoughts on the sources of Finnic “primary disharmonic” roots (the type *nila ‘phloem’, *piha ‘yard’, *virka ‘noose’)
  • Some new results on second-syllable vocalism, most of them arguing for a slightly less archaic status for Finnic (cf. some earlier non-Finnic-related comments here and here).

Etymology squib: riipustaa

I happened today upon a small etymological review article “Lat. scrībere in Germanic”, which argues that this is indeed a loanword rather than a cognate, but a relatively early one, already roughly into Proto-West Germanic. This got me thinking about a possible modern Finnish reflex: riipustaa ‘to scribble letters’. This is not a word of particularly wide currency, and does not even cover the general meaning ‘to write’, which is usually expressed by the native term kirjoittaa (more rarely also piirtää, whose main meaning is ‘to draw’). The word seems to be in fact marginal enough that it is not treated in any Finnish etymological work at my disposal. Regardless, the resemblance to Latin is quite apparent.

There is room for doubt here already to start with. For one, riipustaa exists beside a variant raapustaa, which is in my impression (and by Ghits) even the more common one. Formally both could be also analyzed as derived from the more basic verbs riipiä ‘to rip, pluck, tear, scratch’, raa(p)pia ‘to scrape, scratch’. However the fairly specific semantics regardless lead me to think that the similarity of scrībere and riipustaa is not accidental. If this were a straight-up loanword, raapustaa could have come about as a contamination with raapia (or perhaps its further derivative raaputtaa). As for riipiä, the cognates across Finnic and also Germanic (← *rīfan-, *rīpan- [1]) only seem to show the meanings in the range ‘to rip, pluck, tear’. The dialectal Finnish meaning ‘to scratch’ could be rather by the influence of raapia and/or riipustaa.

It should be also noted that most loanwords from Latin into early Germanic have sooner or later continued their trek further into Finnish: e.g. enkeli ‘angel’, kauppa ‘store, trade’, keisari ‘emperor’, kellari ‘cellar’, kori ‘basket’, kyökki ‘kitchen’, luumu ‘plum’, pannu ‘pan’, penni(nki) ‘penny’, piippu ‘pipe’, pytty ‘pot’, säkki ‘sack’, tiili ‘tile’, viina ‘spirits’ / viini ‘wine’, ämpäri ‘bucket’, äyri ‘monetary unit’. Most of these are newer loans from Swedish, but earlier loaning roughly to late Proto-Finnic or early Common Finnic would not be unthinkable. One such more widespread case is kattila ‘kettle’, which appears to be reconstructible for Proto-Finnic (> e.g. Veps katil, Livonian kaţļā, South Estonian katõl’, with regular development [2]). Another candidate is ‘pound’: besides Fi. punta there are also Est. pund, Liv. pūnda, which could also all derive already from PF *punta. This word however has undergone so few sound changes on any side of the equation that, in the absence of cognates in Eastern Finnic, I would not rule out later parallel loaning. [3]

This possibility is relevant since quite early loaning would have to be assumed for riipustaa in order to account for -p- (contrast modern Sw. skriva, Low German schrieven). Outright retention of the plosive all the way from Classical Latin seems still unlikely (already Old Norse shows skrifa), but given that also Germanic *f turns up as /p/ in Finnic for a while, substitution of Germanic *-b- as early Finnish -p- seems like it should also continue to be possible even after lenition to *-β- > *-v-. Especially if I’m right about the hypothesis that the introduction of the substitution pattern f → v is due to the onset of the sound change *w > v in Finnic and not anything changing on the Germanic side.

Morphologically however riipustaa could not be ancient in this scenario: before the heavy formant -sta- a weak grade would be expected (cf. e.g. riippu- ~ rippu- ‘to hang’ → *rip̆pu-sta- > ripusta- ‘to hang up’, lintu ‘bird’ → *lindu-sta- > linnusta- ‘to hunt birds’), which would here demand a root √riipp- (clearly underivable from Germanic). The other option would be to consider this a fairly recent formation, similar to the likes of maku ‘taste’ → makusta- ‘to savor a taste’ (vs. older mausta- ‘to spice’), julkinen ‘public’ → julkista- ‘to publish’ (vs. older julista- ‘to proclaim’). The immediate source of derivation would then probably have to be a noun *riipu ‘scribble’ (from a verb *riipa-, *riipat-, *riipi- or even *riipe- ‘to scribble’). So this idea of riipustaa as a loanword is but a “root etymology”: only the root syllable riip- could possibly derive as an old loan based on scrībere, all else needs to be more recent.

Assuming all this eventful history within Finnish is also unfortunately getting rather convoluted. If it is only the semantics of this word that end up nicely matching with Latin, perhaps a more economical solution would be to reverse the direction of the various semantic contaminations I’ve assumed above:

  1. a derivative raapustaa ‘to scrabble, scratch’ comes about in Finnish;
  2. through the influence of riipiä, this develops a by-form riipustaa;
  3. through the influence of Swedish skriva (and, why not, also riimu ‘rune’?) this acquires the meaning ‘to scribble letters’ in particular;

with the net result being that riipustaa is much younger altogether, built up within the last few centuries.

The third point would probably require that skriva gets borrowed into Finnish first, presumably in the form ((s)k)riivata.

At this point I need to finally turn my eye from reconstructive speculation to real data. And a narrower chronology turns out to be vindicated by the Finnish dialects. While Suomen Murteiden Sanakirja has not yet reached R- (probably won’t until a few decades from now; finishing L alone has been taking some five years), it is already possible to see that (s)kriivata ‘to write’ is in fact attested (including even a by-form kriipata)! At least the initial cluster skr- cannot be here anything else but a sign of a recent loanword. [4] Various other evidently related formations have attestations with a cluster kr- as well: e.g. kriipu ‘scratch’, kriiputa and kriiputtaa ‘to draw lines’, and indeed also kriipustaa ‘to scribble’, kriipustus ‘scribble’. Tracing the rise of the forms with -p- rather than -v- is not obvious without access to the entries of the clusterless variants, but I’d still imagine the source is in the end raapia — this, too, indeed also with a variant kraapia (it’s a loanword from Sw. skrapa ‘to scrape’ after all).

Case closed, I think: riipustaa is a recent Finnish-internal formation, evolving on the basis of Swedish skriva, and only accidentally comes again close in form to its ancient Latin original. Seeking a deep-reaching root etymology for a geographically isolated word turns out to be a demonstrably bad idea once again.

[1] Today explained as Kluge’s Law doublets, though I suppose back-loaning from Finnic would also work.
[2] The SE form and North Estonian katel show syncope-then-epenthesis through *kattila > *kattľ, similar to e.g. *akkuna > *akkn > NE aken ~ SE akõn’ ‘window’, *taikina > *taikn > NE taigen ‘dough’. Interestingly enough, this only took place between a heavy first syllable and a light third syllable. Otherwise V2 survives, e.g. *satula > NE sadul ‘saddle’, *rusikka-s > NE rusikas ‘fist’, *palmikkoi > NE palmik ‘plait’.
[3] This moreover has apparent reflexes even in the Volga region, usually explained as loaned from Gothic: Erzya /pondo/, Moksha /ponda/ ‘measure of weight’, Mari /pundə/ ‘money, capital’. The former interestingly enough appears to have come in early enough to have participated in the lowering of native *u > *ʊ to /o/… I wonder if early Slavic *pǫdъ could work as an alternative more recent loan source (nasal vowels are still reflected as /Vn/ in several early loans to Finnic as well).
[4] For plain kr- in western dialects an onomatopoetic origin might be conceivable, cf. e.g. rapista ~ krapista ‘to rustle’, rätistä ~ krätistä ~ prätistä ‘to crackle’.

An Old Komi inscription

From A. J. Sjögren’s Die Sürjänen (orig. 1829), more exactly the reprint in his Gesammelte Schriften I (1861, edited by F. J. Wiedemann):

[Image: an Old Komi inscription.]

I do not have a transcription or any other analysis to offer, alas. For one, while “Old Permic” is in Unicode as of 2014, I still have no typeface capable of actually displaying it. For two, transliteration just off the cuff would be difficult too: while the Old Komi alphabet is in principle Cyrillic-derived (developed in 1372 by St. Stephen of Perm), most familiar-looking letter shapes are actually red herrings. E.g. the reversed-Г letter is /i/, the lambda/rho-digraph-esque letter is /š/, and the hourglass is /v/ (derived from В). The only letter that is readily correctly identifiable with just knowledge of Cyrillic might be yat Ѣ, observable e.g. at the start of the 4th line. For anyone who’d want to try their hand themselves though, a paleography chart is available from the one authoritative modern source, Lytkin’s Древнепермский язык (1952).

For three… we actually hear in Uralistics very little about Old Komi at all. I for one would enjoy seeing at least a bit more coverage in basic sources. Yet the only fact that comes up with any frequency is the claim of Old Komi being the only direct and consistent source for differentiating Proto-Komi *e and *ɛ. [1] Otherwise, e.g. Raija Bartens’ 360+page handbook Permiläisten kielten rakenne ja kehitys (2000) only discusses the Old Komi written records for no more than half a page, with no direct examples given.

There is regardless not much available to begin with: in the 1998 handbook on Uralic, A.-R. Hausenberg reports that the known corpus on Old Komi only measures 225 words, which would be on par with minor epigraphic languages such as Thracian or Lydian. I suppose the inscription above would therefore already constitute maybe a good 10–15% of the total corpus.

[1] The Komi-Yazva variety also distinguishes these however, as /i/ versus /e/.

The origin of the Finnic long vowels: An outline

Continued from my thesis release post, as is perhaps appropriate now that I finally have wrapped up my graduation as well. To make it a bit more convenient for readers, I provide here an English outline of the specific topics I discuss in my thesis, even though writing out my argumentation in detail would take much more work.

Comments in [brackets] are additions specifically made in this post, not found in the thesis itself.

pp. 2–8: Methodological basics [§ 2]
An honestly fairly scattered lookover at the usual meat-and-potatoes of the Comparative Method. Language relationship means descent from a common protolanguage; word relationship means descent from a common proto-form, and constitutes a comparative etymology; comparative etymologies imply sound correspondences; regular sound correspondences allow reconstruction, i.e. the postulation of proto-forms, and sound changes leading from them to attested forms; any given set of comparative data allows many different reconstructions, and parsimony must be appealed to to choose between them. On p. 5 I also sketch the structure of sound change in terms of 2+2+2 factors [a topic I think I would like to expand on in future work on the ontology of historical linguistics]: input/output, enabling/blocking conditioning, and location in time/space.

pp. 8–11: Research data [§ 3]
Early on I had datamined the etymological literature for all cases where an etymon shows a long vocoid (long vowel or a diphthong) and could feasibly date back to Proto-Finnic. This compilation comes out at about 1500 items. For now I treat in full detail only the oldest layer of long vowels however, numbering a few dozen. So there would be much further work to do also (some of which in fact is already done and will be simply delayed for future publications).

pp. 12–17: Finnish to Proto-Finnic [§ 4]
This era of history is pretty much established. There is a consensus view of the vowel system of late Proto-Finnic (tables 1–2), save for one issue: whether there were harmonic equivalents *e, *ë (= FUT *e̮, orthographic õ), as in Estonian / Votic / Livonian, or a harmony-neutral *e, as in Finnish–Ingrian–Karelian and Ludian–Veps (pp. 15–16).

[The latest word on PF *ë has actually recently come out in Petri Kallio’s Festschrift, where Jaakko Häkkinen argues, with further commentary from me & Santeri Junttila, in favor of the harmonic reconstruction.]

The main phonological changes from PF to modern Finnish are *eü > öy, long mid diphthongization *ee *öö *oo > ie yö uo, and various syllable contractions due to consonant losses (introducing unstressed long vowels and recreating stressed ee oo). Innovations not affecting the vowel inventory per se but still adopted in Standard Finnish include coda vocalizations (e.g. *mükrä > *müɣrä > myyrä ‘vole, mole’) and some partially unexplained vacillation in vowel length (e.g. *kärmeh > käärme, SW kärmes ‘snake’).

[There are also many further shifts like conditional *e > ö or unconditional *ä > ɛ, particular to specific dialect areas, today sidelined due to the archaizing nature of Standard Finnish. Most Finnish dialects other than Savonian are overall fairly conservative in their vocalism however.]

pp. 18–22: Preambles for comparative Uralistics [§ 5]
Some known results and concepts:

  • (pp. 18–19) the sprawling terminology of Uralic “intermediate” proto-languages, from Proto-Finno-Samic to Proto-Finno-Ugric, and whether any of this is really necessary (hardly).
  • (pp. 19–21) the Proto-Uralic word root as canonically bisyllabic, but with impoverished second-syllable (“stem”) vocalism; including the debate on whether the non-open stem vowel should be reconstructed as *i, *e or *ə. In particular, extensive metaphony and vowel reduction across the Uralic languages mean that most of the time it is convenient to operate with vowel combinations such as *i-ä or *ä-ə, instead of individual vowels.
  • (p. 21) the recent result that Finnic *a-ë actually continues three different PU vowel combinations: *a-ə, *ë-ə and *ä-ä.
  • (pp. 21–22) the distinction between primary and secondary long vowels in Finnic. The latter arise by syllable contractions after loss of “vocalizing” consonants *j, *w, *ŋ, *x, while the former do not — they are usually held to constitute the oldest group, and it is them that my work focuses on. The post-Proto-Finnic myyrä, käärme type could be probably furthermore called tertiary long vowels.

pp. 23–24: Early Proto-Finnic [§ 6.1]
(Now entering the extensive research history section.)
The main Finnic-Samic and Finnic-Mordvinic vowel correspondences were worked out already in the late 1800s. It turns out that the 16-member Proto-Finnic vowel system has retained more original contrasts than the 9-member Proto-Samic and the 6-member Proto-Mordvinic systems (well, duh); and, per a “prestructuralist” result due to A. Genetz, reshuffling other than simple mergers in the latter two has mainly taken place by metaphony, so that variable correspondences between them and Finnic do not require reconstructions like deriving Finnic *o partly from an *o and partly from an *ɔ. Later loanwords from Finnic into Sami greatly obfuscate this though, and it is imperative to use evidence from other Uralic languages to figure out what is really old native vocabulary and what is not. These add up to set up the Finnic system as looking essentially archaic when compared within West Uralic.

pp. 25–26: W. Steinitz (1944) [§ 6.2.1]
An overview of the earliest big system of Proto-Uralic vowel reconstruction, due to Wolfgang Steinitz and based primarily on Khanty and Mari. This included no long vowel subsystem, but instead a reduced vowel subsystem (*ĕ *ö̆ *ŏ = [ɪ ʏ ʊ]), supposedly retained in these two key languages. For the primary (non-contracted) long vowels in Finnic, three important distributional limitations were noted:

  • only *ii *ee *oo *uu seem to occur
  • only in open syllables
  • only in *ə-stems

In other words, words like piika ‘maid’, näätä ‘marten’, huosta ‘care’, tyyni ‘calm’, kaari ‘arc’ would have to be either not native vocabulary at all, or to involve secondary i.e. contracted long vowels.

This setup was proposed to result from a conditional development of *i *e *o *u into long vowels, versus the reduced vowels then giving short *i *ü *u; *ĕ partly also *e. No real solution was offered on where Finnic *oCe- would come from. *a *ä were supposedly left outside this lengthening rule.

pp. 27–33: E. Itkonen & followers [§ 6.2.2]
The previous was quickly countered by Erkki Itkonen, citing extensive counterevidence for the supposed vowel lengthening rule in Finnic, as well as evidence for reduced vowels in at least Mari clearly being a secondary development from non-reduced *i *ü *u. What was set up instead was the “Finnic icebox” theory, according to which Proto-Finno-Permic (defended in detail by Itkonen), Proto-Finno-Ugric (defended more sketchily) or even already Proto-Uralic (only as a suggestion and not by Itkonen himself) had a defective long vowel system with just *ii *ee *oo *uu. Often but not always Steinitz’ two other distributional restrictions were included also. Over the 40s to 60s, evidence was gradually dug out from pretty much all across Uralic that the Finnic primary long vowels have distinct correspondences from their short counterparts, and so cannot be derived from the corresponding short vowels by a late lengthening rule. However, it was never argued why these correspondences should be reconstructed with the same values as in Finnic, aside from the circular assertion that “the Finnic vowel system is archaic”. To be fair, Itkonen had originally based this on the evident archaicity of the Finnic root structure, i.e. the fact that Finnic maintains PU second-syllable *-a and *-ä, but the idea had slowly grown from there into dogmatic insistence on general archaicity.

pp. 33–35: M. Lehtinen (1967) [§ 6.2.3]
Just one paper from the 60s seriously considered the reconstruction of the correspondences of the Finnic long vowels from a new angle. This included the insight that since already in Samic and Mordvinic, the Finnic vowel combinations *ee-e and *oo-ë have the same correspondences as *ä-e and *a-ë, the former two should be assumed to result from raising of earlier *ää and *aa, and that this could be then used to revive the vowel lengthening theory in part. This, alas, went by with little attention (and what little there was included a takedown by Itkonen that glossed over the key insights entirely).

pp. 35–38: J. Janhunen (1981) [§ 6.2.4]
Basic comparative work on Uralic reconstruction was rekindled after Juha Janhunen had in 1977 released a usable reconstruction of Proto-Samoyedic. An earlier work, Sammallahti (1979), focused more on the consonant system, and for vocalism only involved very vague “structural comparison” of the first-syllable vowel inventory of PSmy with the ice-box reconstruction of PFU, yielding no substantial results. Janhunen’s own contribution instead begins by dividing the data into vowel combinations rather than just first-syllable vowels in isolation, which allows him to uncover a number of metaphony developments: several in PSmy, and at least one that he tentatively dates to Proto-Finno-Permic. Not through any analysis of Permic data in particular however, but rather “inherited” from the fact that this is how far back detailed comparative work by Itkonen had reached. Both these works also reinstated a back unrounded vowel *ï (FUT *i̮) into the proto-system. This had already been proposed in several early-1900s works including Steinitz’, then sharply denied by Itkonen. Sammallahti and Janhunen give no references whatsoever to any of this earlier work, though. Janhunen has one passing mention that Ugric evidence might also support the reconstruction of *ï, but even this could be an independent rediscovery.

Janhunen had a new proposal entirely for the Finnic primary long vowels: similar to Indo-European, these would come from vowel + a laryngeal, transcribed by him as *x (= FUT for “unknown consonant”, not IPA [x]). At the other end of the comparison, these *Vx sequences were based on Samoyedic *Və, which leaves open the possibility that this “laryngeal” was indeed already a vowel to begin with. Or rather, had been vocalized already in PU. The normal Uralic root structure is *(C)V(C)CV, and *x would best fit here in the coda consonant slot. This would also naturally result in long vowels being restricted to open syllables. To explain the apparent lack of *CVVCA, Janhunen proposed shortening of the FP long vowels in this root type. The lack of *aa and *ää was explained by raising to *oo and *ee, i.e. same as Lehtinen, though perhaps again reinvented independently. No explanation was given for the lack of primary *üü.

pp. 38–39: Summary of theories [§ 6.2.5]
The previous four views constitute the main theories presented for the origin of the Finnic primary long vowels. I note in Table 3 some commonalities between most pairs of them: e.g. Steinitz and Itkonen agree in deriving the *ii : *i, *uu : *u contrasts already from an original quantity contrast, with or without reshuffling; Itkonen and Janhunen agree in setting aside distinct vowel combinations as the source of the PF long vowels. It may be additionally worth noting that while Steinitz, Itkonen and Lehtinen all comment on one another, Janhunen’s work started off as “disconnected” from the discussion, nominally by the excuse that he was researching Proto-Uralic rather than Proto-Finno-Ugric.

pp. 39–42: E. Tálos [§ 6.3.1]
Something looking like a fifth theory was also published in a somewhat little-known paper by Endre Tálos in 1983; so just two years after Janhunen’s. On closer reading of his formula-dense presentation though, this is rather a much-derived update on Lehtinen’s theory, combined furthermore with taking Ugric vocalism as more archaic and western Uralic as more innovative. Notably Tálos sides with Itkonen in rejecting Steinitz’ and Lehtinen’s idea of Finnic *ee and *oo being in some cases derived from earlier *e and *o, but still sides with Lehtinen’s (and Janhunen’s) idea of them being however derived from earlier *ä and *a. This is further combined with deriving *ii and *uu from (the same source as) *e and *o. At this point, then, Tálos’ theory actually turns out to fulfill also Itkonen’s much-insisted “boundary condition” that long *ii *ee *oo *uu always have a different source from short *i *e *o *u, respectively — but starting from a more parsimonious original system.

pp. 42–: Sammallahti (1988) [§ 6.3.2]
A massive synthesis of Janhunen, Itkonen and Steinitz’ reconstructions was sketched out by Sammallahti in the 1988 handbook The Uralic Languages. He jettisons his own earlier 1979 ideas and rather takes Janhunen’s Proto-Uralic framework as the starting point, and demonstrates that slight variants of Steinitz’ and Itkonen’s reconstructions, now labeled as Proto-Ugric and Proto-Finno-Permic, are derivable from it. These results, among other things, cement the reconstruction of PU *ï, now clearly based on evidence also from Ugric and Permic. The model ends up as a compromise of “least quarrel” rather than of maximum parsimony, though, with the three different prominent reconstructions siloized in their own taxonomic units; and without a single word on either Lehtinen’s work, or on Steinitz’ and Itkonen’s points of substantial disagreement.

pp. 44–45: Addenda [§ 6.3.2]
Come the 2000s, some fine-tuning of Sammallahti’s vocalism model was eventually given as well. This perhaps had required the earlier publication of articles criticizing most of the traditional Uralic subgroups such as Finno-Permic. The outcome was mainly to expand “Janhunen’s territory” at the expense of Itkonen: e.g. Ugric and Mordvinic do not show evidence for the shifts *ä(x) > *ee and *a(x) > *oo, and therefore, rather than presuming back-developments *ee *oo > *ä *a, the qualities *ä and *a in these languages should be simply taken as archaisms.

pp. 45–50: Lehtinen’s Comeback [§ 6.3.3]
The evolving standard model of the PU vowel system had up to this point remained a Janhunen / Itkonen / Steinitz compromise — or by 2010, essentially just Janhunen / Steinitz, with Itkonen’s model having been gradually pushed to irrelevance (as I show in detail in the list on pp. 47–48). Lehtinen’s model was however (re)introduced into the discussion only at the beginning of the current decade, in a 2011 article by Reshetnikov & Zhivlov. This quickly led to an article by Ante Aikio the next year, which ended up replacing the until then expansive region of Janhunen’s reconstruction with what could be by symmetry called “Lehtinen’s reconstruction”: a PU vowel system with no extant or incipient vowel length at all, with length arising entirely secondarily in Finnic. This involved also presenting a new origin, independent of the Finnic primary long vowels, for Samoyedic *Və sequences, the only other source of evidence Janhunen had been able to propose for his *Vx reconstruction.

Even more impressively, Aikio shows that Lehtinen’s conditions for *a > *oo don’t just apply to “regular” *a, but also to what was by then still considered “irregular” *a from secondary sources (they were subsequently promoted to regular just a few years later; cf. § 5). Instead of three different sound changes having the exact same conditioning factors and output, we clearly should then assume only one sound change — but this then requires that it in fact takes a normal short *a as its input, and not anything else like *o or *aa.

p. 50: Lehtinen’s Law [§ 6.3.3]
We should take a step back to appreciate what has been accomplished here: an originally humble suggestion has, about 45 years after its first proposal, turned out to vindicate a relatively parsimonious eight-vowel reconstruction of Proto-Uralic. Other things being equal, this is clearly an improvement over either Steinitz’ or Itkonen’s systems, both of which required more extensive 11-vowel proto-systems. The key has been the identification of somewhat intricate but still plausible conditioning factors for the rise of long vowels in Finnic. This latter development surely then deserves being promoted to the status of a named Sound Law, even though such results have not often been recognized in Uralic studies.

[While “Lehtinen’s Law” is my own coinage, originally back on this blog in 2013, the first published use is actually not: that milestone goes to Patrick O’Rourke in a 2016 article.]

[Additionally perhaps worth noting: I have seen @Laws_of_IE recently note that even in the much larger field of Indo-European studies, so far essentially all named soundlaws have been credited only to men. Lehtinen’s Law would not however be the first example against the tide: in Uralistics, Edith Vértes can already claim a third-of-a-credit for a “rule” that was nascently named in 1973 (see my footnote 14). Moreover, full primary authorship of a Law has by now been granted, at minimum, to Betty Chang in the 2011 article “An Inventory of Tibetan Sound Laws” by Nathan W. Hill.]

pp. 50–53: The Close Vowels [§ 6.3.4]
Some initial thoughts on a topic that ended up being mostly cut from the finished thesis. An important starting point that should be recognized is that while the cases of mid *ee and *oo are symmetric to one another (with respect to backness), similarly *ii and *uu to one another, there is no reason to require that the history of the long mid vowels and long close vowels should be isomorphic to each other! Plenty of languages have long close vowels without having long mid vowels. Examples already within Uralic include Tundra Nenets and much of Eastern Finnic. Hence, even if Finnic *ee and *oo can be parsimoniously derived from earlier short vowels, this is not an automatic licence to deny the reconstruction of *ii and *uu for Proto-Uralic.

A brief review of proposed sources for the long close vowels allows however putting together an interesting heuristic argument, based on how there is no primary long *üü in Finnic, even though PU is agreed to have had *ü among its close vowels. If *ii *uu were from something like *ij *uw, or *ix *ux, we should expect to find also something like **üü < *üw or *üx alongside. However, there is one proposal out there that would lead to a lack of **üü naturally: *ii *uu < *ej *ow (since PU had no **ö among the mid vowels). Interestingly this proposal has for long mostly been investigated in loanword studies, which can demonstrate some fairly clear examples like pre-II *pey-men- ‘milk’ → *pejmä > Proto-Finnic *piimä, yet not in works on PU reconstruction.
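The heuristic can be spelled out as a toy enumeration. This is only a sketch of the argument's logic, not anything from the thesis: the function names are made up for illustration, and the two hypotheses are simplified to "close vowel + *x lengthens" versus "mid vowel + glide raises and lengthens".

```python
# Toy enumeration of which long close vowels each hypothesis predicts.
# Inventories follow the text: PU close vowels *i *ü *u, and mid vowels
# *e *o with no **ö. Function names are hypothetical.

PU_CLOSE = ["i", "ü", "u"]
PU_MID = ["e", "o"]                       # crucially, no **ö
RAISED = {"e": "i", "ö": "ü", "o": "u"}   # mid vowel -> matching close vowel


def from_close_plus_x():
    """Hypothesis A: *ix *üx *ux (or *ij *üw *uw) > long close vowels."""
    return {v * 2 for v in PU_CLOSE}


def from_mid_plus_glide():
    """Hypothesis B: *ej *ow > *ii *uu; a missing **ö gives no **üü."""
    return {RAISED[v] * 2 for v in PU_MID}


print(sorted(from_close_plus_x()))
print(sorted(from_mid_plus_glide()))
```

Hypothesis A overgenerates an unattested **üü, while hypothesis B yields exactly the attested pair, which is the whole point of the argument.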

pp. 53–54: Summary of research history [§ 6.4]
The main arc of research history on the Finnic long vowels has probably been the introduction of the idea that the primary long vowels would be archaic inheritance, gradually pushed further and further back in origin; and then just as gradually, them getting pushed forwards again, and now ultimately considered only an innovation particular to Finnic. While this has resulted in a new, more parsimonious reconstruction of Proto-Uralic, it has also left behind quite a bit of baggage: the existing overviews of the historical phonology of the more western Uralic groups (Samic, Mordvinic, Mari, some versions of Permic) as published in the last decades of the 20th century all begin from a system with the Finnic long vowels still hanging around. I cannot hope to clean up this mess entirely in subsequent thesis chapters; just working the Finnic history itself into a more streamlined shape will have to do. But much other work lies open for the taking too.

pp. 55–59: The Proto-Uralic vowel system and its development [§ 7.1]
Moving on to comparative reconstruction, I take as my starting point the current standard eight-vowel inventory, as due to Janhunen & Sammallahti (table 5). I give a very condensed literature summary on the reflexes of the non-close vowels *e *ä *a *ë *o in the daughter language groups (table 6), including a few comments of my own on disputed or nontrivial developments. [Some of these may warrant expanding into full articles somewhere down the line.]

pp. 59–65: Lehtinen’s Law: Phonology [§ 7.2.1]
From a phonological angle, Lehtinen’s Law has more than a few interesting features. Vowel lengthening being limited to open syllables is trivial, at least. Phonetically natural but non-trivial are lengthening being limited to the open vowels *a *ä; its being limited to stems where it is followed by a weak stem vowel *ə and no other stem-forming material (even CVCVC stems like *śadək > *sadëk ‘rain’ are unaffected); and being limited to stems with a voiced medial consonant. The output seemingly being mid vowels *oo, *ee is best explained by an intermediate stage with long open vowels *aa, *ää, followed by unconditional long vowel raising. The most interesting question however might be why *j, *w, *ŋ, *x do not trigger LL. I propose that this set of segments can be transformed into a natural exception class of semivowels, if we assume sound changes *ŋ, *x > *ɰ before LL. This hypothesis can be supported by how *ŋ and *x in fact have identical reflexes everywhere in Finnic (either lost or vocalized to *w), aside from the cluster *ŋk, which is the only environment where [ŋ] survives into late Proto-Finnic (and there’s no contrast with *m or *n here anyway).

The phonologization of the long vowels resulting from LL is hard to date. *d > *t in *aadə > *aatə (> *vooci) ‘year’ suffices to turn the conditioning opaque, but this does not introduce any contrast anywhere: this is the only example of LL before *d, and there are no roots of the shape **Catə. Syncope in inflected forms such as the partitive case could work also, but is itself difficult to date.
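For readers who think better in code, the conditioning of LL as described above can be sketched roughly as follows. This is a minimal illustration under loud assumptions: the set of triggering voiced medials (`TRIGGERS`) is my own guess at a workable list, roots are plain strings with one character per segment, and the function names are hypothetical. It is not the formalization used in the thesis.

```python
import re

# Assumed segment classes, for illustration only.
VOWELS = "aäeëioöuüə"
TRIGGERS = set("mnńlrd")   # assumed voiced medials; *j *w *ŋ *x are excluded
RAISING = {"aa": "oo", "ää": "ee"}

# (C)aCə or (C)äCə with nothing after the stem vowel *ə.
_SHAPE = re.compile(rf"([^{VOWELS}]*)([aä])([^{VOWELS}])ə")


def lengthen(root):
    """Step 1: *a *ä > *aa *ää in open syllables of bare *ə-stems."""
    m = _SHAPE.fullmatch(root)
    if m is None:
        return root                     # wrong root shape, e.g. CVCVC
    onset, vowel, medial = m.groups()
    if medial not in TRIGGERS:
        return root                     # voiceless or semivowel medial
    return f"{onset}{vowel * 2}{medial}ə"


def raise_long(root):
    """Step 2: unconditional raising *aa *ää > *oo *ee."""
    for low, mid in RAISING.items():
        root = root.replace(low, mid)
    return root


def lehtinens_law(root):
    return raise_long(lengthen(root))


for form in ["adə", "śadək", "jäŋə", "kätə"]:
    print(form, ">", lengthen(form), ">", lehtinens_law(form))
```

Only *adə ‘year’ is lengthened here; *śadək fails on root shape, *jäŋə on the semivowel exception class, and *kätə on the voiceless medial. Later changes such as *d > *t are deliberately left out.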

pp. 66–69: Long vowels in loanwords: *oo [§ 7.3.1]
Loanword studies allow dating the phonemicization of long mid vowels in Finnic quite well, however. Long vowels show up in non-LL positions already quite early in Indo-European loans such as *soola ‘salt’ — and it seems possible that phonemic long vowels were essentially originally introduced as loanword phonemes. ‘Salt’ and many other examples actually derive from a late PIE *ā and not *ō (cf. *sooja ‘protection’ ← PII *sćāyā, *vootna ‘lamb’ ← Baltic *āgna-, etc.), which then provides independent evidence for my phonetically assumed intermediate stage *aa.

My new intermediate reconstruction stage even seems to solve a minor morphophonological mystery: Fi. suola ‘salt’ shows an irregular plural stem suoloi-, while normally a-stems have a plural stem in -i- when following a labial stressed vowel; plural stems in -oi- normally only occur after illabial vowels (cf. e.g. sola : soli- ‘gap, pass’ | sara : saroi- ‘sedge’). I propose this plural stem is a retention from the *saala stage, when the word still indeed had an illabial vowel in the first syllable!
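The plural stem rule at work here is simple enough to write down as a sketch. Assumptions: only bisyllabic a-stems are handled, "labial" is read off the first-syllable vowel letters, and the helper names are invented for this illustration.

```python
# Sketch of the Finnish a-stem plural rule described above:
# -i- after a labial first-syllable vowel, -oi- after an illabial one.
VOWELS = set("aeiouyäö")
LABIAL = set("ouyö")


def first_nucleus(word):
    """Return the vowel letters of the first syllable, e.g. 'uo' in suola."""
    nucleus = ""
    for ch in word:
        if ch in VOWELS:
            nucleus += ch
        elif nucleus:
            break
    return nucleus


def plural_stem(a_stem):
    """Predict the regular plural stem of an a-stem noun."""
    if not a_stem.endswith("a"):
        raise ValueError("this sketch only handles a-stems")
    labial = any(ch in LABIAL for ch in first_nucleus(a_stem))
    return a_stem[:-1] + ("i" if labial else "oi")


for word in ["sola", "sara", "suola", "saala"]:
    print(word, ":", plural_stem(word) + "-")
```

The rule predicts **suoli- for modern suola, against attested suoloi-, but it predicts exactly the attested shape if it is applied at the assumed *saala stage.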

pp. 69–71: Long vowels in loanwords: *ee [§ 7.3.2]
Scraping together similar evidence for an *ää stage is much harder though. Most potential examples along the lines of miekka ‘sword’ are immediately discardable from consideration, as they have back harmony (PF *mëëkka) and projecting them back to disharmonic preforms like **määkka would be in violation of pre-Finnic vowel harmony. In Germanic it seems clear that we can reconstruct an open-ish *ǣ as being more original than close-mid *ē; in Balto-Slavic this is less obvious. Elsewhere in Indo-European also Albanian shows a shift *ē > *ā [and, as I’ve learned since then, ditto Phrygian], which could be taken as evidence for *ē having been somewhat open originally. If so, then probably at least a few loanwords into Finnic indeed originally came over with *ää and were then raised to *ee.

pp. 71–74: Shortened long vowels in loanwords [§ 7.3.3]
Another topic that needs to be addressed is the set of cases where IE long vowels yield Finnic short vowels. Some of the oldest cases could perhaps predate LL entirely. I propose however that cases with *ā → *a, *ē → *ä could be from the stage of Finnic where long vowels from LL were now *ee and *oo, but new secondary *aa and *ää (from contractions like *aŋə > *aɰə > *aa) did not yet exist, so that only short *a and *ä would have been available as substitutes for [aː] and [æː].

A clearly newer layer are words that reflect PIE *ē > Northwest Germanic *ā as Finnic *a. These and some other examples seem to involve mostly phonotactic limitations that were active even up ’til Proto-Finnic, such as no long vowels before sonorant + sonorant clusters.

pp. 75–79: Long *oo in native vocabulary [§ 7.4.1]
I have assembled the vowel correspondences of Finnic *oo across Uralic in Table 7. They show a fairly clear three-part division, mirroring the fact that also Finnic *a-ë has three separate origins, and quite well demonstrating the same reflexes: my group 1 reflects PU *ë-ə, group 2 reflects PU *a-ə, and group 3 reflects PU *ä-ä.

pp. 79–87: Etymological commentary on words with *oo [§–4]
Some of the etymologies involved require in-depth discussion. I e.g. propose a new soundlaw for Permic: *waRV > *wȯRV; summarize/extend a so far unreleased argument from Kallio that Suomi ‘Finland’ is indeed after all cognate with Sámi, as has long been suspected, both derivable now quite straightforwardly from a protoform *sämä; assemble a new etymology for nuori ‘young’ from bits and pieces already known in earlier literature; and propose as a more speculative idea that vuori ‘mountain’ would actually come from a meaning ‘hillfort’, and be in turn a loan from Iranian *wāra- ‘fort’.

pp. 87–90: Long *ee in native vocabulary [§ 7.4.2]
Table 8 similarly collects the vowel correspondences of Finnic *ee across Uralic. These are more homogeneous than those of *oo, and almost all point to original *ä. Only the Permic evidence requires closer treatment, and I show that *ä-ə > *ɨ seems to be quite regular, including also several non-LL cases (e.g. *jäŋə ‘ice’ > Finnic *jää ~ Permic *jɨ). Even some seemingly different cases showing *ä-ə > *i (e.g. *kätə ‘hand’ > *ki) can be assumed to have gone through intermediate *ɨj. [I have however not treated here some other yet smaller exception groups that also exist, such as *jäsnə ‘joint’ > *jȯz.]

pp. 90–96: Etymological commentary on words with *ee [§]
Some etymological fine-tuning is required again, such as an extensive discussion of words in the semantic area of ‘turn, twist, tie, wrap, rotate, round’, and another new loanword proposal: Livonian kēv ‘mare’ ~ Eastern Sami *kiəvə ‘reindeer cow’ ← Indo-Iranian *gāw- ‘cattle’ [as already previously covered on this blog].

pp. 96–98: Exceptions [§ 7.5]
Some discussion of two remaining exceptions to LL in seemingly native vocabulary, namely *panë- ‘to put’ and *ääni ‘voice, sound’. One exception each for *a and *ä should surely not suffice to disprove a soundlaw, though I consider some [frankly ad hoc] possibilities to explain these away too.

p. 99: Chronology [§ 7.6]
A summary of the rise of vowel length in Finnic, as worked out above in sections 7.2.–7.4., and its relative chronology compared to some other early Finnic innovations in either vocalism or consonantism.

p. 100: Closing Words [§ 8]
[Nothing to comment on directly here, but I must say I never noticed before how my thesis seems to have a very round-numbered structure: exactly 100 pages of main contents, with Lehtinen’s Law, maybe the key result I build on, being reached and named exactly halfway, on page 50.]

pp. 101–111: Literature [§ 9]
Some statistics:

  • Total sources cited: 188
  • Total authors cited: 93
  • Source languages: English, German, Finnish, Hungarian, Russian & one appearance each of Estonian and Swedish
  • Oldest sources: Anderson (1893), cited for a minor etymological detail; and Setälä (1899), a collected edition of a two-part work originally from 1890/1891.
  • Most recent sources: Guillaume & List “(forthcoming 2018)”, actually finally only published early this year; and Kallio (2018), published some five months after my thesis, and changed from “forthcoming” to published only in the online fine-tuned version.
  • Most cited journal: Virittäjä, with 18 citations spanning from 1934 to 2013. (Regardless I’ve not followed the conventional abbreviation as Vir.: the name seems short enough to me already, especially compared to the likes of SUSA = Suomalais-Ugrilaisen Seuran Aikakauskirja.)
  • Authors with most sources cited: Ante Aikio (11½) closely followed by Erkki Itkonen (11), who I could perhaps declare the main protagonist and main antagonist, respectively, of the thesis’ research history section.
  • Author with highest relevance to sources cited ratio: Meri Lehtinen, now with a soundlaw to her name, even though she seems to have never published any research at all other than the one 1967 article.
  • Source with least relevance: Bańczerowski (1972), cited for a minor etymological detail that I’ve by now found out is actually taken from Otto Donner’s old comparative dictionary (1874–1888) without proper referencing.
  • Technically unpublished: Häkkinen (2007), perhaps the most cited Master’s thesis in Uralistics so far, and Aikio (2013a), an unusually thorough conference handout (though both of these can be found online if required).
  • Weirdest title: Tálos (1984), typographically complex enough to not be reproducible in this post.
