*je-: A Reprise

Summer’s wrapping up, a new academic year’s about to roll in, and if all goes well, I might be returning to more active blogging around here.

I have also returned, about a week ago, from the 12th International Congress for Finno-Ugric Studies. You can check out my presentation online, too: Semivowel losses and assimilations, in Finnic and beyond. Longtime readers may recall me having first explored the ideas within many blog posts (and one blog platform) ago. Evidence has continued to turn up, and I’m by now quite convinced that my newfound soundlaw *je- > Finnic *i- indeed exists.

The title is admittedly a bit more general than the presentation’s contents warrant. For reasons of space, I was not able to treat the topic of initial semivowels in Uralic languages more generally.

I could mention one fairly simple addendum here, though — while *wo- has been already traditionally well-established, and I attempt to show that abundant evidence of *je- can be found as well, by contrast it seems to me that **wu- and **ji- were not possible sequences in Proto-Uralic (and they remain impossible in most descendant languages as well).

Only one widely accepted instance of *wu- has been proposed: the word for ‘new’ (> Fi. uusi, Hu. új etc.), traditionally reconstructed (modulo notation) as *wud₂ə. I suspect, however, that this should instead be reconstructed with *o, and that the evidence suggesting *u is due to the shift *o-ə > *u-ə in open syllables, as first proposed by Janhunen (1981) for the Finno-Permic end of the family. (Seen also in e.g. *lomə > *lumə > Fi. lumi ‘snow’. Sammallahti has later suggested that the change also affected Ugric; I am skeptical, however. More on this at some point in the future.) For *ji- there are a few more potential examples, but the best-looking cases (? *jikä ‘age’, ? *jitɜ ‘night’) fall among those where I believe *je- should rather be reconstructed.

If so, then it seems to me that we can likely apply to Proto-Uralic a phonological analysis also known from various other languages of the world: to unite the semivowels *j, *w with the high vowels *i, *u as allophones of each other (*j with *i, *w with *u).

Posted in News, Reconstruction

On comparison in Proto-Uralic

Here is a somewhat speculative idea that recently occurred to me. I don’t think I will be able to deliberate on all the comparative implications just now, but it wouldn’t surprise me too much if something similar had already been proposed.

A relatively well-known suffix element usually reconstructed for Proto-Finno-Ugric (implicitly Proto-Uralic, in the wake of Finno-Ugric probably turning out not to be a genetic grouping) is the comparative suffix *-mpa, reflected quite well in the best-known Uralic languages: Finnish -mpi : -mpa-, Estonian -m, Hungarian -(a)bb. [1] The Samic languages also have clear cognates, e.g. the comparative ending of Northern Sami bisyllabic adjectives, -t : -bu- (from slightly earlier *-b : -bu-).

No trace of such a comparative suffix though is known elsewhere in Uralic. This smells slightly suspicious. Hungarian is, overall, quite innovative, and usually whatever clear old Uralic features have been retained there, can also be traced in at least some of its Russian relatives; especially Ob-Ugric and Permic. There’s a proposed Samoyedic cognate, but as far as I recall seeing, found only in Nenets — and only used in an approximative sense.

It’s also the case that there is a quite well-established PU participle ending *-pa. These two suffixes share the privilege of being just about the only places in comparative Uralic inflectional morphology where *p occurs; and both of them have very roughly adjectival semantics. Might it be possible to thus segment *-mpa as *-m-pa? We’d like to know if this can be made to make semantic sense; and if we can find a reasonable candidate for what the nasal element comes from.

The former question can be roughly reformulated as: “if a thing is greater (than another one), what is it doing?” To me it seems the answer would be “exceeding, being greater”. A PU “comparative” form such as *wod₂ə-mpa (> Fi. uude-mpi, Hu. új-abb ‘newer’) could then be instead analyzable as *wod₂əm-pa, meaning ‘that which *wod₂əN-s’; and which could have independently developed into an IE-style nominal comparative in Hungarian and Finno-Samic. Originally it’d have been instead the verb stem *wod₂əN- that captured the “comparativeness”, meaning something like ‘to be newer’. The approximative sense in Nenets seems well-derivable from this as well: we can easily imagine the base meaning as just ‘to be new’, and derive from this on one hand an amplification ‘to be newer’, on the other hand a mitigation ‘to be newish’.

Above I’ve written the nasal of my internally reconstructed verb stem as just -N-. While Proto-Uralic allowed heterorganic nasal+stop consonant clusters with a coronal stop, [2] there seem to be no examples of this with a peripheral stop. Only *mp and *ŋk can be reconstructed stem-medially, while there are no **np, **nk, **ŋp, **mk. So I suppose all nasal consonants are fair game here. (And, of course, in Finnic and Hungarian all such heterorganic clusters assimilate anyway.)

— Now consider Finnish verbs derived with the suffix -ne-: e.g. iso ‘big’ → isone- ‘to become bigger’; mätä ‘rotten’ → mätäne- ‘to rot’; pimeä ‘dark’ → pimene- ‘to become dark(er)’. The suffix is used almost exclusively on adjectives, and typically forms verbs meaning indeed increase in quality. This seems to provide a great candidate for a Proto-Uralic derivative class, on which the nominal-type comparatives of “Western European Uralic” could have been based. Altogether, originally a word like Fi. isompi ‘bigger’ would have been a consonant-stem participle, equal to modern Fi. isoneva ‘that which increases, becomes bigger’ (pseudo-PU *ićäwmpä ~ *ićäwnəpä).

A chief remaining problem would be whether we really can reconstruct this verbal suffix all the way to PU, though. SKRK reports similar usage across Finnic, as well as possible cognates in Ob-Ugric and Samoyedic; these however indicate original *-m-! Another hypothesis mentioned would be comparison to a momentane suffix -n in Hungarian, found in some fossilized forms such as villan ‘to flash’ (seemingly related in some way to világ ‘world; (archaic) light’ < PU *wëlkə). This sounds a bit better with respect to my reconstruction, but I’d like to have some more supporting evidence. And of course, I’d also have to check how well the development of comparison constructions elsewhere in Uralic can be lined up with this scenario.

[1] For clarity, I’m ignoring vowel harmony in this post.
[2] Perhaps the clearest case is *tumtə- ‘to know’, whence e.g. Fi. tuntea, NS dovdat, Hu. tud, Tundra Nenets tumtă-.

Posted in Etymology, Reconstruction

Linkday #2: FUF online

A small discovery to report: looks like someone from University of Toronto has kindly digitized a few back issues of Finnisch-Ugrische Forschungen, old enough to be out of copyright, and uploaded them on archive.org; findable e.g. under the keyword “Finno-Ugric languages — Periodicals”. Currently available are issues 1, 5, 6, 7, 9, 10, 11, 14, 15, from between (straightforwardly enough!) 1901 and 1915.

(Though the last one actually includes an article from a young Y. H. Toivonen (1890-1956), which would according to Finnish copyright law remain under copyright for a while still… but one hopes the Toivonen estate will not consider this a terrible injustice.)

Posted in Uncategorized

Love, pity and morphology

Finnish armas ‘dear’ has a somewhat interesting etymology: the word is considered to derive by borrowing followed by semantic amelioration from Germanic *armaz ‘pitiful’.

If we were given no other data, this argument would have to remain rather hypothetical. The shape of the word does suggest an Indo-European loan, but allowing major semantic drift as a free assumption is an easy excuse for finding Germanic or Baltic etc. loan etymologies for almost everything in the Finnish lexicon (and given some effort, I’m sure the method could be also stretched to prove that Finnish is actually, say, a highly divergent dialect of Chinese). Luckily, according to the data reported in SSA [1], there are several additional pieces of evidence that point to a meaning ‘pitiful’ having existed in Proto-Finnic as well.

  • A “dialectal” meaning ‘pity’ is attested for the Estonian cognate (equivalent? [2]) armas.
  • The derivative *armas-ta- (> Fi. armastaa) has pity-related meanings somewhat more widely, in Karelian, Estonian and Livonian.
  • Finally, an exclusively pity-related parallel derivative from the same root appears to exist: armahtaa ‘to pardon, have mercy on’. It is attested slightly less widely, though: it appears in most northern Finnic varieties other than Veps, but in southern Finnic only in Votic, where it is possible though not strictly required to consider it an Ingrian loanword.

The last one of these I actually find more interesting yet, though for a different reason. Namely, why does -h- appear here? It is true that the stem of armas in inflected forms is *armaha-, as in the genitive singular *armahan > Fi. armaan (~ armhan in Veps or Kven); but a stem vowel *a is not normally lost before the verbalizer *-ta-, and no consonant stem **armah- exists for this declension class.

One explanation might be haplology. Lauri Hakulinen seems to implicitly suggest this solution in SKRK, [1] listing the word under verbs derived by the momentane suffix *-ahta-. I.e. ‘to suddenly pity’ = ‘to pardon’? After this we’d have to assume contraction of the somewhat awkward stem *armahahta- to the attested armahta-.

This approach however suffers from the problem that Finnish momentane verbs are productively derived only from verb stems, not from nominal stems. Hakulinen only reports five other verbs formed in this fashion. Two of them actually derive from original *-eh-stems and they might be simple *-ta-derivatives after all (repalehtaa, roikalehtaa [3]), and for the other two, derivation from a verbal stem does not seem to be possible to rule out (riemahtaa ‘to rejoice suddenly, erupt in celebration’, tipahtaa ‘to drop suddenly’ [4]). This leaves vapahtaa ‘to redeem, liberate’ (← vapaa ‘free’) as the only clear parallel.

Now vapahtaa is of course semantically very close to armahtaa, and this seems like a good reason to suspect that they may have affected one another’s formation in some way. Comparative examination however suggests that it’s probably armahtaa that is the model, and vapahtaa the remodelled verb. As mentioned, the former has cognates in multiple Finnic varieties; meanwhile the latter is restricted to Finnish. The root *armas also seems to be a relatively old Germanic loan, being found everywhere across Finnic, while vapaa < *vapada is a more recent Slavic loan, absent from marginal varieties such as Veps and Ludian. [5] So we have no clear solution here for armahtaa.

I have a different hypothesis in the works, though, that seems to fit in here quite well.

An interesting gap in general Finnic morphophonology is that no words ending in *-ah can be reconstructed for Proto-Finnic, and to my knowledge no corresponding declension can be observed in the modern languages either. This contrasts with a large number of words of the *armas type, ending in *-as : *-aha-; and an equally large number ending in *-eh : *-ehe- (directly attestable in Karelian: hameh ‘dress’, veneh ‘boat’, etc.). A couple of examples of *-es : *-ehe- exist as well (Karel. kirves : kirvehe- ‘axe’), and one or two cases of *-ih (Karel. orih ‘stallion’). Frequently we can also find among these “sibilant-final” words [6] discrepancies between the Finnic languages in the stem type: e.g. Finnish helmi ‘pearl’, a bare *-e-stem, corresponds to an *-es-stem helmes in Estonian, and an *-äs-stem ēļmaz in Livonian.

This makes me suspect that at some point in Finnic prehistory, general morphological levelling may have taken place here; that at one point, stems with a nominative *-ah existed as well, but these were later all reassigned as either *-as-stems or as *-eh-stems.

This is structural speculation so far. But I think there is at least one good reason to suspect the former existence of a class of *-ah-stems: in old enough Germanic loanwords, *s/*z are quite regularly substituted by pre-Finnic *š > Late Proto-Finnic *h. In the case of *-eh-stems, we can indeed find some direct correspondences of this stem type with the Indo-European masculine nominative singular ending *-s (> Germanic *-z): e.g. the above-mentioned *hameh ‘dress’ from PGmc *hamaz, or *padeh ‘path’ (> Karel. pajeh : patehe-) from PGmc *paθaz. [7] But *-as-stems arrive on the scene seemingly quite early, sometimes even in parallel with an *š-substitution: Fi. keihäs ‘spear’, hidas ‘slow’, from PGmc *gaizaz, *sīθaz! This all would surely be easier to understand if we assumed for early Finnic declension patterns such as *keišäš : *keišäšä-, *šitaš : *šitaša- > ? *keihäh : *keihähä-, *hidah : *hitaha-, later levelled to the directly reconstructible *keihäs : *keihähä-, *hidas : *hitaha-.

I do not yet have a clear enough grasp of the overall picture though to say if the levelling process might have been regular in its output anywhere in the Finnic area — or even, if this should be assumed to have been a pre-Proto-Finnic or a post-Proto-Finnic process. [8]

But armahtaa seems to regardless fit into the framework quite nicely: the word would turn out to be after all a simple *-ta-causative, only one based on a now-lost consonant (nominative) stem *armah! The semantics also fit this picture: as noted above, armahtaa is an exclusively pity-related verb, with no associations of love. Noting again the semantic trajectory of the basic root word (Germanic ‘pitiful’ → presumably earlier Proto-Finnic ‘pitiful; dear’ > later Proto-Finnic ‘dear; pitiful’ > modern Finnic ‘dear’), this verb was thus probably formed at an earlier time than armastaa, perhaps before the development of the meaning ‘dear’ entirely.

[1] “SSA” and “SKRK”, two indispensable sources in the study of Finnish etymology and morphology, have now been added to my Bibliography page.
[2] I sometimes feel that a pair of words in closely related languages that have the exact same meaning and shape should perhaps be described in stronger terms than “being cognate”. Given that we are usually comfortable saying that a given word “exists”, as a single entity, in several distinct dialects — and that the language/dialect distinction is arbitrary — it might be useful in an etymological context to claim that e.g. English mouse and German Maus are not merely “related”, but in fact the exact same word, just spelled in two different ways. This issue comes up the most often in etymological dictionaries, where a traditional “every language has distinct words” approach will sometimes lead to heavy repetition: “Finnish armas is cognate to Ingrian armas, Karelian armas, Estonian armas, Votic armas…”
[3] Though neither of these is familiar enough to me that I could do a closer semantic assessment of this solution.
[4] These seem like they would likely be derived from riemuita ‘to rejoice’, tippua ‘to drop’ rather than the bare roots riemu ‘joy’, tippa ‘drop’.
[5] Of course, currently Veps and Ludian are anything but marginal when it comes to Slavic contacts; but the oldest Slavic loans in Finnic appear to predate late Proto-Slavic proper (this one as well: reflecting early PSl *svabadā rather than late PSl *svoboda), and they were probably adopted from the archaic Old Novgorod dialect, with a main contact area close to Ingria and the Pskov region.
[6] Recall that Finnic *h < *š.
[7] I do not recall offhand what is the standard explanation of the 2nd-syllable *e of these, though.
[8] This might even have a few repercussions for Finnic historical phonology, but I will refrain from going into the topic for now.

Posted in Reconstruction

Five Shortcuts to Writing a Heavyweight Etymological Dictionary

Minor apologies for the clickbait-satire title (I do not actually enumerate any shortcuts in this post), but the arriving summer is making me jocular I guess. :)

My current stop on what seems to be turning into an unofficial world tour of etymological research is Dravidian. I was prompted to look into this after noticing a copy of Elli Marlow’s PhD thesis A Comparison of Uralic and Dravidian Etymological Vocabularies (1974, University of Texas) at the University of Helsinki library, and picking it up for examination. This proposed relationship is a curious one in that it always seems to have generated more interest on the Dravidologist than on the Uralicist side. Marlow, too, is a Dravidologist who seems to have little other experience in Uralic studies aside from having easy access to specialist literature by virtue of her Finnish roots.

Her work at least appears to dodge one of the worse pitfalls of comparison between language families, i.e. ignoring pre-existing etymological research and instead data-stripmining individual languages of a family for lookalikes. She refers strictly only to the data of the most up-to-date sources of her day of the old vocabulary in Uralic and Dravidian: Collinder’s Fenno-Ugric Vocabulary (1st edition 1955) and Comparative Grammar of the Uralic Languages (1960); Toivonen, Joki & Itkonen’s Suomen kielen etymologinen sanakirja (1st volume 1958, 4th 1969); and Burrow & Emeneau’s Dravidian Etymological Dictionary (1st edition 1961, followed by a “supplement” in 1968).

The end result is regardless that working mainly on about 1000 etymologies from Collinder, she finds Dravidian parallels for no less than 786 of them. Most of them retain decent semantic and phonetic resemblance, too. I have considered doing a full count, but e.g. taking items #200-#205 at random, we have the following comparisons (with approximate Dravidian reconstructions by me, to save some space; Uralic reconstructions Collinder’s):

  • Uralic *ńärpä- ‘thin, sparse’ ~ Dravidian #ñēr- ‘(to grow) thin’
  • U *jäpće ‘roasting spit’ ~ D #cappū ‘thorn’
  • U *ńola- ‘to crawl’ ~ D #neḷi- ‘to crawl’ / #nur̤ai- ‘to creep’
  • U *ńoma- ‘to creep’ ~ D #nāmpu ‘climbing plant’
  • U *ńowŋa ‘a salmonid fish’ ~ D #naŋku ‘a fish species’
  • U *ńorɜ ‘type of moss’ ~ D #ñir ‘water’

The second of these I would say is entirely unconvincing, and the sixth seems like a stretch as well, but the other comparisons are tolerable at face value. I could further criticize several of them on inter-Uralic grounds, of course. But this altogether still seems to leave a situation where, if not quite in the ballpark of 75%, then still some 30-50% of the best-established Uralic material would have fair-looking parallels in the DED.


The plot twist though is that the DED and its addenda include a bit over 5500 etymologies — a rather bigger pool of data, and it’s obvious that this significantly increases the odds of finding something that appears similar by pure chance.
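To make the intuition about pool size concrete, here is a minimal sketch. It assumes, purely for illustration, that every DED entry has the same small and independent probability of resembling a given Uralic item by chance; the function name and the probabilities tried are hypothetical placeholders, not measured values.

```python
def p_at_least_one(p_single: float, pool_size: int) -> float:
    """Chance of finding at least one accidental lookalike in a pool.

    p_single  -- assumed probability that one random entry resembles
                 the target word (hypothetical, not measured)
    pool_size -- number of entries searched (about 5500 for the DED)
    """
    return 1.0 - (1.0 - p_single) ** pool_size


if __name__ == "__main__":
    # Illustrative guesses only; the real per-entry odds are unknown.
    for p_single in (0.0001, 0.0002, 0.0005):
        for pool in (1000, 5500):
            chance = p_at_least_one(p_single, pool)
            print(f"p={p_single}, pool={pool}: {chance:.2f}")
```

Whatever the true per-entry odds are, the point is only that the chance of at least one spurious match rises steeply with the size of the pool being searched.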

But instead of continuing to a detailed assessment of the Uralo-Dravidian hypothesis, I have a different question I’d like to ask. Namely: how does one manage to assemble that many Proto-Dravidian etymologies? The family is supposed to be relatively old and diverse, on the same order as Uralic (though clearly less so than Indo-European). Should I be afraid that the dictionary is an “anything goes” hodgepodge? Or that they are including younger Indo-Iranian loanwords all over the place?

Burrow & Emeneau’s work is currently available online, so this matter is relatively simple to investigate. Again, rather than going in for the full grind, I’ve taken a small sample: the etymologies implicitly reconstructed as beginning with word-initial short *a-, 338 of them altogether. And yes, it turns out that there are a few issues.

The foremost problem appears to be that Proto-Dravidian vocabulary and vocabulary of more limited distribution are mixed at will. Of the 338 words, only 199 appear in at least two main branches of Dravidian; 139 of them are restricted to a single sub-branch — in 118 cases South Dravidian, and even of these a significant portion are found in nowhere else but in Tamil and its close relative Malayalam! It would be easy to get the impression that one is reading not a “Dravidian Etymological Dictionary” but an “Etymological Dictionary of the Native Tamil Lexicon”. And if that were the case, 5500 words starts sounding entirely normal.

Or perhaps slightly more generally: “Tamil, Kannada and Telugu”. Within the wider-ranging comparative material, almost all of the data is represented in two or three of these old major literary languages of southern India. I moreover count 83 etymons that seem to also feature Telugu as the only non-south-Dravidian language to have a reflex, which makes me rather suspicious about whether they all should be assumed to descend from the common ancestor of SD and Telugu; especially when the only lesser non-SD language to reach a vocabulary retention rate of more than 20% is Gondi. Even this is apparently actually a highly dialectally diverse grouping, as the DED and also e.g. Glottolog separate it into more than half a dozen sub-varieties.
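The proportions behind these counts can be tabulated with a few lines of arithmetic. This is only a sketch: the figures are the ones quoted above for the *a- sample, the helper names are hypothetical, the 83 Telugu-only cases are assumed to be a subset of the 199 wider-ranging entries, and the scaling to the whole dictionary assumes the *a- sample is representative, which has not been checked.

```python
SAMPLE_TOTAL = 338      # DED entries with initial short *a-
MULTI_BRANCH = 199      # found in at least two main branches
SINGLE_BRANCH = 139     # restricted to a single sub-branch
SOUTH_ONLY = 118        # of those, restricted to South Dravidian
SD_PLUS_TELUGU = 83     # assumed subset of MULTI_BRANCH (see lead-in)
DED_TOTAL = 5500        # approximate size of the DED plus addenda


def share(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, to one decimal."""
    return round(100.0 * part / whole, 1)


def scaled(part: int) -> int:
    """Naive extrapolation from the *a- sample to the whole DED."""
    return round(DED_TOTAL * part / SAMPLE_TOTAL)


if __name__ == "__main__":
    assert MULTI_BRANCH + SINGLE_BRANCH == SAMPLE_TOTAL
    print("multi-branch share:", share(MULTI_BRANCH, SAMPLE_TOTAL), "%")
    print("single-branch share:", share(SINGLE_BRANCH, SAMPLE_TOTAL), "%")
    print("SD among single-branch:", share(SOUTH_ONLY, SINGLE_BRANCH), "%")
    print("SD + Telugu only, among multi-branch:",
          share(SD_PLUS_TELUGU, MULTI_BRANCH), "%")
    print("multi-branch entries if scaled to the DED:", scaled(MULTI_BRANCH))
```

The extrapolated figure is at best an order-of-magnitude guide to how large a "core" Dravidian lexicon might be; a proper count would need the whole dictionary.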

Now, sure, we could attribute this terrible track record of all the unwritten central and northern Dravidian varieties to simply underdocumentation + massive Indo-Aryan influence. But whenever a DED entry does turn up with a reflex in one of these, it usually appears in quite a few others just as well. The distribution appears to be a bit too neatly divided between “poorly attested” and “well-attested” cases. So my suspicion would be that additionally quite a few of the DED entries represent old cultural loanwords that have diffused between the written Dravidian languages, and which do not go back to Proto-Dravidian (possibly not even Proto-South Dravidian)…

Another possible issue yet is that the known main Dravidian subgroups might still form some intermediate units. In doing some related research, I’ve run into some interesting claims from David W. McAlpin (perhaps best known for the Elamo-Dravidian hypothesis) along these lines: e.g. a suggestion that “Northern Dravidian” does not exist, and that instead Brahui (the northwesternmost Dravidian language, spoken in southern Pakistan) and Kurukh-Malto (the northeasternmost Dravidian languages, spoken in East India) are primary branches of Dravidian, with everything else forming a third branch. Geographically, this would certainly make sense; we know that the Indo-Aryan languages have entered Southern Asia from the north, and surely they must have in this process taken over at least some amount of territory that had formerly been Dravidian-speaking. McAlpin even has a paper where he suggests that Brahui might be more closely related to Elamite than to Dravidian proper; though I do not know if he has held on to this idea.

In any case it looks like distilling some kind of a core Dravidian lexicon from the DED would be recommendable for any wider comparative purposes. Reaching all the way down to e.g. vocab found just in Tamil and its closest neighbors, or Telugu and its closest neighbors, is exactly the “data-stripmining” problem I mentioned Marlow having seemingly dodged. But, apparently, not one of her main sources.

Posted in Commentary, Methodology

Finno-Ugric đ

A note for my Finnish-proficient readers and other interested people: I’ve transcribed and uploaded Arvid Genetz’ presentation Suomalais-ugrilainen đ ensimmäisen ja toisen tavuun vokaalien välissä (published 1896) on Wikisource.

This is one of the three works to have come from Finland in the mid-1890s where people seemed to discover just about in parallel that no, original *t does not become Hungarian /l/, and that an entirely different proto-consonant needs to be set up instead for words like Hu. elő ~ Finnish ete- ‘fore, front’; Hu. tele ~ Fi. täyte- ‘full’. A nice demonstration of the internal consistency of the comparative method.

Genetz here does not manage to distinguish the plain and palatal variants *d₁, *d₂ though; that detail would be instead proposed by his more famous student E. N. Setälä.

It’s an interesting read from a 21st century perspective. Plenty of standard etymologies are already in place. There however remain several etymological comparisons that I can only describe as pre-scientific (e.g. Hungarian bűz ‘to smell’ ~ Finnish mätä ‘rotten’). Others yet look like they might be onto something, but seem to have remained without further investigation over the last dozen decades, e.g. an analysis of Fi. hiukka ‘small amount’ as a diminutive of hitu ‘thin bit, flake’.

The Finnish of the times is interesting to read, too. Dated enough to be obviously old, but still perfectly understandable. Perhaps the strangest thing might be a detail of Genetz’ phonetic terminology: he describes /ð/ as alveolaarinen vs. Hungarian /z/ as dentaalinen, the exact opposite from what we’d expect today.

Posted in Uncategorized

Geography-constrained family trees?

A complaint that often comes up in introductions to studies on computational phylogenetics is that the number of possible binary trees grows quite fast (loosely factorialesquely) as a function of the number of entities we are attempting to relate. This means that regardless of what method is being used, it is for larger datasets usually not possible to check every possible tree for how well it would fit the data.

I wonder though if there might be a shortcut that would allow simplifying linguistic analyses, in particular, a fair bit: geographical restrictions on branching. Languages, for a rough generalization, exist on territories within the 2D surface of the Earth; they have a number of neighbors, and usually have a neighboring language as their closest relative. Languages far separated from their relatives tend to be exceptional cases, often confirmable as recent intrusions by history.

So, instead of investigating arbitrary binary trees, we perhaps ought to only investigate binary trees where each division leaves two currently or historically contiguous groups of daughter languages. This restriction seems likely to be quite powerful in paring down the seemingly intractably vast hypothesis space.

Consider the simplest possible class of examples, the linear dialect continuum:

  • If given a western dialect A, a central dialect B, and an eastern dialect C, the only sensible phylogenetic hypotheses are ((A B) C) and (A (B C)); the tree with B as an outgroup is rejectable.
  • if given four dialects A B C D, we will accept five trees (A (B (C D))), (A ((B C) D)), ((A B) (C D)), ((A (B C)) D), (((A B) C) D), and reject ten.
  • if given five dialects, we will accept 14 trees and reject 91.
  • if given six dialects, we will accept 42 trees and reject 903.
  • if given seven dialects, we will accept 132 trees and reject 10 263.
  • if given eight dialects, we will accept 429 trees and reject 134 706.
  • etc; as can be seen, the efficiency of this pruning method grows quite fast.
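The figures in this list can be reproduced with a short sketch. It assumes the standard counting results: the number of rooted binary trees on n labelled leaves is the double factorial (2n-3)!!, and the number of bracketings of n items in a fixed linear order is the Catalan number C(n-1). The function names are my own placeholders.

```python
from math import comb


def all_rooted_binary_trees(n: int) -> int:
    """(2n-3)!!: rooted binary trees on n labelled leaves."""
    count = 1
    for k in range(3, 2 * n - 2, 2):   # 3 * 5 * ... * (2n-3)
        count *= k
    return count


def contiguous_trees(n: int) -> int:
    """Catalan number C(n-1): bracketings of n dialects in a line."""
    return comb(2 * (n - 1), n - 1) // n


def bracketings(dialects: list[str]) -> list[str]:
    """List every tree whose splits keep neighbouring dialects together."""
    if len(dialects) == 1:
        return [dialects[0]]
    trees = []
    for i in range(1, len(dialects)):          # split point on the line
        for left in bracketings(dialects[:i]):
            for right in bracketings(dialects[i:]):
                trees.append(f"({left} {right})")
    return trees


if __name__ == "__main__":
    for n in range(3, 9):
        accepted = contiguous_trees(n)
        rejected = all_rooted_binary_trees(n) - accepted
        print(f"{n} dialects: accept {accepted}, reject {rejected}")
    print(bracketings(["A", "B", "C", "D"]))
```

Both counting functions use exact integer arithmetic, so they can also be evaluated for much larger numbers of varieties.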

As I’ve seen recently remarked, 50 languages suffice to generate about 3·10⁷⁶ binary trees, almost as many as there are atoms in the universe (not actually more, but close enough). Meanwhile string bracketings grow rather more slowly, and with 50 dialects we only reach about 5·10²⁶.

This is still huge enough (i.e. about 500 000 000 000 000 000 000 000 000 trees) that I don’t think my method here will help much on studies on Austronesian phylogeny in particular. But in smaller cases, with on the order of 10-20 language varieties under investigation, the difference will be quite significant.


In a real-world situation, dialect geography rarely works quite this easily though. I suppose a slightly better test model might be polyhex connectivity graphs — these allow up to 6 neighbors for each individual language, something that ought to suffice for most cases. I might be tinkering with these in the near future.
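As a starting point for that kind of tinkering, here is a minimal sketch of how the constraint could be counted on an arbitrary neighbor graph rather than a line. It assumes that a tree is acceptable exactly when every split leaves two groups that are each connected in the graph; the function name and the seven-cell "flower" layout are hypothetical illustrations, and the brute-force subset enumeration is only practical for small numbers of varieties.

```python
from functools import lru_cache


def count_constrained_trees(n: int, edges: list[tuple[int, int]]) -> int:
    """Count rooted binary trees on n varieties (numbered 0..n-1) in which
    every group below a split is connected in the neighbor graph."""
    neighbours = [0] * n
    for a, b in edges:
        neighbours[a] |= 1 << b
        neighbours[b] |= 1 << a

    def is_connected(mask: int) -> bool:
        # Flood fill inside the subset encoded by the bitmask.
        seen = frontier = mask & -mask
        while frontier:
            bit = frontier & -frontier
            frontier ^= bit
            new = neighbours[bit.bit_length() - 1] & mask & ~seen
            seen |= new
            frontier |= new
        return seen == mask

    @lru_cache(maxsize=None)
    def count(mask: int) -> int:
        if mask & (mask - 1) == 0:          # a single variety
            return 1
        lowest = mask & -mask               # fix one member on the left side
        rest = mask ^ lowest                # so each split is counted once
        total = 0
        s = rest
        while s:
            s = (s - 1) & rest              # next smaller subset of rest
            left, right = lowest | s, rest ^ s
            if is_connected(left) and is_connected(right):
                total += count(left) * count(right)
        return total

    return count((1 << n) - 1)


if __name__ == "__main__":
    line = [(i, i + 1) for i in range(4)]
    print("5 dialects in a line:", count_constrained_trees(5, line))
    # Hypothetical hex "flower": cell 0 in the middle, cells 1..6 around it.
    flower = [(0, i) for i in range(1, 7)] + [(i, i % 6 + 1) for i in range(1, 7)]
    print("7-cell hex flower:", count_constrained_trees(7, flower))
```

With a line as the neighbor graph this reduces to the bracketing counts, and with every variety adjacent to every other it reduces to the unconstrained count, so real layouts should fall somewhere in between.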

Obviously this constraint can still be violated by occasional majorly expansive languages, e.g. Persian. I suspect we do not have sufficient information to tell right away which of the dozens of minority Iranian languages were its original neighbors. Though in these cases a solution that’s in principle available is to divide such a language itself into a large number of tiny dialects, and treat each of them separately…

Posted in Methodology

Etymology squib: Ääli

Recently I’ve been gradually working on assembling (together with some other editors) an appendix of Proto-Finnic roots for the English Wiktionary. Today I ran into an interesting issue: the word for ‘voice’.

Finnish and Votic ääni, Karelian eäni ~ iäni, Ludian iäń, Veps äń allow straightforwardly putting together PF *ääni. Given Samic *jēnë ‘voice’ and Hungarian ének ‘song’, this is evidently inherited already from Proto-Uralic.

The reason for the long *ä in Finnic is not entirely clear. In the wake of Ante Aikio’s recent re-defense of “Lehtinen’s Law”, i.e. the development *ä > *ee before single sonorant + *ə, we’d actually expect **eeni; his paper suggests that raising was in this word for some reason blocked in the word-initial position, given how Proto-Finnic has no *ee-. There is no counterevidence for this, but neither are there any parallels, and so the question appears to be impossible to settle.

Southern Finnic languages however at first glance seem to have a different root: Estonian hääl, Livonian ēļ, Votic ääl (in some other dialects; I don’t think I will check right now their distribution) suggest instead *hääli.

Yet this is similar enough that etymologists have long considered it possible that they in fact also derive from *ääni, with some kind of irregular deformation. According to SSA,[1] this observation is due to Ahlqvist already in the mid-1800s. If so, then it appears to be possible to presume an earlier alteration to *ääli, followed by secondary addition of h- in Estonian. There is no reason to assume that aspiration was ever present in Votic or Livonian (although, of course, if it were, it would have disappeared all the same).

But how far back would this variant go, then? Estonian and Votic are fairly closely affiliated, but Livonian is not; arguably it is more distantly related to the two than Finnish and the rest of the Northern Finnic continuum are. Formally, the variants *ääni ~ *(h)ääli, as attested side-by-side in Votic, should therefore be both reconstructed already to late Proto-Finnic. (If it wasn’t for Samic and Hungarian supporting *n, we could even suspect that the direction of alteration was rather *ääli > *ääni, and that the “Northern” variant in Votic was an Ingrian loanword.)

Positing *ääli already in PF seems to allow / be further supported by a new etymology, for Finnish ääliö ‘idiot’. This is usually explained as “onomatopoetic”, but the morphology doesn’t support this idea; the ending -iO is just about nonexistent in onomatopoetic words. I suspect we rather have here an old example of stereotyping disabled people by their behavior, and that the word is a derivative of the *ääli variant of ‘voice’. Thus, *ääl-ijo ‘sound-maker’ > ‘person who makes weird noises’ > ‘idiot’.

It’s interesting to moreover compare Finnish hälistä ‘to be noisy, to chatter’, älistä ‘to cry, to moan’. These two are evidently onomatopoetic variants of each other — but if we allow the option that *ääli existed already in late Proto-Finnic, we could just as well also allow the existence of a variant *älə in early Proto-Finnic, prior to Lehtinen’s Law. And it appears to be the case that sufficiently early derivatives with an altered stem vowel were not subject to LL. Aikio has noted some examples such as Fi. säle ‘splinter’. Another interesting case I’ve noticed is *mälə > *määlə > LPF *meeli > Fi. mieli, Es. meel ‘mind’, contrasted with PF *mäl-o > Estonian mälu ‘memory’. If so, then a derivative *äl-ićə- could have regularly yielded älistä.

This scenario seems to additionally provide a tiny amount of extra support for the idea that *ää > *ee was blocked in word-initial position: we’d now have two examples of it, i.e. both *ääni and *ääli. (Effectively perhaps less, since they’re variants of one another.)

If I wanted to further proliferate stem variants, I could also posit *hälə coming into existence already this early, and derive hääl on one hand, hälistä on the other from this in a similar fashion. But a *h this early is anachronistic (LL precedes *ð > *t, which precedes *ti > *ci, which precedes *š > *h), and in light of words like hello, hallo it is clear that /h/ can have an onomatopoetic function in ‘voice’-related words. It seems thus reasonable to consider it to have come about independently in Estonian and Finnish.

[1] i.e. the etymological dictionary Suomen sanojen alkuperä (1992–2000), ed. Erkki Itkonen & Ulla-Maija Kulonen.

Posted in Etymology, Reconstruction

Linkday #1: On computational phylogenetics

I think I’d like to have more content up on this site, despite being tied up with studies and life’s other little distractions from research. Showcasing some interesting articles might work for that, even when I don’t have detailed critique to offer myself on their topics. (There might be some drift away from exclusively Uralic topics, while I’m at that.)

For starters, I’ll bring up a post from the evolutionary linguistics blog Replicated Typo: Reconstructing linguistic phylogenies — a tautology?

This is a relatively old post (2011), and yet captures a number of problems that I keep seeing in computational typological studies.

Two comments, though:

  • The diffusability of sound changes across lineages means that no, establishing a sound correspondence is not quite the same thing as establishing phylogeny. After all, if reconstructing a proto-language automatically also generated a phylogenetic tree of its descendants, there would likely be no need for computational phylogenetic studies in the first place!
  • I’m not sure if I agree that identifying words as cognates presupposes that the languages they occur in are related. There’s a narrower and a wider sense of “cognate” out there: the first is indeed restricted to words related by common descent — but when we’re talking about more hypothetical relationships, the word can also mean “of the same origin thru some means, possibly but not necessarily involving borrowing”. A typical example would be the Uralic and Indo-European words for ‘water’ or ‘name’, for which there is a widespread consensus for some kind of a relationship, but two different camps on how they should be explained.
Posted in Links, Methodology

A morphophonological place avoidance effect in Finnish

I brought up Similar Place Avoidance (SPA) a couple of posts ago. Here is a neat case study of it in action, one that I have already noted quite some time ago.

An Introduction

The Finnic languages are usually considered to have no strict noun/adjective division, and adjectives are analyzed as the same part of speech as nouns. But this does not mean that there would be no visible differences between words that have regular nominal semantics (“substantives”, in the Finnish grammatical tradition) and those that have adjectival semantics. While there are a couple of underived, bare-root adjectives (e.g. kova ‘hard’; nuori ‘young’), most Finnic adjectives are marked with an ending that reveals their semantic function.

In Finnish one of the more common adjectival endings in “adjectival roots” is the suffix -ea, -eä (in what follows marked together as -eA). This is in contrast to suffixes that derive adjectives from words that, as their bare root, function as nouns. E.g. punainen ‘red’, with the highly common adjectival ending -inen, derives from a separate noun puna ‘redness’, whereas valkea ‘white’ does not allow a synchronic morphological division into a self-standing root with one meaning + a suffix with another. [1]

An Issue

There is however a curious statistical gap in the distribution of -eA: it seems to shun preceding dental consonants. Using the Nykysuomen sanalista wordlist as a reference, there are no Finnish adjectives ending in -neA to be found, and no more than six ending in -teA, either:

  • kiinteä ‘solid’
  • kostea ‘moist’
  • lattea, litteä ‘flat’
  • nuortea ‘youthful’
  • pirteä ‘cheery’
  • reteä ‘chill’

A seventh could be implied in vetreä ‘spry’, which seems to come via metathesis (perhaps by influence of potra ‘thriving, brisk’, usually only in potra poika?) from earlier *verteä. Compare verrytellä ‘to stretch, to flex’, which appears to share the same stem /vert-/. [2]

For comparison, we can count e.g. adjectives ending in /-peA/. There are more than twenty of these:

  • apea ‘sad’
  • hempeä ‘romantic’
  • hilpeä ‘jocular’
  • hulppea ‘extravagant’
  • kalpea ‘pale’
  • kapea ‘thin’
  • kepeä ‘light’
  • kipeä ‘sore’
  • kirpeä ‘sour, crisp’
  • kopea ‘arrogant’
  • leppeä ‘mild (of weather)’
  • nopea ‘fast’
  • nyrpeä ‘grumpy’
  • rapea ‘crisp, crunchy’
  • ripeä ‘prompt’
  • suopea ‘benevolent’
  • suppea ‘concise’
  • turpea ‘swollen’
  • tympeä ‘stale’
  • upea ‘fantastic’
  • ylpeä ‘proud’

/t/ is, overall, a more common consonant than /p/ in Finnish, so getting this kind of a result is not a priori expected.

I have also counted examples with other consonants. An uneven distribution clearly biased against dentals continues: there are e.g. on the order of 60 adjectives ending in -keA, and about 20 in -meA.
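The tallies above could be reproduced along the following lines. This is only a minimal sketch in Python: the `SAMPLE` list is a hypothetical stand-in made up of a few adjectives from this post, and a real count would need the full Nykysuomen sanalista filtered down to adjectives (the wordlist's file format is not assumed here). The function names are likewise just illustrative.

```python
import unicodedata
from collections import Counter

VOWELS = set("aeiouyäö")

# Hypothetical sample only; a real count would load the full wordlist
# and keep just the adjectives.
SAMPLE = ["kiinteä", "kostea", "kalpea", "kapea", "nopea",
          "pimeä", "oikea", "valkea", "punainen"]

def consonant_before_eA(word):
    """Return the consonant right before final -ea/-eä, or None if there is none."""
    # Normalize so that ä/ö are single code points, not letter + combining mark.
    word = unicodedata.normalize("NFC", word.lower())
    if len(word) >= 3 and word[-2:] in ("ea", "eä") and word[-3] not in VOWELS:
        return word[-3]
    return None

def tally(words):
    """Count -eA words by the consonant preceding the ending."""
    hits = (consonant_before_eA(w) for w in words)
    return Counter(c for c in hits if c is not None)

# Print the classes from most to least frequent.
for consonant, n in tally(SAMPLE).most_common():
    print(f"-{consonant}eA: {n}")
```

Note that this only looks at spelling, so it would also pick up nouns in -eA; restricting the input to adjectives is left to whatever part-of-speech information the wordlist provides.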

The result is however quite understandable in light of the SPA principle. The Finnish adjective ending -eA comes from earlier *-eðA < Proto-Finnic *-edA < Proto-Uralic *-ətA. This would be the easiest to demonstrate using the peripheral Finnic languages Veps and Livonian, which retain PF *d, but even some examples older yet can be found:

  • Fi. dialectal kalkea ‘hard’ ~ Moksha /kalgəda/ ‘id.’ < *këlkəta
  • Fi. tankea ‘stiff’ ~ Moksha /taŋgəda/ ‘id.’ < *tëŋkəta
  • Fi. oikea ‘right’ ~ Moksha /viďä/ ‘id.’ < #wɜjkəta
  • Fi. pimeä ‘dark’ ~ Komi /pemɨd/, Udmurt /peĺmɨt/ ‘id.’ ~ Proto-Samoyedic *pəjmətä ‘id.’ < #pid₂mətä

So we can expect SPA to have intervened, over the course of millennia, to somehow clean out any undesirable sequences of two syllables beginning with dental stops. Of course, the modern Finnish ending has no signs of a dental element anymore, and so we could perhaps hypothesize that the six or seven exceptions have been formed only after the loss of the segment. (Indeed, as far as I can tell, none of them have exact equivalents in any related language, and most are somewhat limited even in their distribution across the Finnish dialects.)

An Analysis

But how has this worked exactly in practice? There is no shortage of Finnic word roots with medial -t-, and a fair number with medial -n- as well. Does this imply that words that once upon a time ended in *-tedA and *-nedA have changed to something else?

I believe the solution has been instead morphological. As an adjectival ending, -eA still has several competitors in Finnish, and every so often we can find sets of essentially synonymous adjectives (with only minor differences in register and tone) that differ only in what suffixes are employed. Examples that can be noted in modern Finnish include -Ut (as in kevyt ‘light’), -(A)kkA (as in kalvakka ‘pale’, rivakka ‘prompt’), and participles such as the past active -nUt (as in turvota ‘to swell’ → turvonnut ‘swollen’).

One suffix that comes particularly close to -eA in shape is the regular present active participle -(e)vA, also commonly repurposed for deriving adjectives. [3] OK, the preceding vowel is taken from the verb stem and is not a part of the suffix: mene- ‘to go’ → menevä ‘going; busy’, but osu- ‘to hit’ → osuva ‘hitting; apt’, or paina- ‘to press, to weigh’ → painava ‘pressing; heavy’. But the interesting part is the existence of a couple of adjectives that seemingly possess this ending, and yet are not derived from any known verb. Frequently they seem to derive from nominal stems instead. What is more, quite a few of these are both apparent e-stems, and have a preceding /t/:

  • etevä ‘skilled’ ← esi : ete- ‘fore-’ (but not ‘to advance’)
  • harteva ‘wide-shouldered’ ~ hartia ‘shoulder’ (no verb ˣharte- ‘to be shouldered’ exists)
  • jäntevä ‘wiry, spry’ ~ jänne ‘sinew’ (no verb ˣjänte- ‘to be sinewy’ exists)
  • kalteva ‘slanted’ ← kalte- ‘side’ (but not ‘to slant’)
  • kätevä ‘handy, dexterous’ ← käsi : käte- ‘hand’ (but not ‘to do with hands’)
  • lehtevä ‘leafy’ ← lehti : lehte- ‘leaf’ (but not ‘to be leafy’)
  • luonteva ‘natural, easygoing’ ~ luonto ‘nature’ (no verb ˣluonte- ‘to be natural at’ exists)
  • ponteva ‘vigorous’ ← ponsi : ponte- ‘motion, exertion’ (but not ‘to exert oneself’)
  • roteva ‘robust’ (seemingly underived)
  • varteva ‘tall (of people)’ ← varsi : varte- ‘stem’ (but not ‘to be tall-bodied’)

I have also counted a couple of cases where this kind of suffixation seems to have taken place before non-dental consonants, but these are clearly rarer. There is only one debatable case with -pevA: lipevä ‘slick, unctuous’. The bare root lipe-, used as a base for a large number of words related to slipperiness, is not itself verbal, although there is the quite close-by verb lipeä- ‘to slip’ (its regular present active participle is lipeävä). With -kevA there are six cases (e.g. väkevä ‘strong’ ← väki : väke- ‘people’ < *‘power’). So in the end, probably the extension of -(e)vA from a regular participle function to another adjectival ending has taken place here as well. But we can still see a clear discrepancy between the 10 : 7 ratio of -evA to -eA adjectives when the previous consonant is /t/; a 6 : 60 ratio when it is /k/; and a 1 : 21 ratio when it is /p/.
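To make the discrepancy between these ratios a bit more tangible, here is a small Python sketch that turns them into shares and an odds ratio. The counts are the ones quoted above (with /t/ including the debatable vetreä, and the -keA figure being only my rough "on the order of 60"); the function names are illustrative, and this is not meant as a proper significance test.

```python
# (-evA count, -eA count) by preceding consonant, as quoted in the text.
# The -keA figure is approximate ("on the order of 60").
COUNTS = {"t": (10, 7), "k": (6, 60), "p": (1, 21)}

def evA_share(evA, eA):
    """Fraction of the adjectives in one consonant class that take -evA."""
    return evA / (evA + eA)

def odds_ratio(a, b):
    """Odds of -evA over -eA in class a relative to class b (cross-product form)."""
    return (a[0] * b[1]) / (a[1] * b[0])

for cons, (evA, eA) in COUNTS.items():
    print(f"/{cons}/: {evA_share(evA, eA):.1%} take -evA")

print("odds ratio /t/ vs. /p/:", odds_ratio(COUNTS["t"], COUNTS["p"]))
```

On these figures -evA accounts for roughly 59% of the adjectives after /t/, against roughly 9% after /k/ and 5% after /p/. With counts this small the exact numbers should not be leaned on too heavily, but the direction of the skew is hard to miss.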

Summary

What have we seen, and are able to conclude, so far?

  • The Finnish adjectival ending -eA has been disproportionally rarely applied to stems that have a medial dental stop.
  • By contrast, the ending -(e)vA has been disproportionally often applied to stems that have a medial dental stop; and, arguably, disproportionally rarely to stems that have a medial labial stop.
  • These results support viewing Similar Place Avoidance as a potential statistical linguistic universal.
  • The ending -evA has probably been originally extracted from the participles of e-stem verbs.
  • This extraction may even have happened specifically to acquire an alternative for -eA.

…and skipping a bit further ahead of syllogistic step-by-step argumentation: the most general statement of what is going on here is that derivational morphology is not random. In a morphology-rich language, affix alternants and synonyms will form an “ecology” where potential words are selected for according to their adherence to some kind of aesthetics, such as phonetics-rooted criteria. SPA is one example of such a criterion. There are probably several other relatively general ones that could be identified crosslinguistically. And indeed, there are some other examples that I could illustrate as well.

Some further questions

As for this particular case study: so far I’ve only shown that one particular Finnish adjectival suffix has a non-random limitation on its occurrence; and identified only one other suffix that has been taking on the work of -eA. There would likely be others as well. Most of my -tevA examples were, in the end, derived words, based specifically on e-stem nouns. So what about primary adjectives? Or adjectives derived from A-stem or O-stem nouns? Do they perhaps also have their own specific preferred adjectival endings? I don’t quite have an answer yet.

Also, what about the other Uralic languages? How have they solved this issue? A couple of the adjectival stems on showcase here have cognates elsewhere in Finnic, too (e.g. kätevä being also found in Karelian). But since the adjectival suffix *-ətA dates already to Proto-Uralic, we can expect this particular problem to have come up several times before as well. Could we find similar limitations in its distribution in e.g. the Mordvinic or Permic languages?

Obviously this question could also be extended to any other suffix type. Deverbal nouns? Frequentative verbs? Diminutives? There’s a lot that could be studied about derivational morphology across the Uralic languages.

[1] Historically, there may well exist a derivational relationship, though. Note e.g. valo ‘light (n.)’, vaalea ‘light (a.)’.
[2] The ultimate root for these could be veri ‘blood’, the implied derivation being thru an unattested (?) verb ˣver-tä- ‘to be full of blood’ > ‘to be energetic’?
[3] This has even developed historically by a similar intervocalic lenition, from *-βA < *-bA < *-pA, so at almost any given stage of Finnish prehistory it would have been the exact [+labial] counterpart of the [+coronal] *-ətA.

Posted in Etymology