Gradation of *st in Finnic (and related complications)

The development of consonant gradation in Finnic (and why not, also elsewhere in Uralic) is one of those topics that really needs a new monograph-scale treatment one of these days. Not just for the sake of collecting the accumulated knowledge in a single source, either. Modern understanding of linguistic theory and methodology would probably allow not only an improved description of gradation as it works in the modern languages; it should also help in tackling the historical puzzles involved.

Some overall observations are simple enough to make. Perhaps the main historical trend in Finnic has been the gradual morphologization of gradation, departing from its original phonetic roots, and being generalized in some environments, levelled in others.

One good example is the gradation of /t/ in consonant clusters: we can reconstruct *nd, *ld, *rd as the original weak grades of the sonorant-initial clusters *nt, *lt, *rt (exactly parallel to *d as the weak grade of intervocalic *t); however, after further phonetic development, most Finnic varieties now rather have /nn/, /ll/, /rr/. Given the pattern consonant+t : geminate consonant here, it is not too surprising that large swaths of Finnic varieties have also introduced the parallel alternation /st/ : /ss/, even though the weak grade clearly cannot originate from earlier **sd. [1] The innovation covers Karelian proper (but not Livvi/Olonetsian); Ingrian; and various eastern dialects of Estonian. Finnish dialects, though, have no trace of this.

— A small example for readers who are not especially familiar with how Finnic root-medial consonant gradation works in practice: the verb roots *anta- ‘to give’, *osta- ‘to buy’ yield in Karelian the 1st person singular forms annan ‘I give’, oššan ‘I buy’, with /nn/ and /šš/ as the weak-grade forms of /nt/, /št/. Finnish by contrast has similarly weak-grade annan, but a “strong-grade” form (rather: unaffected by gradation) ostan. In both languages, the underlying cluster also remains e.g. in the infinitive forms: Krl. antoa, oštoa; Fi. antaa, ostaa.

Which morphological forms show gradation can be predicted from the Proto-Finnic syllable structure. 1PS *andan, *ostan have a closed 2nd syllable, triggering lenition of *t, while the infinitives *antadak, *ostadak have an open 2nd syllable, and the original voiceless stop remains. However, as we can see, in which exact phonetic environments /t/ is affected by gradation varies by language. In this case gradation appears more heavily morphophonologized in Karelian than in Finnish; both languages however retain the original phonetic conditioning factors of gradation (a closed 2nd syllable in annan, vs. an open one in antoa/antaa), and hence these cases of gradation could be still called morphophonological, not yet purely morphological.
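
The conditioning described above can be put into a rough sketch. This is only an illustration under simplifying assumptions: the names `second_syllable_closed`, `inflect`, `FINNISH` and `KARELIAN` are made up for this example, the suffixes are given in their Proto-Finnic shape (-n, -dak) because that shape is what conditions the alternation, and the weak grades are the modern outcomes quoted in the text, not the reconstructed *nd etc.

```python
VOWELS = set("aeiouyäöõü")

def second_syllable_closed(suffix: str) -> bool:
    """True if the suffix closes the stem-final (2nd) syllable.

    Simplification: a suffix closes the syllable when it begins with a
    consonant that is not itself followed by a vowel, so -n closes the
    syllable, while in -dak the d is the onset of a 3rd syllable.
    """
    if not suffix or suffix[0] in VOWELS:
        return False
    return len(suffix) == 1 or suffix[1] not in VOWELS

# Strong : weak cluster pairs (modern outcomes, hypothetical table names).
FINNISH = {"nt": "nn", "lt": "ll", "rt": "rr"}   # st is left alone
KARELIAN = {**FINNISH, "št": "šš"}               # št also gradates

def inflect(stem: str, suffix: str, weak_grades: dict) -> str:
    """Attach a suffix, weakening the last gradating cluster if needed."""
    if second_syllable_closed(suffix):
        for strong, weak in weak_grades.items():
            idx = stem.rfind(strong)
            if idx != -1:
                stem = stem[:idx] + weak + stem[idx + len(strong):]
                break
    return stem + suffix

print(inflect("anta", "n", FINNISH))    # weak grade in the closed syllable
print(inflect("osta", "n", FINNISH))    # st unaffected in Finnish
print(inflect("ošta", "n", KARELIAN))   # št : šš in Karelian
```

The only difference between the two languages in this sketch is one extra row in the table, which is one way of picturing what "more heavily morphophonologized" amounts to here.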

Back on track. Interestingly, in the middle of the above-mentioned innovative area where original *st has been subjected to gradation, another development is also found: in Votic, *st is reflected as /ss/ in the strong grade, and single /s/ in the weak grade (thus e.g.: *ostadak > õssaa, *ostan > õsan). Traditionally this has been attributed to an early separate development: a general sound change *st > *ss would have taken place already before the analogical extension of *Ct-gradation. This would then have been followed instead by the analogical extension of the gradation pattern geminate voiceless stop : singleton voiceless stop also to the voiceless fricative /s/, now abundantly found as a geminate as well.

Looking at Votic in isolation, this seems like an entirely possible account. However, in an areal context this is less clear. Votic is the most persistently innovative Finnic language with respect to consonant gradation: it is applied productively even to recent loanword consonants and clusters from Russian (resulting in such unique alternations as /pk/ : /bg/). In this light, it seems like an unusual coincidence that, at the epicenter of the eventual /st/-gradating area, [2] Votic would have already early on been established as an island that opted for a different solution altogether. An alternate hypothesis would be to suppose that Votic once used to have the more widespread pattern *st : *ss as well; and that the attested pattern /ss/ : /s/ simply represents a further development of this. This is fairly easy to arrange. We can continue to assume *st > /ss/ as a regular sound law, and would merely have to combine it with also assuming *ss > /s/, as a mini-chainshift of sorts.
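
A chain shift of this kind only works if the two changes do not feed each other. As a minimal sketch (the function name `votic_chain_shift` and the schematic inputs are assumptions made for illustration; the Votic vowel change o > õ and the later voicing in suffixes are not modelled), both changes can be applied in a single pass:

```python
import re

# Both steps of the hypothetical mini-chainshift, applied simultaneously:
# old *ss moves out of the way (> s) while *st moves in (> ss).
CHAIN_SHIFT = {"st": "ss", "ss": "s"}
_CLUSTER = re.compile("st|ss")

def votic_chain_shift(form: str) -> str:
    """Apply *st > ss and *ss > s in one pass, so the outputs stay distinct."""
    return _CLUSTER.sub(lambda m: CHAIN_SHIFT[m.group(0)], form)

strong = votic_chain_shift("osta")   # earlier strong grade *st
weak = votic_chain_shift("ossa")     # earlier (assumed) weak grade *ss
print(strong, weak)

# Applying the same two rules one after the other would merge the grades:
merged = "osta".replace("st", "ss").replace("ss", "s")
print(merged)
```

The single-pass substitution is what keeps the strong and weak grades apart; the sequential version shows why *ss > s cannot simply be ordered after *st > ss.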

Not everyone will like this kind of complication just for the sake of neater geographical generalizations, I’m sure. But there’s an interesting piece of evidence that appears to be in favor of my new analysis. The inessive case ending, reconstructible as Proto-Finnic *-ssA, [3] is in fact found in Votic as -za ~ -zä. This also suggests exactly the development *ss > *s, followed by voicing between two unstressed syllables (— also known as “suffixal gradation”, though the process differs from regular gradation in a number of ways).

This seems to be all the evidence we can hope to get together on the development of *ss in Votic, though. As far as I know, there are no root-medial cases of *ss that could be reconstructed for Proto-Finnic, and not even any especially old loanwords. The most widespread might be the obviously recent Fi/Krl. pyssy ~ Es. püss ~ Vo. püssü ‘gun’ (from Low German).

It’s again possible to stitch together a different, Votic-internal analogical explanation for the inessive. Kettunen in Vatjan kielen äännehistoria did that already a century ago, starting from how we can find in Votic also a “strong-grade” [4] illative ending -sEE < *-sEn somewhat more widely than in the average Finnic variety. But generally I think a phonological explanation that takes care of multiple problems should be considered preferable to unrelated analogical accounts, as long as no additional problems are introduced.

Moreover, if gradation of *st entered Votic as a relatively late analogy, we should perhaps expect it to fail to take root in environments where no analogical motivation for its extension was available. Such a case can indeed be found: the adverbial ending *-stik, found in Votic as simply -ssi(g). Contrary to Kettunen, early loss of final *-k cannot be blamed, since Eastern Votic is one of the few Finnic dialect groups where it in fact remains in some individual varieties: e.g. alassig ‘naked’ (and not ˣalazig). [5] Eastern Votic fails to apply gradation also to infinitives of s-stem verbs, e.g. pessäg ‘to wash’. Infinitives of resonant-stem verbs (e.g. tulla(g) ‘to come’, mennä(g) ‘to go’) would have provided a weak source of analogy though, explaining Western Votic pesä etc.

Even this last-mentioned analogy actually makes more sense if we assume it to have been earlier in relative chronology: from strong-grade *pestä(k) to weak-grade *pessä(k), rather than from strong-grade pessä(g) to weak-grade pesä. In the latter case I would definitely expect the pattern single consonant stem : geminate consonant infinitive to be more salient than the rather abstract grade difference. [6]

On the other hand, a “relatively late” date for the new look of *st-gradation could still be fairly early in absolute chronology. Evidence from Krevinian (an enclave dialect of Votic once spoken in Latvia, separated since the 15th century) suggests that the gradation pattern /ss/ : /s/ existed already by the late Middle Ages, and Kettunen even calls this “one of the oldest changes in Votic” (presumably meaning: one of the oldest uniquely Votic changes). But given that Proto-Finnic dates to around 0 CE, this still leaves plenty of time for the development of first the usual form of, and later a uniquely Votic type of *st-gradation.

There are a couple of what might look like chronological issues to this scenario. E.g. the introduction of /rt/ : /rr/ is, IIRC, perhaps as late as the 17th century in parts of western Finland. Though I wonder if this should be taken as an indication that, contrary to traditional Finnocentric default assumptions of all influence within Northern Finnic varieties having flowed from the west to the east, this gradation pattern is in part Karelian influence in western Finnish. (Perhaps even /lt/ : /ll/ and /nt/ : /nn/?!)

I also see it now and then assumed that Votic is the sole autochthonous language of Ingria, and that all varieties of Ingrian represent later intrusions from north of the Gulf of Finland (much like we know to be the case with Ingrian Finnish). However, at least the case of the Kukkuzi dialect is problematic. Traditionally analyzed as heavily Ingrianized Votic, it is also at least equally well analyzable as a Voticized Northern Finnic variety. And a bit further west, the same problem arises with northeastern coastal Estonian as well: the dialect shows clear effects of Finnish contact, but also indications of original, more deep-reaching Northern Finnic affinity, e.g. the complete absence of õ, in whose place we find a perfectly etymological distribution of e versus o. (I wonder if anyone’s ever tried hunting for isoglosses connecting Kukkuzi and NECEs in particular.)

If Votic has continuously had Northern Finnic neighbors already since early on (early contacts are traditionally indeed assumed on lexical and morphological grounds, but this generally seems to have been considered to be a separate issue from contact with Ingrian in particular), again all the easier to assume that at one time, it too had the general “Central-Eastern Finnic” gradation pattern *st : *ss.

These days I even find myself wondering if all the other main dialects of Ingrian proper can really be derived from Karelia, even if we also count the Karelian Isthmus. Their differences strike me as sharper and starker than those between the individual dialects of Karelian. It seems clearly implausible to assume a single late migration, followed by rapid diversification within a minuscule geographic area. Assuming 3-4 wholly separate backmigrations would also be contrived, at least if we cannot find historical correspondences for them (like how the introduction of Ingrian Finnish has been connected to Ingria’s brief period as a part of the Kingdom of Sweden). Perhaps coastal Ingria in general was simply never Votic-speaking, and is in fact rather the original Eastern Finnic homeland…? A topic for another post, another time though.

[1] The even more widespread gradation pattern /ht/ : *hd must have a different explanation, though. There is, as far as I can tell, no evidence for a geminate **hh. I presume this pattern instead emerged very early, around the time middle Proto-Finnic *š had finished its trek backwards and settled as /h/, i.e. no longer an obstruent; and *d had in most Finnic varieties lenited to /ð/, but had not yet been lost. Around this time [ɦð] would have nicely paralleled clusters such as [lð] and also the likes of [ɦl], [rɦ]; especially if we assume that /ð/ did not phonologically hold the status of a dental fricative, but that of a dental approximant.
[2] As measured by the general patterning of Finnic isoglosses, not by raw geography. The latter method would leave Votic off at the southeastern fringe, and would probably put the center-of-mass of the innovation somewhere in southeastern Finland…
[3] South Estonian and Southern Ostrobothnian Finnish indicate the more archaic variant *-snA, but I would assume that this was established as a free or stylistic variant by Proto-Finnic, instead of being retained in an allomorphic distribution of some sort. South Estonian in particular appears to have been eager to extend consonant cluster assimilations from unstressed positions also to post-tonic positions (e.g. *koktu > kõtt ‘stomach’, *maksa > mass ‘liver’, *sakna > sann ‘sauna’), so it would be quite odd if here a “strong-grade” allomorph has instead been generalized from positions with secondary stress.
[4] I am not fully convinced that the Proto-Finnic alternation *s : *h, found in some suffixes and in *s-stem nominals, has anything to do with consonant gradation, given its total absence root-medially; not even in words like *vasikka (or *vasëikka?) ‘calf’, where there should not have been any possibility for the analogical reintroduction of /s/ from strong-grade forms. Whether the case of Fi. lähellä ~ läsnä ~ lästä ‘near’ constitutes an example doesn’t seem clear. For one, as cognates have only been found from Mari and Samoyedic, we do not know for certain if the root is to be reconstructed as *läšə- or *läsə-; and läsnä, lästä could potentially be explained as based on the inessive and elative (pre-PF *lähe-snä, *lähe-stä?) rather than continuing the archaic locative and ablative (pre-PF *läS-nä, *läS-tä). For two, these being adverbs, *s > *h in prosodically unstressed positions seems possible (as is probably the case for the 3rd person pronouns *hän, *hek).
[5] Similarly also e.g. Karelian alašti, not ˣalašši.
[6] A possibility that however remains is that the Eastern Votic forms might be rather analogical reintroductions, again on the basis of the geminate infinitive forms of n, l, r-stems. Still this seems to be “the wrong way around”, given that these forms retain final –g as a transparent trigger for gradation, while the weak-grade Western Votic forms do not.

Posted in Reconstruction

Finnic o-umlaut, continued

I’ve often seen the Finnic languages considered to demonstrate that vowel harmony acts as a counterforce to the common tendency for second-syllable (“stem”) vowels to trigger various conditional developments (umlauts) of first-syllable (“root”) vowels. At least within the larger Uralic comparative context, this indeed appears to be the case. There is even the illustrative case of Livonian, a Finnic language which has both lost vowel harmony and innovated a process of *i-umlaut (appearing e.g. in the nominative singular forms of nouns: *käci > ke’ž ‘hand’, *tammi > täm ‘oak’, etc.)

This however does not need to imply that vowel harmony languages are somehow categorically immune to umlaut developments. I’ve already briefly examined a possible shift *ë-o > o-o for Votic. It seems that another, somewhat similar case can be found, this time appearing more widely across Finnic. More interestingly, this involves umlaut “against the grain” of Finnic vowel harmony: in backness.

To start from the beginning, today’s observation traces its roots to Janne Saarikivi’s paper “ystävästä, uskosta ja vokaaleista”, published in 2010 in the eminent Finnish etymologist Kaisa Häkkinen’s Festschrift Sanoista kirjakieliin (SUST 259). This treats the Finnic word group for ‘friend’, whose representatives include Finnish ystävä, Estonian ustav, Livonian ustõb (and which has also been borrowed into Samic; e.g. Northern Sami ustit). The words clearly resemble fossilized participles, but various competing ideas have been suggested on what the original root would be, exactly. Saarikivi argues convincingly that the best option appears to be connecting the words to the same root as e.g. Fi. usko(-) ‘belief, to believe’: i.e. *uskV-(t)ta- > *usta- ‘to be true/reliable, to consider a friend’ > *usta-ba ‘(one who is considered a) friend’.

His explanation for the phonetic development is, however, slightly awkward. Drawing in some known parallels from Permic and Khanty, and bringing in some new Samic evidence, he suggests that this word group could be traced back to a Proto-Uralic (transitive) verb root *iskə- ‘to believe in’. From this a reflexive (intransitive) derivative *iskə-w > *isk-o- would have been created, which would have been backed to *usko- in Proto-Finnic; followed by re-fronting in Finnish, for “similar unclear reasons” (a very hazardous form of argument IMO) as appears in some other words, e.g. Fi. muhku ~ myhky ‘clump’.

Petri Kallio has instead proposed in passing what seems to me like a clearer explanation; again in his paper “Jälkitavujen diftongit kantasuomessa” that I seem to have brought up a couple of times by now. [1] According to him (and I agree), a more expected initial development in Finnic would be *iskə-w- > *iskü- > *üskü-. The labiality assimilation here is a known sound development — and incidentally presents another minor example of umlaut in Finnic. Following this he suggests resuffixation: *üsk-o(-) > usko(-). This latter step involves what seems like a previously unproposed sound development: *ü > u by the influence of second-syllable *o.

Phonologically, this sounds reasonable: it’s generally accepted that *o in unstressed syllables remained outside vowel harmony in Proto-Finnic, and o/ö-harmony as found in modern Finnish/Ingrian/Karelian (and partly Veps) only emerged later. In other words, PF second-syllable *o was indeed specified as [+back], and could pass this feature also to a first-syllable vowel.

(Also, though Kallio does not say as much, this two-tier scenario seems to even explain Fi. ystävä. Back when *UstA- was still around as a separate verb ‘to consider reliable/a friend’, we could consider *üskü-(t)tä- > *üstä- the regular development, attested in Finnish; competing with a variant *usta-, attested in southern Finnic and Samic, which would have gained its /u/ by analogy with *usko(-).)

Getting to the point though, this idea has drawn my attention to what looks like a phonotactic gap in Proto-Finnic. Although we can reconstruct PF *-o following most first-syllable vowels (e.g. *ilo ‘joy’, *veto ‘pulling’, *käko ‘cuckoo’, *pato ‘dam’, *pëlto ‘field’, *kolo ‘hole’, *puno- ‘to weave’ [2]), there do not seem to be any recognized cases of the vowel structure *ü-o. Even in modern Finnish, cases of y-ö are fairly rare. This seems like grounds to formulate a hypothesis. I suggest that Kallio’s proposed “o-umlaut” development is not merely an isolated sporadic example, but a full-fledged soundlaw: Proto-Finnic *ü-o has regularly yielded later u-o.

Investigating this possibility is going to be a bit difficult, though. PF *-o continues to have no firmly established regular origin (other than the dissimilation *ai > *oi in unstressed syllables after *a, *e, *i, which is not relevant here), and is mainly concentrated in derivatives and loanwords. Some particular morphological groups’ cognates in Mordvinic suggest the development *Aw > *o, but in others there seems to be no indication of this.

Regardless, here are a couple of doublets & such I’ve identified in Finnish that might be indicative of this same “o-umlaut” as usko(-):

  1. ulotta- ~ ylettä- ‘to reach smth’. The former’s been considered to derive from the postposition root ulko- ‘out’, the latter from ylä- ‘up’. While these are fairly close-by concepts, and while there are even particular expressions that appear to support this derivation (e.g. ulottaa kätensä ‘to extend / reach out a hand’, ylettää kattoon ‘to reach (up ’til) the ceiling’), these could regardless be due to semantic contamination. This seems to be confirmed by Veps /uluta-/, Livonian ulātõ, which both suggest roughly PF *ulotta-, not *ulgotta-. Alas, the 2nd-syllable vowels in these are aberrant, which suggests that the history here has probably been somewhat more complex.
    SSA proposes an alternate analysis: deriving ulotta- and ulko- both from a root *ula-, allegedly also present in ulappa ‘open sea’ and ullakko ‘attic’, but this seems to run contrary to all regular patterns of Finnic word derivation. [3]
  2. luppo ‘beard lichen’. No Finnic variants pointing to *ü are known, but the word’s Samic cognates interestingly enough uniformly indicate an original front vowel. Although there are various known cases of *ë/*o vacillation in Samic, both PS *lëppō and the Finnish form could also simply derive from earlier *lüppo.
    A possible problem though is that the Finnic word is only found in Finnish and northern Karelian, and perhaps is rather to be explained as a Samic loan. It would be possible to speculate with late retention of *ü in early Samic, or an umlaut development of *ë-ō in the loaning Samic variety, but I’ve nothing solid to go on with that line of thought.
  3. pursto ~ pyrstö ‘tail’. This is one of the examples of “frontness alternation” that Saarikivi mentions. Supposing *pürsto as a starting point would allow some Finnic dialects to evolve pursto, others pyrstö. On the other hand, it might be a problem that the words likely derive from Germanic *burstō- ‘bristle’ (which may seem semantically distant, but Karelian and some Finnish dialects retain an intermediate sense ‘dorsal fin’). [4] SSA suggests that pyrstö, the variant with a narrower distribution, could be due to contamination with pyrise- ‘to shake’ (intr.), pyristä- ‘to shake’ (tr.), which seems equally possible.
  4. ruho ‘body’ ~ ryhä ‘hump’. A comparison that would not have struck me as obvious, but SSA analyzes the latter as a “variant” of the former. Normally we’d expect an A-stem to be more original than an o-stem though. This is also suggested by the loan etymology of the words from Germanic *xruza- ‘corpse, pile’. Or, given the y-vocalism, perhaps rather from Old Norse *hryRa-? Thus *rühä → *rüh-o > ruho seems like a possibility.
  5. runno- ‘to cram, mangle’ ~ ryntää- ‘to rush (into)’. The interference of various other words is possible (e.g. ruhjo- ‘to injure, mangle’, säntää- ‘to rush’) but what makes me suspect indeed common origin is the irregular variation nn ~ nt, appearing in both groups here. In standard Finnish the two verbs themselves have ended up in different “grades” (not quite in accordance with regular consonant gradation), but further derivatives include e.g. runtu ‘dent’, rynniä ‘to rush suddenly’ (punctual).
  6. rusto ‘cartilage’ ~ rysty ‘knuckles’. SSA connects the two as being of “similar descriptive origin”, but they could be connected as straightforward parallel derivatives (*rüst-o, *rüst-ü). Less clear is if rusikka ‘fist’, also mentioned by SSA, is a part of the same cluster. This does not seem necessary, but if it does, *u will probably have to be more original (and we’re back to square one).
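
To make the proposed soundlaw concrete, here is a small sketch that applies *ü-o > u-o to the pre-forms suggested in the list above. The function name `o_umlaut` and the regular expression are my own illustration, not an established formalization; as a simplifying assumption the rule only targets a short first-syllable *ü, so a long *üü is left untouched by construction.

```python
import re

# Hypothetical formalization: a short first-syllable ü, followed by one or
# more consonants and a second-syllable o, is backed to u.
_NON_VOWEL = "[^aeiouüäöë]"
O_UMLAUT = re.compile(rf"^({_NON_VOWEL}*)ü({_NON_VOWEL}+o)")

def o_umlaut(form: str) -> str:
    """Apply the proposed *ü-o > u-o to a reconstructed form."""
    return O_UMLAUT.sub(r"\1u\2", form)

# Pre-forms proposed in the text (asterisks omitted in the strings).
for pre_form in ["üsko", "lüppo", "pürsto", "rüho", "rüsto", "rüstü"]:
    print(pre_form, ">", o_umlaut(pre_form))
```

Forms with another second-syllable vowel, such as *rüst-ü, pass through unchanged, which is what lets doublets of the rusto ~ rysty type arise from one root in this scenario.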

None of the cases appear crystal clear, but being able to get six examples together still suggests to me that this is probably on to something. OTOH also chronology seems slightly problematic here. The Saarikivi–Kallio scenario for usko would require that *ü-o > u-o was later than the assimilation *i-ü > *ü-ü; yet this has also been quite late, being e.g. found in (standard) Finnish (*pisü- > pysy- ‘to stay’, *pistü > pysty ‘erect’) but not in several dialects of Karelian (pisy-, pisty). [5] My proposed new derivation of ryhä~ruho from Proto-Norse rather than Proto-Germanic would also require a date well after Proto-Finnic for this change. But if it rather dates to the common (Western) Finnish era, then even counterexamples with narrower distribution in Finnic become relevant. Some troubling cases might be the following:

  • kylvö ‘sowing’, kyntö ‘plowing’ (← kylvä- ‘to sow’, kyntä- ‘to plow’). These are a part of the wider pattern of deverbal -O/U-nouns. The pattern -tA- : -tO is particularly productive though (kääntö ‘turn(ing)’ ← kääntä- ‘to turn’, säätö ‘adjustment’ ← säätä- ‘to adjust’, ääntö ‘articulation’ ← ääntä- ‘to articulate’, etc.), which might have motivated the creation of kyntö in place of expected kunto, followed by semantic analogy to produce also kylvö? Dialectally, a variant kylvy is fairly widespread (and even kynty is attested).
  • pyytö ‘plea’ (← pyytä- ‘to ask’). A similar derivative as the above two, but here we could additionally suppose that long yy was perhaps unaffected by this umlaut.
  • kytö ‘slash-and-burnt field’. Likely also a derivative of the above type, from kyte- ‘to smoulder’. However, from an e-stem verb I’d expect kyty, which is indeed attested in a few dialects (cf. also e.g. kylpy ‘bath’ ← kylpe- ‘to bathe’, käsky ‘order’ ← käske- ‘to order’, sylky ‘spittle’ ← sylke- ‘to spit’). Could kytö be a late re-suffixation to avoid homophony with kyty ‘brother-in-law’?
  • tyttö ‘girl’; fairly widespread, with cognates found in most of Northern Finnic, as well as in Votic. However, the variant tytti seems to be older yet, with cognates extending also to Estonian and Livonian. This seems like a similar innovation as the replacement of Fi. isä ‘father’ with iso in Karelian.

It appears that a clearer picture of the development of 2nd-syllable labial vowel suffixes in Finnic will be needed for making progress here.

[1] Footnote 13, to be specific. For the further ref details, see e.g. my previous post. Perhaps I should establish a policy to add anything I cite at least twice to my Literature page?
[2] The idea of a contrast between *e-o and *ë-o (cf. Estonian vedu vs. põld : põldu-) is provisional and not particularly crucial to the point.
[3] I wonder rather what’s exactly the relationship between ullakko and lakka ‘roof’. SSA suggests irregular contraction from *ula-lakka, but perhaps there is rather some kind of a prefix *ul- in here. In that case, it would be also possible to analyze ulappa as *ul-appa, where the 2nd component would probably derive in some fashion from Proto-Norse *haba- ‘sea’ — probably thru Samic *āpē ‘open space’, in light of the developments *h > ∅ and *b > pp (similarly to what we see in northern Finnish aapa ‘open bog’). While very hypothetical, this approach still seems more promising to me than the notorious proposal that ulappa would be one of the no more than two or three Finnic words to have allegedly retained the Proto-Uralic derivational suffix *-ppa. — It would even be formally possible to derive ulotta- as *ul-otta-, i.e. based on the verb otta- ‘to take’?… probably not a good idea though.
[4] E.g. Wiktionary appears to have *bursti- though, which could allow deriving an i-umlauted reflex in Finnic after all; but this looks like a reconstruction mainly based on English. Hellquist’s old Svensk etymologisk ordbok suggests *burstiō-. I am not able to assess offhand which of these stem type variants is best supported.
[5] Or could these be late analogical reversals, on the basis of related formations such as *pistä- ‘to stick’?

Posted in Commentary, Etymology

Proto-Finnic *c in Karelian

During some casual investigation of Karjalan kielen sanakirja, I appear to have stumbled on something interesting.

One of the more distinctive innovations among the Karelian dialects is the reflexation of Proto-Finnic *s. In Northern Karelian, and in the northernmost dialects of Southern Karelian (including the Tver, Tihvin and Valdai dialects, spoken by Karelians displaced much further south), this is by default retracted to postalveolar š: see e.g. kešä ‘summer’, oštoa ‘to buy’, šilmä ‘eye’. [1] The development, though, is blocked by a preceding *i, as in e.g. aisa ‘shaft (in harness)’, pisteä ‘to stick’, muistoa ‘to remember’, viisi ‘5’.

In Livvi aka Olonets Karelian (likewise also in Ludian and Veps), funnily enough *s is inversely shifted to š only after *i. This probably indicates that Old Karelian & Old Veps had no **š, and this sibilant split initially produced an allophonic contrast of palatalized [ś] versus unpalatalized [s]; in the northern part of the dialect area, the latter was eventually retracted, and in the southeastern part, the former.
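
The two mirror-image conditionings can be sketched with a single function. The name `retract_s` and its `after_i` switch are assumptions made for illustration; the inputs are the unshifted shapes of the examples cited above, and the `after_i=True` mode is purely schematic (its outputs have not been checked against attested Livvi, Ludian or Veps forms, which show further changes).

```python
def retract_s(form: str, after_i: bool) -> str:
    """Shift s > š under one of two complementary conditions.

    after_i=False: everywhere except directly after i (Northern type).
    after_i=True:  only directly after i (Livvi/Ludian/Veps type).
    """
    out = []
    for pos, ch in enumerate(form):
        prev_is_i = pos > 0 and form[pos - 1] == "i"
        if ch == "s" and prev_is_i == after_i:
            out.append("š")
        else:
            out.append(ch)
    return "".join(out)

# Northern type: the default is š, blocked by a preceding i.
for word in ["kesä", "ostoa", "silmä", "aisa", "muistoa", "viisi"]:
    print(word, ">", retract_s(word, after_i=False))
```

Because the two modes partition the occurrences of s between them, every s is shifted in exactly one of the two dialect types, which fits the idea of an earlier allophonic split.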

There are several complications to this textbook picture, though. First, as is common, dialect loaning etc. seems to have generated a rather messy border for the sound changes… The Paatene, Mäntyselkä and Porajärvi dialects appear to be particularly inconsistent cases. (Later palatalizations like *-si > *-śi could maybe explain some cases.)

But more interestingly, there are also some other cases where s is found thruout almost all of the Karelian varieties. And it appears that sometimes these can be archaisms of a sort: the distribution goes regularly back to Proto-Finnic *c, rather than *s. So far I have located the following cases:

  • *acja > asie ‘matter, thing, errand’
  • *acrajin > asrain ‘trident’
  • *kecrädä- > kesrätä ‘to spin thread’
  • *käci > käsi ‘hand’
  • *ocra > osra ‘barley’
  • *suci > šusi ‘wolf’
  • *toci > tosi ‘true’
  • ? *vacara > vasara ‘hammer’ (South Estonian vassar does not suggest *c, but the word’s origin from Indo-Iranian *vadźra does.)
  • *veci > vesi ‘water’
  • *ükci > yksi ‘1’

Possibly also *vooci > vuosi ‘year’, which has š (< ? *s) in “central Karelian” (Repola, Rukajärvi, Paatene, Mäntyselkä, Porajärvi), but s (< ? *c) in “Northern Karelian proper” (Uhtua, Vuokkiniemi, Kontokki).

It’s notable that the cases here do not only include cases of *c resulting from the Proto-Finnic assimilation *ti > *ci; they also include all cases known from Karelian of the somewhat anomalous PF cluster *cr — which therefore appears to confirm Petri Kallio’s recent proposal that it should indeed be reconstructed as *cr, not as *str.

How to understand this correspondence? It does not seem to be possible to assume that the development has been *c > *ś. Not only would this be phonetically awkward (especially since the geminate *cc is reflected as non-palatal čč across all of Karelian), it also seems to be the case that the Tunkua dialect has *c >> š, in contrast to *ś > s. On the other hand, assuming the retention of PF *c as an affricate all the way ’til the full disintegration of Karelian would also be problematic, for starters e.g. because the weak grade of PF *cc would most likely have already also become a short affricate *c by this time. Yet a gradation pattern ttš : s has not been attested from anywhere in Karelian (nor elsewhere in Finnic), AFAIK.

So I would suppose that the reflex of *c in Old Karelian must have been a second sibilant phoneme; which might most simply be denoted *s₂. Precedents from elsewhere (e.g. Castilian Spanish, Old French, High German) suggest though that the most likely phonetic value for this would be a laminal sibilant [s̻], contrasting with *s₁ < *s as an apical sibilant [s̺]. This would additionally explain why the shift *s₁ > š exists in the first place: to enhance the contrast between this and *s₂ (just as also occurred in German).
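
As a rough sketch of how the two-sibilant hypothesis derives the Northern Karelian forms listed earlier: the function name `northern_karelian` is made up for this example, and it collapses the intermediate stage (*c > *s₂, *s > *s₁) into a single pass. It only covers singleton *c and *s; geminate *cc and any words where *c may have merged with *s early are not modelled.

```python
def northern_karelian(pf: str) -> str:
    """Derive a Northern Karelian shape from a Proto-Finnic one (sketch).

    *c is treated as the source of *s₂, which surfaces as plain s;
    *s is treated as *s₁, which retracts to š unless preceded by i.
    """
    out = []
    for pos, ch in enumerate(pf):
        prev_is_i = pos > 0 and pf[pos - 1] == "i"
        if ch == "c":
            out.append("s")          # *c > *s₂ > s
        elif ch == "s" and not prev_is_i:
            out.append("š")          # *s₁ > š
        else:
            out.append(ch)
    return "".join(out)

for pf in ["käci", "veci", "suci", "toci", "ocra", "kesä"]:
    print("*" + pf, ">", northern_karelian(pf))
```

The case of *suci is the instructive one: the same word shows š from *s word-initially and s from *c medially.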

A hypothesis of *[s̻] in Old Karelian also seems to offer some new possibilities for explaining the later development of PF *cr. I have already for quite some time wondered about its development to tr in Eastern Finnish dialects on one hand, in Ingrian on the other; which seems to be an odd contrast to sr in Karelian (and also Livvi/Ludian/Veps). In Ingrian it could be easily attributed to contact with Estonian/Votic, but this is further complicated by *sr > zr appearing after all in the Soikkola dialect. Moreover, Western Finnish has *sr > hr. All this makes the Eastern Finnic tr-area look more like a secondary innovation than an isogloss shared with Southern Finnic. Perhaps a development along the lines of *sr > *θr > tr could be assumed? — And at this point it becomes quite handy that *[s̻] > [θ] is also a fairly common innovation (cf. Castilian Spanish again; or, Old Persian). If we only had to assume the fronting of this phone in particular, the scenario here seems to become a little bit better-grounded.

Not every instance of PF *c yields this Karelian *s₂, though. The usual reflexation of *s₁ appears in e.g. the following cases:

  • (intervocalic:) *kaca > kaša ‘corner (of ax)’, *keüci > keyši ‘rope’, *kuuci > kuuši ‘six’, *täüci > täyši ‘full’, *uuci > uuši ‘new’
  • (postconsonantal:) *kakci > kakši ‘2’, *küpci > kypši ‘cooked’, *lapci > lapši ‘child’; *künci > kynši ‘nail’; *hirci > hirši ‘log’, *orci > orši ‘perch’, *varci > varši ‘shaft’, *virci > virši ‘hymn’

There are also three cases yielding *ś after *i (going per the Tunkua reflexes): *niici > niisi ‘heddle’, *raici > reisi ‘thigh’, and the abovementioned *viici > viisi, which suggest that *c > *s in this set of words is altogether probably quite old, and they’ve after this simply developed as any inherited PF *s. At least *rc > *rs and *Uc > *Us could be simply regular.

Another question is word-initial *ci-, *si- (e.g. *cilta ‘bridge’, *silmä ‘eye’). A brief scan suggests that these seem to be reflected identically; though the Paatene dialect now consistently seems to have s (likewise word-initially before any other front vowels).

Whether any additional traces of this *s₁ / *s₂ contrast can be found elsewhere in Eastern Finnic (e.g. in Upper Luga Ingrian, which also has the change *s > š) will have to be left for further study.

[1] Though I could ask if this isogloss might make a better boundary between Southern and Northern Karelian than medial voicing; while pretty easy to locate, the latter is also a relatively trivial feature that seems likely to have just rubbed off on Southern Karelian as Russian influence. — KKS actually includes the dialect of Jyskyjärvi as a part of Northern Karelian even though it does have medial voicing. I also wonder what’s the motivation behind this choice.

Posted in Reconstruction

Some observations on Votic õ versus o

One of the bigger open problems of Finnic historical phonology is the shift *o > õ in Southern Finnic.

The non-front non-open illabial vowel õ found across Southern Finnic — the exact realization varies from /ɤ/ to /ɨ/ — most regularly corresponds to Northern Finnic e in words of back harmony; e.g. Estonian mõla ‘paddle’, põld ‘field’ ~ Finnish mela, pelto. Whether these cases should be reconstructed with front *e or back *ë in Proto-Finnic remains disputed, but the correspondence pattern is fairly unambiguous. [1]

Frequently, though, õ is also found in correspondence to NF o, e.g. in Es. õlg ‘shoulder’, õlg ‘straw’, hõbe ‘silver’, kõrv ‘ear’, lõhi ‘salmon’, sõrm ‘finger’ ~ Fi. olka, olki, hopea, korva, lohi, sormi. Cognates from elsewhere in Uralic, or in loangiving languages, fairly consistently indicate that Northern Finnic retains here the original state of affairs, and anything along the lines of a Proto-Finnic central rounded vowel **ȯ probably should not be reconstructed. [2] However, o also frequently remains; e.g. Es. oja ‘brook’, ots ‘forehead’, kolm ‘three’, tohtima ‘to dare’ ~ Fi. oja, otsa, kolme, tohtia.

The task, then, is finding the conditioning for the delabialization *o > õ. It’s been observed already long ago that no easy solutions are available. Which words show this change and which do not varies greatly already depending on the language variety in question. Previous analyses by e.g. Viitso and Raun have distinguished some 4-5 main distribution patterns. It seems probable that the issue cannot be fully answered without detailed analysis of the dialectal diversification of Estonian. Interdialectal loans, especially to and from the literary standards, [3] have probably also muddied the original distribution quite a bit.

This all regardless does not prevent partial progress being made. In particular, Livonian is known to be a clear-cut case: delabialization is here found effectively solely in the diphthong *ou and in the sequence *ovV, which entirely regularly yield õu / õvV.

This in mind, a look at how the change has played out in its other geographic extremum — Votic — might also be fruitful. This does not seem to have been done, though. Lauri Kettunen’s Vatjan kielen äännehistoria (1915) does not even attempt an analysis, and in case anyone else has since him examined the historical phonology of vowels in Votic in similar detail, I am not aware of it.

I will not be presenting a full analysis here, either, only a couple of hypotheses, based on a relatively quick look-over of the lexicon of the Mahu dialect. A look into data from other, fuller-documented dialects of Votic (these days well-summarized in Vadja keele sõnaraamat) will be necessary to confirm or deny the following ideas.

The main impression that emerges when examining Votic on its own is that attempting to determine conditions for *o > õ is probably the wrong approach. Delabialization occurs much more often than not, in almost any phonetic environment imaginable. Only two sources of modern o are immediately clear: firstly, long oo is entirely unaffected by delabialization (as also elsewhere in Southern Finnic); and secondly, recent loanwords, from Ingrian, Finnish, and Russian, consistently retain their short o. There appear to be a number of other word shapes where short stressed o is probably inherited — but they are quite few, and what seems like an approach worth exploring is that this development has been conditional, while delabialization would be simply the default reflexation.

One particularly promising environment is words with original *o also in the 2nd syllable. Clear cases are the words koto ‘home’, roho ‘grass’. Loaning is not an option for either of these: the former was in Eastern Finnic (and standard Finnish) replaced by the form koti, while the latter shows vowel shortening before h, a sound change absent from Ingrian and Finnish. [4] The same rule could be used to account for some other words as well: kokottaa ‘to bawk’, mokoma ‘such’, orko ‘valley’.

Two very interesting words moreover turn up: opënë ‘horse’, toro ‘acorn’. These have not had original *o at all, and they rather derive from Proto-Finnic *hëpoinën, *tërho (or *tëroh?) — cf. Finnish hevonen, terho. This suggests that the history here was actually not the retention of stressed *o before a subsequent *o, but instead a later back-assimilation *ë-o > *o-o (and in ‘horse’, this was presumably followed by an even later assimilation *-o-ë > /-ë-ë/). [5]

This opens also a new possibility for etymologizing koko ‘pile’. Given Finnish keko, perhaps this similarly derives from earlier *këko; in which case the synonymous Fi. koko, and its other Finnic cognates, would seem to turn out to be loanwords from Votic? (If the same would go also for koko ‘whole’, koota ‘to gather’ is not clear. I could also imagine these being derived from *kokë- ‘to check, e.g. traps’, perhaps via a meaning ‘to collect’.)

But on the contrary: no assimilation of this sort is seen in e.g. nõvvoa ‘to advise’, põlto ‘field’, sõtkoag ‘to mix dough’, võrkko ‘net’, võso ‘sprout, young tree’ (all five inherited words of the *ë-o type). The first could be maybe explained as being due to the general ban on ˣ/-ouC-/, and the last two similarly due to the ban on ˣ/voC-/. But ‘field’ and ‘to mix’ resist easy explanations. I wonder if both of them being loosely agricultural vocabulary has any relevance.

Another environment where o could perhaps be regular is the context C_(C)kka. In my sample, no examples with õ are found in this position, but instead there’s a full handful of examples of o: hoikka ‘thin’, kokka ‘hook’, kolkka ‘corner’, nokka ‘beak’, rokka ‘cabbage soup’. At least the first is evidently an Ingrian loan, on account of retained /h/, but given the average proportion of such loans, I am not sure if we should expect the same to be the case for all of these.

It’s unclear what phonetic motivation could exist here, though, since a lone coda /k/ is not enough to block delabialization; e.g. *oksa > õhsa ‘branch’.

A third case where o fairly often remains is the diphthong *oi. From Mahu there are five positive inherited examples: koira ‘dog’, koivu ‘birch’, moisio ‘manor’, poika ‘son’, poiz ‘away’; and three negatives: õikõa ‘right’, nõissag ‘to rise’, sõittaag ‘to scold’. The sixth could suggest a shift *oi-ë > õi-õ; and the seventh may have something to do with how the rest of Finnic rather indicates earlier *ou (Es. tõusma, Fi. nousta), perhaps suggesting something like *novise-? [6] But no especially clear pattern seems to emerge here.
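The three candidate retention environments discussed above can be written out as a small checker, which makes it easier to see which hypothesis a given word falls under. This is only a rough sketch: the regular expressions work on the orthographic Mahu forms quoted in this post, the function name and labels are made up for illustration, and the *oi case is flagged as unclear because the data above include counterexamples.

```python
import re

VOWELS = "aeiouõäöü"
C = f"[^{VOWELS}]"  # any non-vowel letter (orthographic approximation)

# Hypothesised environments where short o is retained, checked in order.
# The labels and the ordering are assumptions made for this sketch.
ENVIRONMENTS = [
    ("long oo", re.compile(rf"^{C}*oo")),
    ("o in 2nd syllable", re.compile(rf"^{C}*o{C}+o")),
    ("C_(C)kka", re.compile(rf"^{C}o({C}|i)?kka$")),
    ("diphthong oi (unclear)", re.compile(rf"^{C}*oi")),
]

def retention_environment(word):
    """Return the first matching retention environment, or None.

    None stands for the default outcome under the hypothesis:
    delabialization *o > õ.
    """
    for label, pattern in ENVIRONMENTS:
        if pattern.search(word):
            return label
    return None

# Forms quoted in the post; "oksa" is the pre-Votic shape of õhsa.
for w in ["koto", "roho", "orko", "nokka", "kolkka", "hoikka", "koira", "oksa"]:
    print(w, "->", retention_environment(w) or "default: o > õ")
```

Since inputs must be given in their pre-delabialization shape, the checker can only show which words a hypothesis would cover; it cannot by itself decide between retention and the later assimilation *ë-o > *o-o.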

The issue of õ will for now still remain unsolved, but at least it is clear that some clues will emerge as long as one is willing to look for them.

[1] Given that this correspondence mostly occurs in loanwords from Indo-European that once had *e-a, *e-o or the like (e.g. pelto derives from Proto-Germanic *felθō), it’s clear that there has been a retraction from earlier *e at some point, but what is not clear is if we should assume this to have been an areal Southern Finnic innovation, or a common Proto-Finnic innovation followed by a backshift in Northern Finnic. The question is out of the scope of this post, but suffice to say that I narrowly lean in favor of the latter (mainly per some indirect arguments from relative chronology, which however rely on a few so far unreleased ideas of mine).
[2] In principle I am however open to the possibility that in some cases Southern Finnic might regardless retain the more original state of affairs, especially since by now we know that Proto-Uralic also featured a non-front non-open illabial vowel *ë. This is normally continued by Finnic *a, but it’s imaginable that some conditional exception developments exist. One particularly interesting gap to observe is that while PU seems to have allowed a fairly wide variety of vowel + semivowel combinations, no cases of the sequences *-ëw-, *-ëj- have been reconstructed so far.
[3] That is, both the modern North Estonian-centric standard, mainly based on the Tallinn dialect, and the old South Estonian-centric standard, mainly based on the Tartu dialect.
[4] Somewhat mysteriously this feature is however found also in Veps. — In principle it’s of course also possible that shortening before h is simply later than *o > õ.
[5] Interestingly enough, also Standard North Estonian seems to have this change in hobune ‘horse’. Yet ‘acorn’ remains tõru. Could early loaning from Votic to Estonian be involved?
[6] This reconstruction would, then, rather look like a derivative *nov-ise-, and we could ask if the root is somehow connected with PU *ńoxə- ‘to chase, follow’ (whence also PF *nou-ta- ‘to fetch’). But the semantics do not seem to work even elementarily, especially since *-ise- is well-attested only as deriving onomatopoetic verbs of all things.

Posted in Reconstruction

Inheritance in Phonology

It occurred to me that there’s one concept I have never seen anyone else define or use, although I’ve been working with it in my own research for a while now: that of an inheritance phoneme.

This is in effect the polar opposite of the well-known case of the loanword phoneme. As the audience of this blog probably mostly knows, a loanword phoneme refers to a sound that is absent from the native lexicon of a language, but occurs in one or more of its contact languages, and has been taken on from there into the language itself. Clear examples include /b g f ʃ/ in modern Finnish.

But sometimes, we can by contrast find in a language a phoneme that is absent from its contact languages, and is only found in the native-enough lexicon. [1] In Finnish a recent example might be the labial opening diphthongs /uo/, /yö/. Although found as reflexes of earlier *oo, *öö even in some not especially old loanwords from e.g. Swedish (including tuoli ‘chair’, kyöpeli ‘kobold’; yet more recently also fluori ‘fluorine’), they appear to have within the last about 200 years become a “closed class” that, for now, is no longer acquiring new members. [2] Of course, this is not “closed” in the same sense as a morphological word class might be — the diphthongs remain entirely possible in new ideophones and onomatopoeia (blyögh ‘barf!’), blends (Suomalia ‘an area in Finland with a relatively large current or predicted Somali population’), and derivatives based on pre-existing roots.

Better examples can probably be found, from languages having some more strongly marked phonemes. For example, I’d expect Czech ř or German pf to be not very common in current loanwords, and to have been so for a good while; or the nasal vowels in French to be absent from most modern loanwords, with the exception of those from Portuguese or sub-Saharan African languages.

Even then, this concept seems less clearly defined than the loanword phoneme. While a loanword phoneme is established by its one-time inadmissibility in the language altogether, there is nothing in a language’s internal structure at any given time that could prevent a given phoneme from appearing in loans. This situation can only be an incidental fact about its contact languages — and if the contact situation changes, anything’s possible again. (Put a Czech speaker community in regular contact with speakers of Toda, and I for one would bet that ř would then start regularly turning up in some loanwords.) A phoneme could also be only “partially inherited”, in being found in some loan strata but not in others — as I hypothesized to be the case with French nasal vowels.

On the other hand, what is interesting here is that while words containing loanword phonemes allow setting up a terminus post quem for their acquisition into the language (if we know that Finnish circa 1600 had no /ʃ/, then all modern Finnish words with the consonant must be more recent, even if their etymology were unknown) — inheritance phonemes may allow establishing a terminus ante quem. This seems like a fairly powerful tool; usually we can backdate a word only by the comparative method, and even then not watertightly either. But, given a word like Fi. tuoksua ‘to smell’ (of unknown origin, not attested before the end of the 17th century, and in contrast to the more widespread native Finnic synonym haista), we can regardless consider it probable from its diphthong that this is not an especially young word, perhaps dating at least to the Middle Ages. The absence of known loan etymologies from any obvious candidate loangiver (Swedish, Russian etc.) would furthermore suggest that we can, with slightly lower confidence, add a couple of centuries more yet. [3]

We can also define similar concepts such as loan cluster and inheritance cluster. The former, although to my knowledge never explicitly named, is again a known phenomenon. Finnish continues to work as an example: while Modern Finnish clearly allows e.g. word-initial consonant clusters, it is not too hard to find phonological analyses that dismiss them as non-native and proceed to posit a “basic” syllable structure (C)V(V/C)(C). Jorma Koivulehto has also made good use of this approach in research of early loanwords, having e.g. shown that all Finnic word roots with the medial cluster *-rt- are ultimately Indo-European loans, and not of Uralic inheritance. [4] (This, however, is not to be confused with the occurrence of *rt in word stems, where it can well result from inherited *r + a suffix such as causative *-ta-; as in Fi. vieri ‘side’ → vier-tä- ‘to be or go beside smth.’)

It seems similarly possible to consider e.g. Finnish tk for the most part an inheritance cluster that indicates relatively native vocabulary. No examples of this cluster in old loans are known; and given that already in Late Proto-Indo-European, the inherited “thorn” clusters of dental + velar were metathesized or otherwise reduced, it seems likely that none will be found anytime soon either, at least not from an Indo-European direction. (Much newer examples can be found though, e.g. Atkinsin dieetti, votka; and in far-northern dialects, e.g. vietka ‘adze’, from Sami.)

I could explore various further examples here, but for now, this post should do for a point of reference for later use.

[1] “Nativeness” is a relative concept, of course, not an absolute one. E.g. Finnish kauppa ‘store’ can be considered a “native” counterpart of the more recent loans puoti (← Swedish), lafka (← Russian), basaari (ultimately ← Persian) etc., but ultimately it is a Germanic loanword as well. Similarly, even words reconstructible back to Proto-Uralic can in principle be loans at some deeper time-level yet (e.g. we can suspect on semantic grounds that pata < *pata ‘pot’ might be one).
[2] The illabial opening diphthong /ie/ remains possible in loans, e.g. fiesta, siesta, DJ Tiësto.
[3] For some speculation though, something could be perhaps made of the similarity to Swedish doft, German Duft ‘smell’. If these could be analyzed as earlier *duf-t-, perhaps in turn some kind of a labial-stop extension of PIE *dʰewh₂- ‘to smoke’ (PG *dup-?? Svensk Etymologisk Ordbok connects here also Greek τυφος ‘smoke’), then we might be able to assume that the Finnish word derives from pseudo-PF *tupa/*tupo ‘smell’ → *tuβa-ks-u-/*tuβo-ks-u- ‘to put out smell’ > *tu.aksu-/*tu.oksu-, with a similar late contracted diphthong as in words like siellä < *si.ällä < *siɣällä < *sigä-llä ‘there’, or haukka < havukka (attested dialectally) < *haβukka < *habukka ‘hawk’.
[4] See in particular: Koivulehto, Jorma (1979): Baltisches und Germanisches im Finnischen: die finn. Stämme auf -rte und die finn. Sequenz VrtV. In: Schiefer, Erhard F. (ed.), Explanationes und tractationes Fenno-Ugricae in honorem Hans Fromm, pp. 129–164. München.

Posted in Methodology

Weighing etymological distributions

I’ve sometimes remarked (but until now, not on this blog) that one interesting difference between Uralic and Indo-European studies is their radically different approaches to lexical reconstruction. Uralic studies have for long hung on to the idea of a deeply stratified family tree, and accordingly, word roots dating to the same, nearly identical stage of phonological reconstruction have been varyingly separated as “Proto-Finno-Samic”, “Proto-Finno-Volgaic”, “Proto-Finno-Permic”, “Proto-Ugric”, “Proto-Finno-Ugric” or “Proto-Uralic” — depending simply on which branches of Uralic descendants have survived in. While on the IE side, all available reconstructions are generally treated under the title “Proto-Indo-European”, no matter if we’re dealing with a word root with a narrow distribution covering only e.g. Germanic and Balto-Slavic, or one found everywhere from Irish to Bengali and from Hittite to Tocharian. (Fairly often also quite different reconstruction stages are equated, at least in name; mostly in connection to laryngeal theory, which I find to be in mostly poor shape when it comes to distinguishing between comparative and internal reconstruction.)

Ironically enough, both sides appear to have been wrong. The evidence for most of the traditional intermediate groupings of Uralic has either evaporated long since, or has turned out to have been illusory all along; while studies on the dialectification of Indo-European fairly consistently keep suggesting the status of Anatolian and possibly Tocharian as early splits.

Focusing more on the IE side for once: there do not, yet, seem to be general-purpose sources that would examine how many of the numerous typological and allegedly synchronic analyses of Proto-Indo-European would hold even if we restricted our view to just the oldest material. (There are individual papers out there somewhere I’m sure, but admittedly I have not been looking especially heavily for them.) But in order to get some kind of a rough idea, I’ve started a small project: taking Wiktionary’s list of Proto-Indo-European roots as a starting point and indexing them according to their distribution across the better-documented IE languages (i.e. no Phrygians or Messapics). You can check on the work in progress over here. Sure enough, while convenient, this is probably also a fairly unsystematic sample of data. I might want to follow up on this by taking at some point a look at some more comprehensive modern rootlists, such as the LIV.

This anyway comes out as a type of dataset I have some practice with by now: a distribution matrix, recording the lack or presence of a root in a subgroup. [1] There are some interesting things you can do with such data, although I think a generally applicable theory remains undeveloped. I already have several similar projects involving Uralic data in preparation — of these, the two in the best shape are a spreadsheet database of the common Samoyedic lexicon (about 780 entries, mostly from Janhunen’s Samojedischer Wortschatz; currently not missing much else than finishing translating the German glosses into English), and another one listing the best-preserved common Uralic lexicon (with reflexes in six or more of the nine main Uralic subgroups, which comes out at about 200 entries; currently not missing much else than finishing adding the intermediate Proto-Samic/Proto-Finnic/Proto-Samoyedic forms). [2]
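As a rough illustration of what such a distribution matrix looks like in practice, here is a minimal sketch in Python. The root labels are placeholders rather than real etymologies, the subgroup names are one common way of listing the nine main Uralic branches (an assumption, not something fixed by this post), and cells are stored as probabilities rather than plain booleans so that uncertain reflexes can be marked as 0.8, 0.5 and so on.

```python
# One common listing of the nine main Uralic subgroups (assumed here).
SUBGROUPS = [
    "Samic", "Finnic", "Mordvinic", "Mari", "Permic",
    "Hungarian", "Mansi", "Khanty", "Samoyedic",
]

# Distribution matrix: root -> {subgroup: probability that a reflex exists}.
# Missing cells count as 0. All rows are hypothetical placeholders.
matrix = {
    "root_A": {g: 1.0 for g in SUBGROUPS[:7]},
    "root_B": {**{g: 1.0 for g in SUBGROUPS[:5]}, "Hungarian": 0.5, "Mansi": 0.1},
    "root_C": {**{g: 1.0 for g in SUBGROUPS[:5]}, "Khanty": 0.8, "Samoyedic": 0.5},
}

def expected_count(row):
    """Expected number of subgroups with a reflex (sum of cell probabilities)."""
    return sum(row.get(g, 0.0) for g in SUBGROUPS)

def well_preserved(matrix, threshold=6):
    """Roots whose expected distribution reaches the threshold."""
    return [root for root, row in matrix.items() if expected_count(row) >= threshold]

print(well_preserved(matrix))
```

With plain 0/1 cells this reduces to the flat "six or more of nine" filter; the probabilistic version simply lets a couple of doubtful reflexes add up to less than one secure one.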

With PIE and the Indo-Hittite question, one followup could be similar filtering of the evidently abundant “Common IE” lexicon (= everything not attested from Anatolian and Tocharian). It’s after all probable that a lot of vocabulary that once occurred in Anatolian and/or Tocharian remains simply undocumented in the literary records of the languages; and, other things being equal, a word root attested widely across the modern IE languages is more likely to be an archaism (or an erroneous comparison) than one reconstructed on the basis of more fragmented data.

But at this point I run into the question: what kind of a metric should I use for assessing how well a given proto-root has been retained? A flat sum-of-branches function seems to still work decently for Uralic, but for IE, not so much. The fundamentally underdocumented Anatolian and Tocharian are one type of problem, while another are the “family-isolates” Albanian and Armenian, where an order of magnitude less inherited vocabulary is found than in the old major groups like Greek or Indo-Iranian. [3] It seems clear that if a Common IE root is only lost from Alb.+Arm., this is not as big a deal as if it were instead lost from Gr.+II. But how much so exactly? And suppose I were to treat II reflexation worth e.g. one point, but Albanian reflexation worth one half — should I then also treat e.g. Slavic reflexation worth something like 0.8, given that the group is also clearly younger (and has had more opportunities for renewal of vocabulary)?
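To make the difference between a flat and a weighted sum-of-branches metric concrete, here is a small sketch. Only three of the weights come from the paragraph above (Indo-Iranian 1.0, Albanian 0.5, Slavic 0.8); the branch inventory and every other weight are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
# Branch weights. Indo-Iranian, Albanian and Slavic follow the figures
# floated in the text; all other values are illustrative assumptions.
WEIGHTS = {
    "Greek": 1.0, "Indo-Iranian": 1.0, "Germanic": 1.0,
    "Italic": 1.0, "Celtic": 1.0,
    "Slavic": 0.8, "Albanian": 0.5, "Armenian": 0.5,
}

def flat_score(attested):
    """Share of branches with a reflex, every branch counting equally."""
    return len(set(attested) & set(WEIGHTS)) / len(WEIGHTS)

def weighted_score(attested, weights=WEIGHTS):
    """Share of total branch weight covered by the attested branches."""
    total = sum(weights.values())
    kept = sum(w for branch, w in weights.items() if branch in attested)
    return kept / total

all_branches = set(WEIGHTS)
lost_alb_arm = all_branches - {"Albanian", "Armenian"}
lost_gr_ii = all_branches - {"Greek", "Indo-Iranian"}

for label, attested in [("lost in Alb.+Arm.", lost_alb_arm),
                        ("lost in Gr.+II", lost_gr_ii)]:
    print(label, round(flat_score(attested), 3), round(weighted_score(attested), 3))
```

The flat score cannot tell the two loss patterns apart, while the weighted one ranks loss from Greek and Indo-Iranian as the more serious. What the weights ought to be is exactly the open question; the sketch only shows where they would plug in.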

Initially it may seem that just noting the overall rate of lexical retention should work. Let’s say Albanian has lost 70% of the Common IE lexicon, while Germanic has lost 10%; does this mean that loss in Albanian is therefore seven times less valuable as evidence?

This approach however would seem to conflate lexical archaicity and lexical diversity. Even if, say, Germanic and Indo-Iranian are both subfamilies that retain 90% of the common IE vocabulary, this does not imply that their histories have been essentially identical. As far as we know from history and archeology, this “symmetry” would be due to the former having for long been hanging out in the margins of Northwest Europe, and not having had as many opportunities for renewing its lexicon; while the latter split into further subgroups already early on, including several languages first attested soon afterwards, so that the odds are good that any given IE root could have been retained in at least a few descendants somewhere.

Another variable to take into account thus might be the amount of lexical diversity within a language group. But I also have yet to work out how to formulate a metric for this, exactly. And the question kind of iterates… determining the lexical diversity within e.g. Indo-Iranian is probably going to require a way to assess the lexical distribution between its main branches; and then likewise for determining the lexical diversity within e.g. the Persid languages; and then finally the same also for varieties of modern Persian. Ultimately this then reduces to a question on how well have individual language varieties been documented in the first place.

I might simply need a clearer theory of what I am trying to assess about etymological distributions in the first place. In principle, there seem to be at least two somewhat distinct issues involved:

  • attempting to determine the “internal rate of loss/innovation” for a particular lexeme (which, contrary to even the more sophisticated lexicostatistic theories out there, is in all likelihood not a constant, but rather something further depending on a language’s sociolinguistic situation and other such external variables); and from this approximate how much further back from its oldest strictly reconstructible stage is it likely to date
    (e.g. if we can reconstruct the Common IE roots *kakka- ‘poop’ and *pléwmon- ‘lung’, we could perhaps assume from just the semantics, already before any sound-symbolic or similar considerations, that the former is younger than the latter)
  • attempting to determine how likely it is that a particular widespread word root is actually a later areal innovation rather than common inheritance
    (e.g. all other things being equal, a putative PIE root that has not been attested from any Celtic language is more likely to be a lexical innovation that never reached the westernmost Late PIE dialects, than one that does extend there; or, for that matter, a word root attested only from Latin, Greek and Anatolian carries a bigger risk of involving serial loaning than a word root attested only from Umbrian, Slavic and Anatolian)

Both of these approaches would provide evidence on how likely it is that e.g. some Common IE root was or wasn’t already present in Proto-Indo-Hittite. But they regardless involve distinct historical processes.

[1] Technically these should be considered probabilities, not boolean variables. If a reflex is uncertain or has unclear features, we can mark this uncertainty as a 0.8 or 0.5 or 0.1, instead of a plain 1 or 0. And even the zeros and ones should perhaps be actually considered to be shorthand for ε and 1 − ε, for some minuscule ε approximating the probabilities that we’re in fact wrong about how the history of e.g. Greek works, or how historical linguistics and etymology works in general.
[2] Further information on these and my other similar projects available on inquiry.
[3] Although it is interesting to note that, so far, almost all vocabulary with Anatolian parallels seems to be fairly well-retained even in Alb. and Arm. compared with poor retention otherwise. Perhaps this indicates the greater resilience, and attestability, of core vocabulary compared to peripheral vocabulary? But already the “Indo-Tocharian” layer seems to fare worse. We’ll see if this pattern carries thru.

Posted in Methodology

Etymology squib: Huoma & co.

An interesting paper I’ve recently found, by Kirill Reshetnikov from 2011: “Новые этимологии для прибалтийско-финских слов”, Урало-алтайские исследования 2 (5): 109–112. A Russian-only journal is a slightly odd location for publishing research on Finnic etymology, but I suppose technically still fair.

Two of his four comparisons I was already aware of, and I have no major complaints about them:

  • Finnic *kasvot (plurale tantum) ‘face’ ~ Samoyedic *kat ‘face’ < PU *kas-
    I’ve noticed on my own the possibility of connecting these words as well. The previous suggestions that I have seen for deriving the Finnic word from *kasva- ‘to grow’ have struck me as semantically forced. What I am not sure of, though, is reconstructing *kaswV as the common Proto-Uralic form. While loss of *w in consonant clusters would be regular in Samoyedic, there are no precedents for clusters of obstruent + semivowel in PU, [1] and also the loss of a stem vowel in Samoyedic after an original consonant cluster would be exceptional. Perhaps the root is simply *kasə, and *-vo- in Finnic is a suffix or two. [2]
  • Finnic *oh-ut, *oh-kainen ‘thin’ ~ Ob-Ugric *waaɣəɬ ‘thin’ [3] < PU *wokšə
    Seemingly independently also proposed by Ante Aikio in the 2nd volume of his recent Studies in Uralic Etymology article series. Multiple non-trivial developments are involved, but the etymology still appears to be entirely regular.

One comparison, however, I remain hesitant on: North Finnic *turkki ‘fur coat’ ~ Samoyedic *tər ‘fur’. A three-consonant cluster *-rkk- would not be possible for PU; yet an analysis of the Finnic form as something like *tur-kka-j or *turk-ka-j, as Reshetnikov proposes, would seem unusual as well. There are some Finnic roots analyzable as *CVC-ka, I think a few might be analyzable as *CVw-kka or *CVj-kka, and there’s even *po(n)č-ka ‘shank’, derived from PU *pončə- ‘tail’ — but most of the time “heavy” consonant-stem formations still seem to only involve dental suffixes. Another morphological reason to be suspicious is that the Finnic word is an unalternating *i-stem, a root type usually found in loans. True, some words of this type are old *j-derivatives (e.g. *kota ‘house’ → *koti ‘home’), but in those cases the underived root seems to be almost always still around as well.

Phonologically, I also do not think the development of PU *u to Samoyedic *ə can be considered normal in a consonant-stem root.

Interestingly however, no root for ‘fur’ or ‘fur coat’ seems to have been reconstructed for PU previously (despite a wide number of roots reconstructed with the meaning ‘skin’ [4]), nor even for Proto-Finnic proper; so perhaps that counts as a small point in favor of the etymology.

But moving on to Reshetnikov’s last, and most interesting, proposal: Finnish/Karelian huoma- ‘possession, care’ (mostly in adverbs) ~ Hungarian óv ‘to protect, guard’. This seems like a good catch. Pre-Hungarian *-w has generally been vocalized word-finally (in words like *köw(ɛ-) > kő : köve- ‘stone’, *low(a-) > ló : lova- ‘horse’, *saw(a-) > szó : szava- ‘word’, *taw(a-) > tó : tava- ‘lake’), so a word-final -v like this would probably come from earlier *-β; which, in turn, is in inherited vocabulary perhaps more often than not from an earlier *m (in words like PU *nimə > név ‘name’, PU *ńälmä > nyelv ‘tongue’, Ugric #äľmV > enyv ‘glue’). [5]

Previously a different etymology for huoma has been proposed: borrowing from Germanic *sōmijan- ‘to fit; to honor’. The common Finnic verb *hoomat- ‘to notice’, and some derivatives such as Fi. huomio ‘attention’ probably do come from this source — but I agree that ‘possession’ is not an obvious development from this meaning at all.

Alas, at this point Reshetnikov seems to skip over some work. Observing that both Finnic and Hungarian have a long mid back vowel *oo / ó, he simply proceeds to reconstruct PU *šoma/*šooma. This was a paper released back when the idea of long vowels in Proto-Finno-Ugric was still mostly unchallenged — but even then, as far as I know, Hungarian ó has never been considered to descend from earlier *oo or *o! Most cases rather have í (e.g. ín ~ PF *sooni ‘vein, sinew’; nyíl ~ PF *nooli ‘arrow’) or a, á (e.g. nyal ~ PF *noolë- ‘to lick’; három ~ PF *kolmë(t) ‘3’; ház ~ PF *kota ‘house’). A short o in possible correspondence to Finnic *oo does appear in orr ‘nose’ ~ Finnic *voori ‘mountain’, and some examples of o ~ *o are known too.

But when it comes to Hungarian long ó, it seems that this is in all cases a secondary vowel resulting from vocalization of coda *w (when following pre-Hungarian *a and *o). The v-stems mentioned above, such as szó, demonstrate the process well. Other examples where the same can be assumed within a root include the Hungarian reflex of PU *kuŋə ‘moon’. Hence the common PU root should be rather reconstructed as something like *šowə-ma, *šoxə-ma, *šoŋə-ma or *šuwa-ma, with the Finnic and Hungarian long vowels both resulting from contraction. Of these, I think the last option looks the best: as the *ma-derivative seems to date to PU already, the *ə-stem variants would have been at a risk of reduction to a consonant-stem formation *šoGma, from which we’d rather expect Finnic **houma or **huuma.

Finnic-internal evidence supports an analysis as a derivative as well. Beside huoma-, there exists also a largely parallel huosta- ‘possession, custody’; again attested only from Finnish and Karelian. The two are analyzable as derivatives from a common root *hoo- (although -sta- is a rare formant; it usually derives local nouns, such as Fi. alusta ‘base’), which would probably have been a verb meaning approx. ‘to take care of’.

I suspect there’s even a third derivative of this hypothetical verbal root *hoo- to be found in Finnic, hidden in plain sight: *hoolë- ‘to take into custody, to take care of’ (not retained as an underived verb, only as several parallel derivatives, e.g. Fi. huolehtia, huolia, huolita). This has normally been analyzed as identical to the homophonic noun root *hooli ‘worry, bother’ (probably ← Baltic), but the semantics of this derivation seem to be a bit off. In that case, the morphology would also point to the bare root being treated both as a noun and a verb, which is attested in some old inherited words (e.g. *tuuli ‘wind’ : *tuulë- ‘to be windy’) — but I don’t think I’ve ever seen a case of this in loanwords, where instead dummy verbal suffixes are often piled on even for no reason whatsoever (e.g. *he-i-ttä- ‘to throw’, and not **hee-, from Germanic *sēa- ‘to sow’). Thus, an analysis as an old frequentative *hoo-lë- seems to work better.

Later on, this and ‘worry’ have probably semantically bled into each other, leading to e.g. Fi. huolehtia meaning both ‘to take care of’ and ‘to worry about’; or huolia meaning both ‘to take into custody’ and ‘to bother taking’.

If we can therefore establish a PU verb root *šuwa- ‘to take care of’, this also opens one interesting possibility. The common Samoyedic verb for ‘to give’ is reconstructed as *tə- (compare Nganasan ta-, tə-, Selkup *tatə-; but Tundra Nenets tā-). Unsurprisingly, it has been considered a reflex of PU *toxə- ‘to give’; but the development *o > *ə would be quite irregular. [6] On the other hand: the regular PU source of *ə is *u-a; and we expect *-w- to be lost in Proto-Samoyedic, so *šuwa- would seem to be a potential proto-form for this verb. An open stem vowel *-a- is, again, not expected to be lost, but if we reconstructed *təå- or *təə-, with a later contraction to *a or *å (in some forms?) in some languages, this might even explain the irregular long ā in Nenets [7] and open a in Nganasan.

The semantics, I admit, do not match very well at all. But an interesting further formation is the synonymous PSmy *tə-tå- ‘to give’. For this, a derivation from *šuwa- ‘to take care of’ would not be too odd; a causative derivative ‘to give in someone’s care’ could perhaps develop into a neutral word for ‘to give’. Whether it would make sense for this meaning to then propagate back to the base root is, on the other hand, a more difficult question.

— One last question to explore might be if this PU root has anything to do with PIE *h₁su- ‘good’, but that would run into a bit too many questions and tangents to be fruitful to get into right now.

[1] The “spirants” *d₁, *d₂ that can be found in a few roots like *käd₂wä ‘female animal’ do not count, I think; phonotactically they seem to behave more like liquids.
[2] I have recently developed a suspicion that there might exist an old obsolete noun-deriving suffix *-wa in a couple other Finnic words as well. One case that seems clear is *päivä ‘day, sun’, in light of an observation due to Janne Saarikivi: only *päj(ə)- should be considered a part of the original root, given several other words that appear to be related. In the UEW we can find some listed under the roots *päjä ‘fire’, *päjV ‘white, to gleam’; also relevant is Estonian päike ‘sun’, which cannot be derived from anything like **päiväkkä.
— As a further aside, this analysis moreover seems to show that the word is not actually an exception to the sound law *ä-ä > Finnic *a-e, which as of recently has been subject to ongoing discussion by at least Kallio, Zhivlov, and Aikio.
[3] *wooɣəθ according to László Honti; however, as I might have remarked before, I am skeptical of Honti’s *aa, which almost never seems to appear in Uralic vocabulary; and of his phonetically backwards sound change *oo > Mansi *aa. It seems more probable to me that if a separate Proto-Ob-Ugric stage existed, then its *aa was simply retained in Mansi, raised to *oo in Khanty. Commenting also on the reconstruction scheme of Eugene Helimski and Mikhail Zhivlov, where this vowel correspondence is reconstructed as short *a, would take me too far off the track here though.
[4] Reasonably well-reconstructible cases include: *čomčə ‘skin layer’ (S, Kh) | *iša ‘skin’ (S, F, P, ?Mo, ?Ma) | *ketə ‘skin’ (S, F, Mo, Smy) | *kopa ‘skin, bark’ (F, Ma, P, Smy) | *küpsɜ ‘skin on paws’ (P, Ms, Kh) | *perə ‘skin’ (Kh, Smy) | *śuka ‘bark / skin?’ (F, Ms, Kh, ?H)
[5] Though there are a couple of odd exceptions, where either expected *β is also vocalized (PU *lämə ‘broth’ > ? *lɛβ > lé : leve- ‘juice’), or *w / *ɣ is not (PU *jekä ‘year’ > ? *ēw / *ēɣ > év).
[6] Given that PU *toxə- has been considered a loan from PIE *doh₃- ‘id.’, it’s also not clear if this verb ever existed in pre-Samoyedic to begin with. The other widespread PU word for ‘to give’, *a/ëmta-, has also not been attested from Samoyedic or even Ob-Ugric, and it looks likely to be a causative derivative from a simpler root.
[7] The contrast between Tundra Nenets ā and ă has often been analyzed phonologically as /a/ vs. /ə/, but since TN in fact also has a separate reduced vowel (usually transcribed °), and contrasts long í, ú with short i, u, I consider this analysis untenable; as far as I know, no other language in the world contrasts reduced and unreduced mid central vowels. (In general, the common habit in Uralic linguistics to treat vowel reduction and vowel centralization as separate phenomena seems troublesome to me.) Instead, I would propose simply analyzing ā as long /aː/, and ă as short /a/. The fact that ā comes from earlier full *a, and ă from earlier reduced *ə, should not be considered relevant; especially since ° is historically largely derived from PSmy *ə just as well, and since *ə is regularly reflected as a plain full vowel /a/ also in the Southern Samoyedic languages.

Posted in Commentary, Etymology

*je-: A Reprise

Summer’s wrapping up, a new academic year’s about to roll in, and if all goes well, I might be returning to more active blogging around here.

I have also returned, about a week ago, from the 12th International Congress for Finno-Ugric Studies. You can check out my presentation online, too: Semivowel losses and assimilations, in Finnic and beyond. Longtime readers may recall me having first explored the ideas within many blog posts (and one blog platform) ago. Evidence has continued to turn up, and I’m by now quite convinced that my newfound soundlaw *je- > Finnic *i- indeed exists.

The title is admittedly a bit more general than what might be warranted per the presentation’s contents. For space concerns, I was not able to treat the topic of initial semivowels in Uralic languages more generally.

I could mention one fairly simple addendum here, though — while *wo- has been already traditionally well-established, and I attempt to show that abundant evidence of *je- can be found as well, by contrast it seems to me that **wu- and **ji- were not possible sequences in Proto-Uralic (and they remain impossible in most descendant languages as well).

Only one widely accepted instance of *wu- has been proposed: the word for ‘new’ (> Fi. uusi, Hu. új etc.), traditionally reconstructed (modulo notation) as *wud₂ə. I however suspect that this should be instead reconstructed with *o, and that the evidence suggesting *u is due to the shift *o-ə > *u-ə in open syllables, as first proposed by Janhunen (1981) for the Finno-Permic end of the family. (Seen also in e.g. *lomə > *lumə > Fi. lumi ‘snow’. Sammallahti has later suggested that the change also affected Ugric; I am skeptical, however. More on this at some point in the future.) For *ji- there are a few more potential examples, but the best-looking cases (? *jikä ‘age’, ? *jitɜ ‘night’) fall among those where I believe *je- should be rather reconstructed.
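As a rough illustration, the shift *o-ə > *u-ə can be written out as a small rewrite rule. This is only a sketch under simplifying assumptions of my own: the function name `raise_o`, the vowel inventory, and the treatment of a letter plus subscript digit (as in *d₂) as a single consonant are all choices made for the example, not part of anyone's published formalization.

```python
import re

# Assumed vowel symbols for this sketch; not a full Proto-Uralic inventory.
VOWELS = set("aeiouäöüëə")
# One segment = one character plus an optional subscript digit, so "d₂" is one consonant.
SEGMENT = re.compile(r".[₁₂]?")

def raise_o(stem):
    """Apply *o-ə > *u-ə when the first syllable is open (o + single consonant + ə)."""
    segs = SEGMENT.findall(stem)
    # Index of the first vowel in the stem, if any.
    first = next((i for i, s in enumerate(segs) if s in VOWELS), None)
    if first is None or segs[first] != "o":
        return stem
    rest = segs[first + 1:first + 3]
    # Open first syllable: exactly one consonant stands between o and ə.
    if len(rest) == 2 and rest[0] not in VOWELS and rest[1] == "ə":
        segs[first] = "u"
    return "".join(segs)

for form in ["lomə", "wod₂ə", "kopa", "čomčə"]:
    print(form, "->", raise_o(form))
```

Under these assumptions *lomə and *wod₂ə come out with *u, while *kopa (second-syllable *a) and *čomčə (closed first syllable) are left alone, which is the distribution the rule as stated would predict.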

If so, then it seems to me that we can likely apply for Proto-Uralic a phonological analysis also known from various other languages of the world: to unite *j *w on one hand and *i *u on the other as allophones of each other.

Posted in News, Reconstruction

On comparison in Proto-Uralic

Here is a somewhat speculative idea that recently occurred to me. I don’t think I will be able to deliberate on all the comparative implications just now, but it wouldn’t surprise me too much if something similar had already been proposed.

A relatively well-known suffix element usually reconstructed for Proto-Finno-Ugric (implicitly Proto-Uralic, in the wake of Finno-Ugric turning out to be probably not a genetic grouping) is the comparative suffix *-mpa, reflected quite well in the best-known Uralic languages: Finnish -mpi : -mpa-, Estonian -m, Hungarian -(a)bb. [1] The Samic languages also have clear cognates, e.g. the Northern Sami bisyllabic adjectives’ comparative ending -t : -bu- (from slightly earlier *-b : -bu-).

However, no trace of such a comparative suffix is known elsewhere in Uralic. This smells slightly suspicious. Hungarian is, overall, quite innovative, and usually whatever clear old Uralic features have been retained there can also be traced in at least some of its relatives spoken in Russia, especially Ob-Ugric and Permic. There’s a proposed Samoyedic cognate, but as far as I recall, it is found only in Nenets, and only used in an approximative sense.

It’s also the case that there is a quite well-established PU participle ending *-pa. These two suffixes share the privilege of being just about the only places in comparative Uralic inflectional morphology where *p occurs; and both of them have very roughly adjectival semantics. Might it be possible to thus segment *-mpa as *-m-pa? We’d like to know if this can be made to make semantic sense; and if we can find a reasonable candidate for what the nasal element comes from.

The former question can be roughly reformulated as: “if a thing is greater (than another one), what is it doing?” To me it seems the answer would be “exceeding, being greater”. A PU “comparative” form such as *wod₂ə-mpa (> Fi. uude-mpi, Hu. új-abb ‘newer’) could then instead be analyzed as *wod₂əm-pa, meaning ‘that which *wod₂əN-s’, which could have independently developed into an IE-style nominal comparative in Hungarian and Finno-Samic. Originally it’d have been instead the verb stem *wod₂əN- that captured the “comparativeness”, meaning something like ‘to be newer’. The approximative sense in Nenets seems well-derivable from this as well: we can easily imagine the base meaning as just ‘to be new’, and derive from this on the one hand an amplification ‘to be newer’, on the other hand a mitigation ‘to be newish’.

Above I’ve written the nasal of my internally reconstructed verb stem as just -N-. While Proto-Uralic allowed heterorganic nasal+stop consonant clusters with a coronal stop, [2] there seem to be no examples of this with a peripheral stop. Only *mp and *ŋk can be reconstructed stem-medially, while there are no **np, **nk, **ŋp, **mk. So I suppose all nasal consonants are fair game here. (And, of course, in Finnic and Hungarian all such heterorganic clusters assimilate anyway.)
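The cluster pattern just described can be sketched as a small check. This is a toy model under loud assumptions: only the plain nasals m, n, ŋ and stops p, t, k are covered, every nasal is treated as combinable with *t, and the idea that an underlying heterorganic nasal would simply assimilate before a peripheral stop is my reading of "fair game", not an established Proto-Uralic rule. The names `cluster_allowed` and `surface_cluster` are hypothetical.

```python
# Place of articulation for a simplified consonant subset (assumption of this sketch).
PLACE = {"m": "labial", "n": "coronal", "ŋ": "velar",
         "p": "labial", "t": "coronal", "k": "velar"}
NASALS = ("m", "n", "ŋ")

def cluster_allowed(nasal, stop):
    """True if nasal+stop fits the stem-medial pattern described in the text."""
    if PLACE[stop] == "coronal":
        return True  # heterorganic clusters such as *mt are permitted
    # With a peripheral stop (p, k), only homorganic clusters are found.
    return PLACE[nasal] == PLACE[stop]

def surface_cluster(nasal, stop):
    """Assumed repair: assimilate the nasal to a following peripheral stop."""
    if cluster_allowed(nasal, stop):
        return nasal + stop
    homorganic = next(n for n in NASALS if PLACE[n] == PLACE[stop])
    return homorganic + stop

# Every candidate nasal for -N- would surface identically before *-pa.
candidates = {n: surface_cluster(n, "p") for n in NASALS}
print(candidates)  # {'m': 'mp', 'n': 'mp', 'ŋ': 'mp'}
```

The point of the sketch is only that the surface shape *-mpa cannot by itself tell us which nasal stood in the verb stem.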

— Now consider Finnish verbs derived with the suffix -ne-: e.g. iso ‘big’ → isone- ‘to become bigger’; mätä ‘rotten’ → mätäne- ‘to rot’; pimeä ‘dark’ → pimene- ‘to become dark(er)’. The suffix is used almost exclusively on adjectives, and typically forms verbs meaning indeed increase in quality. This seems to provide a great candidate for a Proto-Uralic derivative class, on which the nominal-type comparatives of “Western European Uralic” could have been based. Altogether, originally a word like Fi. isompi ‘bigger’ would have been a consonant-stem participle, equal to modern Fi. isoneva ‘that which increases, becomes bigger’ (pseudo-PU *ićäwmpä ~ *ićäwnəpä).

A chief remaining problem would be whether we really can reconstruct this verbal suffix all the way to PU though. SKRK reports similar usage across Finnic, as well as possible cognates in Ob-Ugric and Samoyedic; these, however, indicate original *-m-! Another hypothesis mentioned would be comparison to a momentane suffix -n in Hungarian, found in some fossilized forms such as villan ‘to flash’ (seemingly related in some way to világ ‘world; (archaic) light’ < PU *wëlkə). This sounds a bit better with respect to my reconstruction, but I’d like to have some more supporting evidence. And of course, I’d also have to check how well the development of comparison constructions elsewhere in Uralic can be lined up with this scenario.

[1] For clarity, I’m ignoring vowel harmony in this post.
[2] Perhaps the clearest case is *tumtə- ‘to know’, whence e.g. Fi. tuntea, NS dovdat, Hu. tud, Tundra Nenets tumtă-.

Posted in Etymology, Reconstruction

Linkday #2: FUF online

A small discovery to report: looks like someone from the University of Toronto has kindly digitized a few back issues of Finnisch-Ugrische Forschungen, old enough to be out of copyright, and uploaded them on; findable e.g. under the keyword “Finno-Ugric languages — Periodicals”. Currently available are issues 1, 5, 6, 7, 9, 10, 11, 14, 15, from between (straightforwardly enough!) 1901 and 1915.

(Though the last one actually includes an article from a young Y. H. Toivonen (1890-1956), which would according to Finnish copyright law remain under copyright for a while still… but one hopes the Toivonen estate will not consider this a terrible injustice.)

Posted in Uncategorized
