Linkday #5: Free Resources in Linguistics (Uralic & Otherwise)

I try to keep my sidebar at a manageable size by limiting it to blogs and resources on historical linguistics; but there are obviously many other linguistics sites out there worth checking out as well. One not strictly directly neighboring blog I’ve enjoyed for a while now has been Humans Who Read Grammars (frequency of gratuitous .gif usage aside perhaps); their most recent post is perhaps exceptionally useful as a concept, in compiling a brief “list of lists of lists” on linguistic resources. (I wonder if a more centralized location such as Glottopedia would be optimal for compiling this type of work eventually.)

Which then brings me to one section on the meta-list: lists of open access journals in linguistics, one compiled by George Walkden, another in-house by HWRG. I am happy to note that Uralic studies seem to be quite well-represented, ranging from traditional establishment journals such as Suomalais-Ugrilaisen Seuran Aikakauskirja (Journal de la Société Finno-Ougrienne) (est. 1886) to newcomers such as Finno-Ugric Languages and Linguistics (est. 2012). At least one older journal also seems to have recently joined the ranks without me having noticed this before: Études Finno-Ougriennes, France’s only regular publication in the field, now added to my links. As could be expected of a slightly older journal (est. 1964), currently only newer issues are available online. I would still hope to see them expand their coverage further back eventually, though. From what I have heard from the SUSA crew, copyright issues may cause some trouble with digitizing the back catalogue from a few decades back; but surely this is not undealwithable. The best example for this in Uralic studies is perhaps Hungary’s long-running Nyelvtudományi Közlemények, whose comprehensive online archive spans more than 150 years of issues: from 1862 to 2015! (They remain excluded from being considered an open access publication, though, due to their embargo on the most recent issues.)

Maybe these developments will also help a bit in dispelling Uralic studies’ alleged status as an arcane and poorly accessible subfield of linguistics. Now, in part this dubious fame is surely a language barrier issue: having substantial amounts of literature only available in Finnish, Hungarian or Russian is a hurdle for most people of any background. And for today’s scholars used to the convenience of literature being available mostly in English, the long-and-thin research history of Uralic studies moreover also makes more substantial demands in German and partly French reading skills than many other subfields. Easy access to literature will therefore not fix everything right away… It will still be a big help for many of us following along from home; I have myself recently moved further away from downtown Helsinki and I am already missing convenient university library access 20 minutes away. Yet it’s also clear that, at least in comparative-historical Uralistics, we continue to be lacking accessible up-to-date reference materials on many of the basics of our field. (I could mention also the lack of reference grammars or dictionaries that would be up to modern standards for many languages — though we’re probably still well ahead of the global curve on this.) That target is regardless drawing slowly closer.

Posted in Links

Etymology squib: *paliti

Ranko Matasović, in a recent paper “Substratum words in Balto-Slavic”:

Balto-Slavic also has a number of verbal roots which do not appear to have any cognates elsewhere. (…)
• BSl. *pel-/ *pāl- ‘burn’ > PSl. *paliti ‘burn’

I will take his word on the nonexistence of clear Indo-European cognates. However, we can find a near-identical root right next door in West Uralic (= Samic-Finnic-Mordvinic): *pala- ‘to burn’. This seems like a much clearer point of comparison than Matasović’s proposal of metathesis from PIE √leh₂p- ‘to shine’.

A traditional further comparison within Uralic has been with Ugric *pad₂ɜ- ‘to freeze’. I’ve never found this compelling. The semantics display a “thermal inversion”, and phonologically this only works by recourse to the dubious PU *ľ, and even then only halfway: in Khanty we’d expect then *Ľ, not *j. I’m more inclined to accept instead the recent connection in Aikio ’16 with the long-known word-family of *pala ‘bit’, more specifically with verbal reflexes in Ugric and Samoyedic meaning ‘to devour’.

Originally I was planning on simply quoting the Uralic material and concluding with “∎”, but no, this does not yet add up to a trivial etymology. For one, even though the narrower distribution clearly suggests West Uralic → Slavic, we could still consider also the opposite direction of loaning (at the cost of abandoning the East Uralic cognates, geographically too far off to be of Slavic or even Balto-Slavic origin; but also at the benefit of dispensing with the semantic shift from ‘devour’). For two, the correspondence Uralic *a ~ Slavic *a poses some difficulty. These are identical graphemes; but before the Great Common Slavic vowel shift, the latter “*a” would have been phonologically a long vowel *ā. [1] Could *a on the Uralic side, not originally subject to a length distinction, have been substituted as long *ā > CSl. *a, instead of short *a > CSl. *o? Possible, perhaps, but no such phenomenon surfaces in any of the (rather few) known old Uralic loans into Germanic and Baltic. Alternately, could this be a loan late enough to have skipped the CSl. vowel shift altogether? Again, maybe this is possible. But we clearly end up with some uncertainties in how this supposed loan could have been routed.

For three, the WU root alternates with an “ablaut variant” *pol-(t)ta- ‘to burn (tr.)’, which has never been properly explained. Under current knowledge, we could maybe derive the Samic and Mordvinic variants (*poaltē-, *pultə-) from earlier *palə-ta- ~ *palə-tta-, though Finnic still remains problematic. [2] The existence of comparanda in Slavic opens some new options, though. Some kind of back-loaning is one possibility; for another one, since I am not 100% convinced that this is a U → Sl loan and not the opposite, maybe we could derive the Uralic variants from actual IE ablaut variants, such as an earlier full grade *pōl- versus an otherwise lost zero grade *pal- (from earlier *poh₃l- ~ *pᵊh₃l-)?


Later on in the paper, Matasović also gives a list of various voiced/voiceless doublets, mostly from Baltic. He then adds a strange comment: “In some cases, words showing this alternation may be Uralic loanwords, or they may reflect the pronunciation of originally Baltic words by speakers of Uralic, who underwent language shift.” This does not seem to be combined with any attempt to find Uralic equivalents though… and in many cases such a search would be doomed anyway. At minimum, doublets with word-initial consonant clusters (a bit over half of the cases, e.g. Latvian sniekt ~ sniegt ‘to give’; Lith. klusnus ~ glusnus ‘obedient’) would be clearly alien to Uralic phonology. The only case in his list that I could possibly see as connected to any actual Uralic words is Lithuanian viskėti ~ vizgėti ‘to swing’, which has some similarity with Finnic *viskat- ‘to throw, cast’ (but maybe not enough to actually matter).

I don’t want to harp on Matasović in particular. But this regardless strikes me as a part of a wider disconnect between IE and Uralic studies. The oversights here — the false negative of *paliti being supposedly isolated, and the (weak) false positives of words like klusnus being called possibly Uralic — both fit into a pattern where Uralic gets unwarrantedly treated as lexical terra incognita, despite extensive research to the contrary; much of it in readily accessible languages like German and English, even.

Plenty of Uralic-IE lexical comparisons have of course been compiled over the times… by Nostraticists and Indo-Uralicists. Skepticism on macro-comparison hypotheses like these should not be taken as a reason to neglect the raw data used, though: by now many cases have been shown or at least proposed to be loans from IE to U instead. I would expect close analysis to also reveal some number of cases better explainable as U to IE loans, too, if we would only pause to at least consider the possibility. [3]

[1] This type of error is committed every so often in IE/U loanword studies, where numerous traditional transcription schemes clash. Also worth mentioning is at least the similar graphemic identity but phonological non-identity of PU *a (an open back vowel, [ɑ ~ ɒ]) and PII *a (a non-close central vowel, [a ~ ə]); which explains e.g. why PII *ćata ‘100’ gets loaned as PU *śëta (with *ë, a mid non-front vowel, [ʌ ~ ɤ]) rather than *śata.
[2] I’ve sometimes wondered if the stem type shift *a-ə > *o-a could have under some conditions, such as after *p, extended to Finnic as well. However, by this approach it would be mysterious why we end up with Fi. polttaa and not a contraction verb polata : polaa- (*polat-). Estonian põlema ~ Votic põlõa ‘to burn (intr.)’ also seem to evidence instead a base root *polë- < *polə-. One other solution would be to suppose instead *palə- > *poolë- (just as in *pälä > *palə > *pooli ‘side, half’), followed by suffixation to transitive *pool-tta-; which might have been then shortened to *poltta-, due to a general ban on CVVCC syllables in Proto-Finnic times. After this Estonian and Votic might have back-derived the stem *polë- from this…
[3] For one other possible example, cf. the case of Uralic *pisə- ‘to put (in)’ ~ Balto-Slavic *p(e)is- ‘to push, to fuck’, briefly discussed in this blog’s comments earlier.

Posted in Commentary, Etymology

Observations on second-syllable vocalism in Khanty

This summer I’ve finished digitizing the main bulk of comparative data from László Honti’s Geschichte des obugrischen Vokalismus der ersten Silbe (1982): his 724 Proto-Ob-Ugric reconstructions and their descendants in the individual Mansi and Khanty varieties. Before making this available in any form though, I’m planning on eventually cross-checking at least a few other key sources. For one, there are Steinitz’ DEWOS, Kannisto’s recently released Vogulisches Wörterbuch, and some other materials for additional dialect coverage; for two, there are UEW and similar sources covering inherited vocabulary that has only been retained in one of Mansi and Khanty; for three, I will be also adding the data Honti includes but considers uncertain (this part already underway). [1] A potential fourth extension could be the known loanwords of Komi / Turkic / Tungusic / older Russian origin, at least whenever attested in both Mansi and Khanty: they should be able to offer substantial evidence for constraining speculation on historical phonology.

However, even at this stage, the data can be assumed to contain a substantial part of the inherited lexicon of Mansi and Khanty. So I have taken the opportunity to do some preliminary comparative analysis.

One interesting underresearched topic is second-syllable vocalism; this lack of research actually extends even to the basic groundwork within Mansi or Khanty. This might have importance for Uralic comparison in general, since our current understanding of Proto-Uralic word stem types comes mostly by extrapolation from Finnic and Samic. Although the basic division into *A-stems (~ *a-stems & *ä-stems) and *ə-stems (~ *e-stems or *i-stems) finds some substantial confirmation from Mordvinic and Samoyedic, it fares substantially more poorly with Mari, and within Permic and Ugric, there is not too much direct evidence of second-syllable vowel contrasts to work with in the first place. Attempting to reconstruct other second-syllable contrasts from conditional vowel developments in the first syllable is theoretically possible (I believe Zhivlov (2014) is still the most recent example of this), but this often carries a risk of circular logic, and if low on data, may also run into accidental correspondences between unrelated phenomena.

There is regardless some direct evidence of second-syllable vocalism in Ugric. Looking in the rest of this post at Khanty in particular: the Khanty evidence has been explored in the 60s in some aspects by Gerhard Ganschow and Gert Sauer, [2] but mostly the topic has gone without detailed research. Steinitz’ Geschichte des ostjakischen Vokalismus (1950) does not treat the subject and only focuses on the first-syllable system.

A few overview notes on unstressed syllables, without detailed analysis of the data, are given by Sauer in Die Nominalbildung im Ostjakischen (1967) and Honti in Chrestomathia Ostiacica (1984). These outline a division into five stem categories:

  1. Basic consonant stems (the most common).
  2. *A-stems, with an open full vowel (*ää, *aa). Decently preserved in inlaut (verb roots, CVCAC and other longer stem types), but in absolute auslaut in the nominative of noun roots, the vowel is widely reduced and possibly lost entirely.
  3. *I-stems, with a close full vowel (*ii, *ïï). Preserved somewhat more widely, again better in inlaut than in auslaut.
  4. A third vocalic stem type, yielding *I-stems in Eastern Khanty but *A-stems in Western Khanty.
  5. *əɣ-stems: these behave as ordinary consonant stems in EKh, but vocalize in WKh to merge with the *I-stems.

This certainly covers most of the bases. A close look at the comparative data, however, suggests that this picture should be probably modified and perhaps also expanded.

The *I-stems, reinterpreted

I would propose as an initial adjustment that the *I-stems are to be reinterpreted as a part of the consonant stems: as *əj-stems. This is indirectly suggested by the absence of stems of the shape *CVCəj from the Proto-Khanty lexicon, even though *CVj is well-attested (e.g. *ɬöj ‘pus’, *pooj ‘ice crust’, *saaj ‘goldeneye’) and examples of *CVjəC occur too (*kaajəm ‘ash’, *waajəɣ ‘animal’). Direct support is provided by at least *ńooɣïï ‘meat’, *sooɣïï ‘clay’, cognate to Mansi *ńaawľ, *suwľ respectively. Instead of parallel suffixation, these can be analyzed as reflecting the typical sound correspondence Mansi *ľ ~ Khanty *j (< PU *ď; the intermediates are not obvious, but that question is irrelevant for now). Dating *-əj > *-I as a proto-Khanty innovation does not seem to be possible either, since in verb stems, Southern Khanty still retains /-əj-/: e.g. ‘to break’, Far Eastern (Vakh-Vasyugan) /aarïï-/ ~ Southern /oorəj-/. (And see below for some related considerations concerning the *əɣ-stems.)

This also accounts for a minor typological paradox. Why is second-syllable *-I better retained than *-A in the Khanty dialects, even though we would expect a close vowel to be more readily subject to reduction? A promising answer would be that the vocalization *-əj > *-I, despite being reflected in all Khanty varieties, is more recent than the partial reduction of *-A in some varieties. Sound changes *ej > /iij/ ~ /ij/, *je > /jii/~ /ji/ are also well-known in Northern Khanty, [3] and I suspect this is additionally a part of the same wave of vowel coloring, in these varieties further generalized to the first syllable. This would date *-əj > *-I as at minimum more recent than the Southern / Northern split.

I have seen the sound change *-əj > *-I mentioned in various works already (Sauer, Honti, Helimski…), but not anyone willing to bite the bullet and note that this can be taken as the definitive original source of this stem type.

Also, one secondary sound change. It appears that in Obdorsk Khanty, word-final *-I > /-aa/ after /x/: *ńalkïï > /ńalxaa/ ‘Siberian fir’; *ńooɣïï > */ńoxaa/ ‘meat’. This appears related to loss of vowel harmony. In NKh, first-syllable *ïï > *ee before velars instead of > *ii elsewhere, and I suspect something similar is involved here. I would assume first *-kïï > *-xïï > *-xëë, then *-ëë lowers to /-aa/ instead of backness neutralization to **-ee.

This would then seem to show that yes, Western Khanty too (or at least Obdorsk Khanty) has gone through a vowel-harmonic stage with *-ii ~ *-ïï, instead of directly vocalizing *-əj to front *-ii everywhere.

The *Aj-stems

Reconstructing *əj-stems also sheds light on the fourth stem category in the outline above. I would side with Honti in reconstructing these as *Aj-stems. Southern Khanty provides clear evidence in favor: e.g. /xašŋääj/ ‘ant’ ~ Far Eastern /koočŋïï/. Sauer considers /j/ in SKh to be instead epenthetic, generalized from inflected forms where a vowel-initial suffix followed, but we can again appeal to comparative evidence from Mansi, where we find e.g. Northern /xooswoj/. I would add that with correct relative chronology, the development *-Aj > *-I in EKh drops right out of the other attested sound laws, with no need to posit any additional changes particular to this stem category: start with the reduction *A > *ə, follow up with *-əj > *-I.
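The point about relative chronology can be illustrated with a small sketch that treats each sound change as a rewrite rule and applies them in sequence. Everything specific here is an assumption made for illustration: the pre-form koočŋaaj is a hypothetical shape (not a reconstruction taken from the literature), the reduction is only modelled before stem-final j, and the names `EKH_CHANGES` and `derive` are invented for this example.

```python
import re

# Ordered sound changes as (label, regex pattern, replacement).
# Simplification (assumption): the reduction is only modelled before stem-final j.
EKH_CHANGES = [
    ("reduction *A > *ə", r"(aa|ää)j$", "əj"),
    ("vocalization *-əj > *-I", r"əj$", "I"),
]


def derive(form, changes):
    """Apply the changes in the given order and return every stage."""
    stages = [form]
    for _label, pattern, replacement in changes:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages


# Hypothetical pre-form of 'ant'; "I" is the cover symbol for ii ~ ïï.
print(derive("koočŋaaj", EKH_CHANGES))
# With the order reversed, the word is left stranded at the *-əj stage.
print(derive("koočŋaaj", list(reversed(EKH_CHANGES))))
```

Only the first ordering reaches the *I-stem outcome, which is the sense in which the Eastern Khanty development needs no changes particular to this stem category.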

The origin of the *Aj-stems also appears to be clarifiable. Words such as ‘ant’ point in the direction that they often originate in compounds. I believe that in many cases, their second member is likely the root seen in Ms *wuuj ‘animal’, though found independently in Khanty only in the suffixed form *waajəɣ. Some other *Aj-stems in Khanty that seem to have this origin include: *jeetərɣääj ‘black grouse’; *kaaməɭkaaj ‘water beetle’ (maybe with a first component akin to *koomɭəŋ ‘bubble’); #karŋaaj ‘woodpecker’ (and thus, contra Honti, not segment-for-segment identifiable with Hungarian harkály); #wuurŋaaj ‘crow’.

Many of these words also show irregular vacillation between medial *-ŋ- and *-ɣ-. My hypothesis is that this might be a trace of the PU genitive suffix *-n, and e.g. what I write as approximate #karŋaaj (Obdorsk metathesized /xaŋraa/; Konda spirantized /xaxrääj/; Surgut /kajaaïï/ and Far Eastern /kajərkïï/, maybe by metathesis and dissimilation: < *kaɣərKəj < *karɣəKaaj?) should be thus reconstructed as something like #karkə-n_waaj > #karɣəŋɣaaj, reflecting an original genitive attribute construction: ‘animal of the beak’, or something to that effect.

Compound origin would additionally explain the complete absence of *Aj-stems among verbs.

It’s also possible I am late to the scene here. I’ve seen references to a 2003 paper by Anna Widmer “Zur Geschichte des obugrischen Tiersuffixes”, [4] and it sounds like this covers this same topic, but I do not (currently) have access to it.

The *əɣ-stems

Among the *əɣ-stems, an interesting complementary distribution appears that I have not seen remarked on before. Many sources note that the reflexation in Northern Khanty in nouns is somewhat inconsistent: in some cases we find Kazym /-i/, Obdorsk /-ii/, the same as in *I-stems; but, in others, we find Kazym and Obdorsk zero. (Southern Khanty and the “transitional” Nizyam dialect have consistently /-ə/ in both cases.) Verbs also only show the development to *-I-.

This split distribution seems to be conditioned by the preceding consonant: *-əɣ > *-I appears after obstruents, *-əɣ > ∅ after sonorants. Some examples of the former:

  • ‘owl’: Vakh /jewəɣ/ ~ Kazym /jipi/
  • ‘Khanty’: Vakh /kantəɣ/ ~ Kazym /xanti/
  • ‘birch bark’: Vakh /tontəɣ/ ~ Kazym /tonti/
  • ‘barbel’: Vakh /mööɣtəɣ/ ~ Kazym /meewti/
  • ‘duck’: Vakh /wääsəɣ/ ~ Kazym /waasi/
  • ‘knife’: Vakh /kööčəɣ/ ~ Kazym /keeši/
  • ‘pine’: Vakh /ɔɔɳčəɣ/ ~ Kazym /wooɳši/

And some examples of the latter:

  • ‘song’: Vakh /äärəɣ/ ~ Kazym /aar/
  • ‘roach’: Vakh /läärəɣ/ ~ Kazym /ɬaar/
  • ‘crane’: Vakh /taarəɣ/ ~ Kazym /tɔɔr/
  • ‘bowl’: Vakh /ääɳəɣ/ ~ Kazym /aaɳ/
  • ‘lightweight’: Jugan /köńəɣ/ ~ Kazym /keeɳ/
  • ‘bog’: Vakh /kɔ̈ɔ̈ɭəɣ/ ~ Kazym /kaaɭ/
  • ‘animal’: Vakh /waajəɣ/ ~ Kazym /wɔɔj/
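The conditioning proposed above can be written out as a small sketch. It makes several assumptions that are not in the sources: the Vakh form stands in for the Proto-Khanty shape, the consonant inventories are illustrative and incomplete, and the function name `northern_reflex` is invented here. It also cannot handle cases like ‘owl’, where the Vakh consonant no longer matches the one conditioning the Kazym outcome.

```python
import unicodedata

# Consonant classes; an illustrative, incomplete inventory (assumption).
OBSTRUENTS = set(unicodedata.normalize("NFC", "ptkčćsš"))
SONORANTS = set(unicodedata.normalize("NFC", "mnɳńŋrlɭľjw"))


def northern_reflex(stem):
    """Predict the Kazym outcome of stem-final *-əɣ: '-i' or zero ('∅')."""
    stem = unicodedata.normalize("NFC", stem)
    if len(stem) < 3 or not stem.endswith("əɣ"):
        raise ValueError(f"not an *əɣ-stem: {stem}")
    preceding = stem[-3]  # the consonant directly before *-əɣ
    if preceding in OBSTRUENTS:
        return "-i"
    if preceding in SONORANTS:
        return "∅"
    raise ValueError(f"unclassified consonant: {preceding}")


for word in ["kantəɣ", "wääsəɣ", "äärəɣ", "waajəɣ"]:
    print(word, northern_reflex(word))
```

The sketch only predicts the fate of the ending; the first-syllable vowel correspondences between Vakh and Kazym are a separate matter and are left out.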

There is only one example involving Proto-Khanty *L (a cover symbol representing both *ɬ and *l, which are medially neutralized everywhere). [5] It appears to align with the sonorants:

  • ‘rope’: Nizyam /keetə/ ~ Kazym /keeɬ/

Inconveniently, here *-L- continues PU *-d-. It is therefore not possible to clearly tell if we are dealing with Proto-Khanty *-l- or *-ɬ-, since both paths of development have been suggested. In principle, though, this example would support a claim that the development was in fact first to *-l- (a sonorant), as also in Permic / Mansi / Hungarian.

I am not sure how the split development here should be interpreted phonetically, either. The core motivation seems to be a general cross-linguistic one at least: sonorant codas are more licensable than obstruent codas. But at least secondary loss of /-i/ after sonorants is ruled out, since in genuine Proto-Khanty *I-stems (*əj-stems) this remains. Examples are not numerous (by far most occur following /r/), but they exist:

  • ‘riverbed’: Vakh /uurïï/ ~ Kazym /woori/, Nizyam /uurə/
  • ‘sturgeon’: Vakh /köörii/ ~ Kazym /kari/, Nizyam /karə/
  • ‘scab’: Vakh /kaľïï/ ~ Kazym /xaɬ´i/, Nizyam /xaťə/

This thus ends up further supporting my above-suggested chronology, where *-əj > *-ij > /-i/ took place only after the separation of Northern Khanty: the *-əɣ > ∅ group likely never went through an *-əj-stage. In other words, whatever the exact split development here was, it would have predated the common (but not Proto-!) Western Khanty shift *-əɣ > *-əj.

Maybe this could even be equated with the development of post-tonic (“non-stem”) *ɣ to /j/ in Obdorsk Khanty under certain conditions (e.g. ‘father’: EKh /jeɣ/, Nizyam /jiɣ/, Kazym /jiw/ ~ Obdorsk /jiij/; ‘power’: Vakh /wööɣ/, Nizyam & Kazym /weew/ ~ Obdorsk /weej/). This would then require rather early separation between Obdorsk and the other NKh dialects though, perhaps early enough to invalidate the concept of “Northern Khanty” as a genetic group altogether, and turning it into merely an areal subset of Western Khanty varieties.

I would not take this last corollary as a huge problem though, since I actually suspect the same already on other grounds as well… For just two examples:

  • The word for ‘grass’. Far Eastern and Obdorsk have /paam/, while the other dialects have reflexes pointing to *pɔɔm. This surely involves an irregular (“non-provably regular”?) labialization between two bilabial consonants; [6] and yet this labialization cuts across the conventionally accepted grouping of the Khanty dialects.
  • The treatment of supposed Proto-Khanty *ɔ̈ɔ̈ and *öö. These yield in some contexts /oo/ in Obdorsk, but *ää and *ee respectively in the rest of Western Khanty. Yet, the elimination of front rounded vowels is pan-WKh, and e.g. Honti and Steinitz claim it as indeed proto-WKh. [7] But if so, we have to route Obdorsk /oo/ differently. I wonder if another early shunt will work: if, following Helimski etc. we reconstruct lax open *ä, *a instead of *ee, *öö, *oo, then it will be possible to re-route “*öö > /oo/” as *ä > *a > /oo/, involving a pre-Obdorsk conditional retraction of *ä to *a in some environments.

— For some reason, nearly all words of the *-əɣ > ∅ group also involve Proto-Khanty low *aa, *ää, or mid *ee, *öö, *oo (= *ä, *a?). Perhaps there is also something more going on in here. This is also suggested by one example with a close vowel, where in Northern Khanty we find metathesis instead, viz. ‘eight’: Vakh /ńïïləɣ/ etc. ~ Nizyam /ńiwtə/, Kazym /ńiwəɬ/, Obdorsk /ńiijəl/ (< virtual PNKh *ńiiɣəɬ).


I also wonder how the changes *-əɣ > *-əj > *-I would interact with another innovation common to all of Khanty: the cluster contraction *-jt- > *-ć- (often involving the PU verbalizing suffix *-ta-, e.g. in *uj-ta- > PKh *ɔɔć- ‘to swim’). The more economical approach (that *-jt- > *-ć- was Proto-Khanty while *-əj > *-I was post-PKh) would however predict that we should find cases where an *I-stem noun or intransitive verb has a corresponding intransitive or transitive verb (respectively) ending in *-əć-. Offhand I cannot locate any such cases, however. But maybe this type of derivation was morphotactically impossible in the pre-PKh period? For comparison, in Finnic *-i < pre-PF *-j is a common suffix of diminutive nouns, and *-i- < *-j- is a common suffix for iterative verbs, but these generally do not form further verbal derivatives: any corresponding verbs are instead formed from the underived root.

At least one word also suggests the possibility of *-əj > *-I being earlier than the contraction to *-ć-: ‘to split’, Vasyugan /ɭaaŋkïït-/ ~ SKh /laaŋxət/ ~ Kazym /ɭooŋkit-/, where we would seem to have PKh *ɭaaŋkəjt-. However, this could also be a later derivative, formed after *-jt- > *-ć- had ceased to operate.

There also seems to be a lack of PKh words ending in coronal + *-I, that is, earlier *-təj, *-səj, *-nəj, *-Ləj. (There are a few examples with a /Ct/ consonant cluster though, e.g. *aŋtïï < *aŋtəj ‘horn’; *maartïï < *maartəj ‘mythical land of birds’.) Maybe this indicates a parallel palatalization, and pre-Khanty *-Cəj or *-CjV resulted in a stem-final palatal instead of an *I-stem. Stems of the shape CVĆ are not very common in the current dataset either, though. But maybe any examples of this simply have not been connected with their equivalents in Mansi or elsewhere in Uralic yet?

Retaking inventory

Since it turns out that close second-syllable vowels in Khanty are secondary, from the Proto-Khanty perspective I should probably be talking about vocalizable stems, not “vowel stems”. This then suggests that a sixth category should also be distinguished: PKh *Aɣ-stems. These would then fill up a neat 2×3 system:

  • vowel stems: *-A(C), *-Aj, *-Aɣ
  • consonant stems: *-∅/-əC, *-əj, *-əɣ

A few words ending in *-Aɣ are indeed reconstructed by Honti, and they indeed also show distinctive development of their own. A representative example would be the adverb *koɳčaaɣ ‘on back’: Far Eastern /koɳčaaɣ/, Surgut /koɳɣïï/, Southern /xončää/, Nizyam & Kazym /xonšaa/, Obdorsk /xonsaa/. So we have here:

  • loss/vocalization of *-ɣ in WKh, versus its retention in EKh (same as in *əɣ-stems);
  • retention of *-A in not just EKh but also WKh, presumably protected by the earlier word-final consonant (partially same as in *Aj-stems);
  • a strange development to /-ɣïï/ in Surgut, perhaps through metathesis (*-aaɣ > *-ïïɣ > *-ɣïï)?

Kind of paralleling *Aj-stems being mainly animal names, all of Honti’s examples seem to be adverbs. The other two are *koomtaaɣ ‘overhead’, *pertääɣ ‘back’. I would add to this group also *maakaaɣ ‘previous’, which he reconstructs as *maakaaj, despite SKh /maxaa/ and not ˣ/maxääj/.

The *A-stems

Moving on to the main bulk of *A-stems, these may also need to be analyzed as partially secondary. This, however, requires taking a few steps back to look at the wider context.

While the modern Khanty varieties and also most reconstructions of Proto-Khanty abound in consonant stems of the shape CVC, CVCC or CVCəC, it is clear that this is an innovation, and that in Proto-Uralic the dominant root structure was bisyllabic *CV(C)CV. It is also clear that the transition towards consonant stems across a wide central area among the Uralic languages has taken place mostly as areal drift, not as a diagnostic subgroup innovation. Marginal languages of this type, such as Estonian, Nenets and Skolt Sami, still remain at a “thematic inflection” stage, showing consonantal nominative singular forms but vocalic inflectional stems. A good example would be Estonian nom.sg. silm : gen.sg. silm-a ‘eye’, where the latter form is at least from a historical point of view better viewed as silma-∅ (and thus structurally identical to Finnish silmä-n). Verbal roots, which generally cannot stand alone, also generally retain original second-syllable vocalism. And due to the lucky fact that the largest clear subgroups of Uralic all occur near the edges (Finnic, Samic, Samoyedic), in all of these cases we will be able to compare these languages with close relatives that remain at a firmly vowel-stem-centric inflection type (e.g. Votic, Inari Sami, Nganasan, respectively).

A transitional stage, one of several possible, is represented by Hungarian, where nouns retain a trace of thematic inflection (nom.sg. hal : plural hal-a-k ‘fish’; but nom.sg. dal : pl. dal-o-k ‘song’). However, in adjectives and verbs, presumably earlier lexically determined stem vocalism has been levelled entirely, and in most word forms second-syllable vocalism is now better analyzed as morphologically determined. Constantly vocalic stems have also been reintroduced among nouns, primarily in loanwords (e.g. balta : baltá-k ‘axe’, from Turkic), but also in derivatives (e.g. apa ‘father’, where -a has been interpreted as a fossilized possessive suffix).

Sauer’s old work proposes that *A-stems would be a retention from Proto-Uralic in one environment specifically: stem-finally in nominals, as suggested by a few equations like PU *neljä > PKh *ńeLää ‘4’. This would imply that elsewhere they aren’t retentions. The PKh situation as currently reconstructed therefore seems to derive from something close to the Hungarian situation, where original stem vowels have first been almost always phonetically reduced or analogically reshuffled away; then new ones are introduced.

Loanwords can of course fill in new second-syllable vowels, e.g. EKh /aarkaan/ ‘thick rope’, from Turkic; *ajaa > EKh /ajaa/ ~ /ajə/, WKh /aj/ ~ /oj/ ‘luck’, from Tungusic. In native vocabulary though, the most natural source for new second-syllable vowels are original third-syllable vowels. Given the original trochaic stress pattern of Proto-Uralic (as still continued in Samic, Finnic, partly Hungarian and Samoyedic), foot-final vowels would be expected to be the first ones to fall. After this, earlier 3rd-syllable vowels will move one syllable forward, becoming new unreduced 2nd-syllable vowels.

In at least some of the examples I’ve discussed above, 2nd syllable *-A clearly derives from an original 3rd syllable. *koomtaaɣ ‘overhead’, for example, is probably a derivative of PU *kuma- ‘overturned’, i.e. descends from pseudo-PU *kuma-takV. The entire animal name group also falls under this.


Now, the crucial question is — at what point in the history of Khanty was the distinction between “primary” 2nd syllable vowels, retained since PU, and “secondary” 2nd < 3rd syllable vowels lost for good? I think there’s reason to think that this, too, was post-Proto-Khanty.

Relatively poor retention of absolute final *-A is maybe best attributed to specifically word-final reduction/loss. The numeral ‘4’ for example, does not surface with a final full vowel anywhere: the reflexes are Far Eastern /ńelə/, Surgut /ńeɬə/, SKh /ńetə/, Nizyam /ńitə/, Kazym /ńaɬ/, Obdorsk /ńiil/. In many other cases, only the Vasyugan dialect delivers: e.g. *paraa ‘raft’ > Vy. /paraa/, Vakh, Surgut & Demyanka (SKh) /parə/, Obdorsk & most SKh /par/, Nizyam & Kazym /por/.

(It’s unclear at least to me what’s up with the loss of *-A in SKh and Nizyam in ‘raft’, versus its retention as /ə/ in ‘4’. Both patterns have further examples; retention is more common. I’m not sure if I would want to utilize a “primary/secondary” distinction just for these.)

A bigger problem though is that “primary” *-A is mostly lost also in verbs, even though in these the vowel would have been always protected by an inflectional ending. For example *kalaa- ‘to die’ yields Far Eastern /kalaa-/, Surgut /kaɬ-/, SKh & Nizyam /xat-/, Kazym /xaɬ-/, Obdorsk /xal-/. This is in clear contrast to “secondary” *-A in words such as ‘height’: VVy /peläät/, Tremjugan (Surgut) /peɬiit/ (?), Nizyam /pataat/, Kazym /paɬaat/, Obdorsk /päläät/ — which, again, clearly comes from a longer proto-form, being a derivative from PU *pidə > PKh *peL ‘tall’ (and probably further cognate to also e.g. Fi. pituus : pituude- ‘length’, allowing a PU reconstruction #pidə-(w)Otə).

There seems to be some evidence for a “primary/secondary” distinction to be found in *-AC nominals, too. A good example might be *raɣaam ‘relative’ > Vakh /raɣaam/, but Tremjugan /raɣəm/, WKh /raxəm/; derived from a base verb ‘to approach, be near’ — only attested in WKh, and it could be from PKh *raɣaa- rather than simply *raɣ-.

Even if Proto-Khanty had a contrast between two types of *A-stems, trying to reconstruct this in the original 2nd syllable / 3rd syllable fashion seems like the wrong approach, though. In cases like ‘height’, this would lead to awkward vowel-cluster reconstructions such as **peLəäät. In cases like ‘overhead’, nothing would immediately stand out typologically in reconstructing **koomətaaɣ, but this still has at least one undesirable consequence: we can no longer treat *ə as a purely epenthetic vowel in PKh, inserted to resolve consonant clusters (reconstructions like *waajəɣ ‘animal’ are in fact better taken as phonologically */waajɣ/), and at least some cases would have to be assumed underlying.

I have another hypothesis in mind: the distinction may have been prosodic. 3rd syllable vowels in PU would have originally borne secondary stress, and this might have been retained in some form even after the loss of a preceding 2nd syllable. It’s not clear if an outright iambic stress pattern should be assumed though (*peˈLäät), or if something like a monosyllabic initial stress group followed by secondary stress will suffice (*ˈpeL|ˌäät). In principle it would be also possible to leverage the tenseness distinction, well-attested in initial syllables: *peLäät with tense *-ää, versus *ńeLä with lax *ä? For now, I will notate this distinction as *-À (“primary”, “unstressed”; individually *-a, *-ä) versus *-Á (“secondary”, “stressed”; individually *-aa, *-ää). Regardless of the phonetic specifics, later on *-À would have been generally reduced (*raɣam > /raɣəm/), while *-Á would have remained (*peLäät > /peLäät/).

The stress hypothesis finds some amount of direct confirmation as well: cases of fully iambic second-syllable stress have been reported at least from Eastern Khanty (Far Eastern /peˈläät/, Surgut /peˈɬäät/).

Stress in EKh does not appear to be a direct archaism, however. Per all descriptions I have seen, the attested distribution is purely phonological: stress is primarily initial, except when the 1st syllable contains a lax vowel and the 2nd syllable a tense one. This also rakes in cases of “unstressed” *-À; e.g. Far Eastern /kaˈlaa-/ ‘to die’. This seems like another point in favor of some kind of a more subtle distinction in PKh. I would suppose that in varieties of EKh, *-À was early on partly tensed to merge with *-Á, and could have actually acquired stress only later. Wherever this change failed to take place (including in all varieties of WKh), *-À was then reduced/lost.
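
The distributional rule just described is simple enough to write down as a small function. The following is only a rough sketch under a few assumptions of mine: words are given already split into syllables, "tense" is read off the doubled-vowel notation used in this post, and the function names are made up for illustration. It is not meant as a description of any actual EKh variety beyond the one-sentence rule above.

```python
import unicodedata

# Vowel letters as used in this post's notation (an assumption; extend as needed).
VOWELS = set(unicodedata.normalize("NFC", "aäeiïoöuüəɔ"))


def is_tense(syllable):
    """Treat a syllable as tense if it contains a doubled vowel letter (aa, ää, ...)."""
    s = unicodedata.normalize("NFC", syllable)
    return any(a == b and a in VOWELS for a, b in zip(s, s[1:]))


def stressed_syllable(syllables):
    """Return the 0-based index of the stressed syllable.

    Stress is initial, unless the 1st syllable is lax and the 2nd is tense.
    """
    if len(syllables) >= 2 and not is_tense(syllables[0]) and is_tense(syllables[1]):
        return 1
    return 0


def mark_stress(syllables):
    """Join the syllables, placing the IPA stress mark before the stressed one."""
    i = stressed_syllable(syllables)
    return "".join(("ˈ" + s) if n == i else s for n, s in enumerate(syllables))


for word in (["ka", "laa"], ["pe", "läät"], ["pa", "rə"]):
    print(mark_stress(word))
```

Note that the function has no way of telling *-À from *-Á: /kalaa-/ and /peläät/ come out the same, which is exactly why the attested stress cannot be a direct archaism.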

In summary

Altogether, I propose the following general chronology for the development of second-syllable vocalism in the Khanty varieties:

  1. The partial merger of *-À and *-Á in Eastern Khanty (with variable conditions); including *-Áj > *-Àj.
  2. The reduction of remaining *-À across all of Khanty; loss of *-əɣ in Kazym and Obdorsk after sonorants.
  3. *-əɣ > *-əj across all of Western Khanty.
  4. *-əj > /-I/ across all of Khanty (with variable conditions); in parallel, *-Aj  > /-A/ in Northern Khanty.

All of these changes are very heavily areal, and do not seem to define any substantial genetic subgroups. The main divisions of Eastern Khanty, the Far Eastern and Surgut groups, would have to be assumed to have split already before step 1 (*kala- > *kalaa- vs. *kal-); the Nizyam / Kazym / Obdorsk dialects of Northern Khanty, already before step 2 (*äärəɣ > *äärəɣ vs. *äär). The split of Nizyam and Southern Khanty could be in principle delayed until step 4 (making Nizyam a “Northernized Southern” rather than a “Southernized Northern” dialect after all), but this seems like a poor idea, even if for now I cannot refute it explicitly.

Areality seems to be further proven by how most parts of this scheme have parallels also in Mansi (e.g. *-əɣ > Northern and Pelymka (Western) Mansi /-iɣ/, Eastern and rest of Western Mansi /-i/; *-A > EMs, WMs -∅). But a detailed look into this will be a task for later.

Further implications

So what can we do with this?

The above analysis leads to at least one more general interesting corollary for Khanty historical phonology. If PKh *À-stems were in the early common Khanty period reduced en masse — then this opens the possibility that several cases could have been lost entirely from the data. Already Sauer notes that all inherited word-final cases of PKh *A-stems seem to occur either following the PKh lax vowels (*e *ö *o *a), or the traditionally reconstructed tense mid ones (*ee *öö *oo). Other cases could have existed as well … we may just be currently unable to directly distinguish them from consonant stems.

There may be, however, indirect evidence to draw such distinctions. The notorious Khanty “ablaut” system (which I am afraid I cannot explain in detail in this post) has for a while now been explained as being instead a partly morphologized system of former umlaut. [8] Per this hypothesis, alternations like EKh (*)ɬɔɔj ‘finger’ ~ (*)ɬuuj ‘thimble’ would continue something like earlier *ɬɔɔj(A) ~ *ɬuuj-(i), either with i-umlaut of *ɔɔ to *uu in the derivative ‘thimble’; or a-umlaut of *uu to *ɔɔ in the base root ‘finger’. I am more inclined to side with the latter (Honti’s view) than with the former (Helimski’s). If close/open ablaut in Khanty is fundamentally based on a-umlaut, the assumed umlaut trigger could be then identified as *-À, and we could then amend ‘finger’ to PKh *ɬɔɔja instead. This in turn also accords fairly well with the PU reconstruction: *suwd₂a (with Samic *čuvðē, Samoyedic *təjå clearly indicating an original *A-stem). By contrast, Helimski’s assumed *I-stems seem to be nowhere supported by actual data: they are simply circularly inserted into proto-forms where a close-grade vowel eventually surfaces.

Perhaps even un-umlauted *ɬuuja is a possibility for PKh. Vowel alternation in many cases occurs only in EKh, not WKh, and I would not dismiss offhand the possibility that this reflects unstressed vowel isoglosses in early common Khanty. In this case we indeed find WKh *ɬuuj (SKh /tüüj/, Kazym /ɬuj/, etc.) and not *ɬɔɔj > **ɬooj. Instead of assuming levelling from ‘thimble’, or from possessed forms (Vakh /luujəm/ ‘my finger’), maybe no umlaut took place here to begin with, and the discrepancy between EKh *ɬɔɔj ~ WKh *ɬuuj goes back to already earlier *ɬuuja ~ *ɬuuj(ə), with some kind of an early conditional loss of *-À in WKh.

Some other cases of “umlaut” might turn out to be illusory entirely. I am on board with the “Helimski school” reanalysis of “Steinitz school” PKh *ee, *öö, *oo as lax open vowels, and PKh *e, *ö, *o as lax close vowels (though I would be content to keep on using the symbols *e, *ö, *o for the latter). However, the associated reanalysis of Steinitz’ lax open *a as close *ï seems unsatisfactory. In most cases, this continues PU open *a; it is also continued as lax open /a/ in most Khanty varieties. Moreover, we can identify numerous instances where this occurs in an *À-stem instead. The clearest evidence comes from “thematic verbs” such as ‘to die’, where at least in Eastern Khanty the surface alternation is between /oo/ (/kool-/) and /a-aa/ (/kalaa-/). Since Helimski considers *ï to be the i-umlaut counterpart of *a, he ends up proposing the phonetically nonsensical solution that *A-stems would have triggered i-umlaut!

Instead of a back-and-forth development *a > *ï > /a/, purely for the sake of making way for *a > /oo/, I would propose that the rewriting of *ee, *öö, *oo as *ä, *a does not reflect mechanical identity. Rather, the alternation of the sort /oo/ ~ /a-aa/ is again perhaps post-Proto-Khanty entirely. PKh lax *a and *ä were only tensed and raised to /oo/, /ee/ ~ /öö/ when stressed; when unstressed, they were left as is (and not umlauted to anything at all). The first-syllable alternation /oo/ ~ /a-aa/ should be taken back to an earlier stress alternation /á(-ə)/ ~ /a-á/, in turn going back to earlier *á-ə ~ *á-a, through the Eastern Khanty stress retraction shift *-À > *-Á.

Filling up the details on this hypothesis (and possible similar approaches to other ablaut patterns) will need a much closer analysis, though. But ultimately, it may be able to reduce the somewhat sprawling Proto-Khanty vowel system into a more manageable shape.

[1] Infuriatingly, he does not provide any comments on what has motivated the division of the data. There are hints, of course. Much of the “second-tier” data seems to have relatively limited dialect distribution on one or both sides, e.g. only in Northern Mansi, or only in Southern Khanty; or relatively irregular sound correspondences. I get the impression that he considers it likely that some of this data either is unrelated; consists of parallel loans from some third source; or consists of loans from Khanty to Mansi (or perhaps vice versa). On the other hand, I think even the main part of the data likely contains a number of cases of this kind. Are these oversights, or does he have any actual reasons in mind to consider some initially spotty-looking cases stronger than others?
[2] In their respective C2IFU contributions “Zur Geschichte der Nominalstämme in den ugrischen Sprachen”; “Nominalstämme auf *-a/*-ä im Ostjakischen”.
[3] Bear in mind that Proto-Khanty had a contrast between full and reduced vowels, not in vowel length, and e.g. “long” *aa *uu should be read simply as [ɑ] [u]. “Short” *e is then a reduced vowel, [ə] or [ɪ], and is traditionally indeed transcribed ə in close transcription by fieldworkers on Khanty. Thus, *ej > /iij/ does not involve seemingly unmotivated lengthening, but rather tensing: [jɪ] > [ji].
[4] Published in László Honti’s Festschrift (Ünnepi kötet Honti László tiszteletére). The University of Helsinki library does have a copy, but it’s on loan currently. If by any chance the culprit happens to be reading this, please feel welcome to get in touch with me…
[5] The overall rarity of roots ending in *-Ləɣ in Khanty is not a mystery: it is due to the common (Proto-?) Ob-Ugric metathesis of PU *-lk-, *-sk- > East Uralic *-lɣ-, *-ɬɣ- > OUg *-ɣl-, *-ɣɬ-.
[6] At least two other examples exist of *aa > *ɔɔ before bilabials. 1) ‘Bird cherry’: *jɔɔm in place of expected *jaam, from PU *ďëmə. 2) ‘Hair’: Far Eastern *aawət < *aapət regularly continues PU *ëptə, but other dialects, including Obdorsk, indicate *ɔɔpət. On the other hand, there are counterexamples against assuming a regular change, e.g. *kaam ‘coffin’ (~ Mansi *kaməl), *kaap ‘boat’ (~ Mansi *këëpə), *saam ‘scales’ (~ Mansi *sëëmə, < PU *sëmə).
[7] To be exact, Steinitz and Honti only claim this about tense *üü, *öö, *ɔ̈ɔ̈. PKh reduced *ö has labial reflexes more widely in WKh, including fronted [ɵ] in SKh. However, this is only the case adjacent to velars; elsewhere we see the expected delabialization to *e. I would propose that this development involves “double cheshirization” (and is areally connected to the same in Southern Mansi): *kö > *kʷe, then re-coloring: *kʷe > South [kɵ] (= phonemically /ko/), North /kuu/.
[8] For a starting point, see e.g. E. Helimski (1999): “Umlaut in Diachronie – Ablaut in Synchronie: Urostjakischer Umlaut und ostjakischer Ablaut.” — Diachronie in der synchronen Sprachbeschreibung. Mitteilungen der Societas Uralo-Altaica 21: pp. 39–44.

Posted in Reconstruction

Workflows in historical linguistics

A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months if not years languishing in my drafts. Let’s see if I can keep this one brief.

An adage sometimes seen in historical linguistics is “classification before reconstruction”. On one level, I agree. But, on a few others, this seems to be often abused as an excuse to skimp on proper rigor.

What this means, in my opinion:

  • It’s not possible to do comprehensive comparative reconstruction work with data from unrelated languages. Reconstruction can only be attempted once we have a reasonable amount of certainty that some particular language family exists at all.

What this does not mean:

  • Classification having to precede work in historical phonology entirely. Reliable classification cannot be done by vague casual eyeballing of data. “A reasonable amount of certainty” for the relatedness of some particular languages requires being able to locate regular sound correspondences within their shared vocabulary (preferably non-trivial ones, but any regularity is a start). [1] In the absence of regular sound correspondences, all vocabulary comparisons can potentially be suspected to be either coincidental, or loanwords rather than strict cognates.
    In other words: sound correspondences are not reconstructions, in themselves. In the case of binary comparison, this distinction may end up blurred, since it’s possible to kind of put together an initial “trivial reconstruction” by just listing all your correspondences, and giving each of them some kind of a vague phonetic label. [2] If the family has more members, though, the bare sound correspondences typically end up looking more like networks — since sound correspondences are not transitive. If /tʃ/ in language 1 can correspond to /s/ in language 2, and /s/ in language 2 can correspond to /h/ in language 3, this does not automatically guarantee that a correspondence /tʃ/ ~ /h/ between 1 and 3 would be demonstrable, or even expected at all. Perhaps /s/ in language 2 is a merger of two separate proto-phonemes; perhaps these correspondences do continue the same proto-phoneme, but under mutually exclusive conditions; perhaps one of these correspondences indicates loanwords after all and not native vocabulary.
  • Subclassification having to precede reconstruction. On the contrary, it is reconstruction that often allows us to put together arguments in favor of subgroups, by providing a root for our sound correspondences. If we have a correspondence such as t ~ t ~ s ~ s, it’s likely that either the t-group or the s-group has innovated, and constitutes a subgroup. But it is also very possible that the other group has not, and is paraphyletic. Without reconstruction work, this is not resolvable.
  • Reconstruction being unable to inform classification. A reconstruction of the parent of a set of languages might end up coming out closer to some other language, that we may have suspected (but haven’t dared to declare) to be also related. It could even turn out that this language newly under comparison is not only related, but it is indeed a direct descendant of this same proto-language; just a very divergent one! — Or maybe the proto-language turns out to be substantially less similar to the other language being compared, and the earlier suspicion of a relationship evaporates entirely, or has to be reanalyzed as a late loanword layer.
  • Language isolates’ history being unreconstructible. Internal reconstruction combined with loanword evidence can allow identifying probable sound changes and lexical intrusions just fine… though I suppose it will be unlikely to get especially far with this technique.
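
The point about sound correspondences not being transitive can be made concrete with a toy model of the merger scenario. Everything in it is invented for illustration: two hypothetical proto-phonemes *1 and *2, merging as /s/ in language 2, with /tʃ/, /s/ and /h/ taken from the example above and the remaining reflexes (/t/, /ʃ/) made up. Naively chaining the 1~2 and 2~3 correspondences then "predicts" pairs that no cognate set would ever show.

```python
from itertools import product

# Hypothetical reflexes of two invented proto-phonemes in three languages.
# Language 2 has merged *1 and *2 as /s/.
REFLEXES = {
    "*1": {"L1": "tʃ", "L2": "s", "L3": "ʃ"},
    "*2": {"L1": "t", "L2": "s", "L3": "h"},
}


def correspondences(a, b):
    """Correspondences between languages a and b that cognates would actually show."""
    return {(r[a], r[b]) for r in REFLEXES.values()}


def chain(ab, bc):
    """Naive transitive closure: link x ~ y and y ~ z into x ~ z."""
    return {(x, z) for (x, y1), (y2, z) in product(ab, bc) if y1 == y2}


c12 = correspondences("L1", "L2")
c23 = correspondences("L2", "L3")
c13 = correspondences("L1", "L3")

# Pairs suggested by chaining but absent from the real 1 ~ 3 correspondences.
spurious = chain(c12, c23) - c13
print(sorted(spurious))
```

Here /tʃ/ ~ /s/ and /s/ ~ /h/ are both perfectly regular, and yet /tʃ/ ~ /h/ ends up in the spurious set. The same bookkeeping would go through for the conditioned-split and loanword scenarios; only the table of reflexes would need to change.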

A more detailed workflow for historical linguistics, if starting from zero, would therefore look something like the following:

  1. Acquire data; sort out some initial vocabulary comparisons that look promising.
  2. Analyze sound correspondences; use these to look for more comparisons.
  3. Look at the big picture to see if some particular subset of languages should be indeed considered related.
  4. Attempt reconstructing the proto-language.
  5. Use the proto-language POV to clarify the status of issues like problematic etymologies, possible external relatives, or possible subgroups.
  6. Use modified analyses of data to improve the proto-language reconstruction.
  7. Iterate 5 and 6 until you’ve run out of insights to gain from the data.
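
As a minimal sketch of what step 2 amounts to in practice, here is a tally of segment correspondences over a handful of comparisons. The forms are the Surgut and Obdorsk Khanty reflexes of ‘4’, ‘raft’ and ‘to die’ quoted in the Khanty post above; the segment-by-segment alignment (with "-" for a gap) and the function name are assumptions made just for this illustration, and three words are of course far too few to prove anything.

```python
from collections import Counter

# (Surgut, Obdorsk) forms, pre-aligned by hand; "-" marks a gap.
ALIGNED = {
    "4": (["ń", "e", "ɬ", "ə"], ["ń", "ii", "l", "-"]),
    "raft": (["p", "a", "r", "ə"], ["p", "a", "r", "-"]),
    "to die": (["k", "a", "ɬ"], ["x", "a", "l"]),
}


def count_correspondences(aligned):
    """Count how often each pair of aligned segments occurs across all comparisons."""
    counts = Counter()
    for gloss, (a, b) in aligned.items():
        if len(a) != len(b):
            raise ValueError(f"misaligned comparison: {gloss}")
        counts.update(zip(a, b))
    return counts


counts = count_correspondences(ALIGNED)

# Only correspondences that recur are of any use as evidence.
recurring = {pair: n for pair, n in counts.items() if n >= 2}
print(recurring)
```

Even in this tiny sample, ɬ ~ l and ə ~ ∅ recur, while k ~ x and e ~ ii occur once each and would need further comparisons to back them up; that is the "use these to look for more comparisons" part.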

This could also work as a kind of a typology of how far along research on a particular language family is. To date, I don’t think any language family has yet exhausted stage 7. Most are stuck in limbo somewhere around stage 3; only a few have reached stage 5, and Indo-European might be the only one to have indisputably gone through one cycle of stage 7. Big disputed hypotheses grouping well-accepted families together can probably be divided according to if they’re closer to stage 1 (e.g. Amerind, Nilo-Saharan) or stage 2 (e.g. variations of Nostratic). Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)

Of course there are many major facets of historical linguistics still missing here. We also want to account for typology at some points, morphology too at others, semantics three, periodically research loanwords and then weed them out of the proto-language, maybe entertain some substrate hypotheses.

[1] Some people will claim that vocabulary is strictly optional and you can show relatedness solely on the basis of grammar. I am skeptical; but if this were to be the case — then the implication is that we will not be doing any lexical reconstruction work at any point at all.
[2] Maybe with subscripts to disambiguate overlapping sets if you’d prefer, but anything goes in principle. If your heart desires to see more wingdings in linguistics papers, there is nothing formally wrong in re-labeling a t ~ tʰ correspondence as *☕.

Posted in Methodology

Consonant clusters growing, wilting and syllabic

From a Uralicist perspective, one thing that I find goes underappreciated in Indo-European studies is the extensive phonotactic complexity of most IE languages. Certain types of studies on PIE consonant clusters can be found these days in abundance, yes… but these mostly focus on the resolution of the most extreme things that the morphology of PIE, with its abundant zero-grade morphemes, can come up with: monstrosities like *HHR-, *CRH-, *RHC-, *-CHCR-. The fate of the more common, though still remarkable on a worldwide scale, consonant clusters like *bʰl-, *sp-, *tw-, *-zd-, *-ktj- appears to be considered basically trivial. (I am open for reading suggestions, though: IE studies is a big field and I expect I am still missing out on many specifics.)

Within Europe, at least the fate of simple two-consonant initial clusters really is at least mostly trivial, though. The Germanic and Balto-Slavic languages retain most PIE initial clusters fairly well, incidental changes in the individual consonants aside (as in *tw- > English thw-, Lithuanian tv-). Latin and Greek are not far behind, though they mostly get rid of *sR clusters (as in e.g. slime ~ līmus; snow ~ nix). We would have to look at Albanian and the more eastern languages (Armenian, modern Indo-Iranian) before seeing major cluster simplification or transformation trends. As for Celtic, Tocharian and Anatolian, I can’t say I have much of a handle on the big picture at all… which is one reason why a detailed overview of phonotactics issues in the IE languages, either from the perspective of particular classes of clusters or particular languages’ overall histories, would sound appealing to me.

To be fair, it’s not as if this kind of a thing has been done much in Uralic studies either. There have been a few phonotactic analyses of the cluster stock in various reconstructed proto-languages, though with naïvely synchronic methodology. From a more firmly diachronic angle, a few interesting topics that may require more detailed investigation could be

  • the nearly complete cluster simplification trends in Permic, Hungarian and Enets, transforming the inherited *(C)V(C)CV root structure into roughly √(C)V(C)(V). To a lesser extent similar things happen also in e.g. Mari and Proto-Samoyedic.
  • the rise of numerous complex clusters in Mordvinic, e.g. in initial position, Erzya kši ‘bread’, kšna ‘strap’, pśkiźems ‘to have diarrhea’, promo ‘gadfly’. This seems to run a bit too deep-set to be blamed just on late Russian influence: the first two are earlier Baltic or Balto-Slavic loanwords (~ Fi. kyrsä ‘loaf’, hihna ‘strap’), the last two native Uralic (~ Fi. paskoa ‘to shit’, paarma ‘gadfly’).
  • the slightly less daunting but still strong expansion of consonant cluster complexity in Finnic (as I’ve briefly covered before) and Samic, probably mainly due to Indo-European loanwords.

But back to IE, for a few scattered observations.

At least one of the initial consonant clusters reconstructed for Proto-Indo-European is an exception of sorts to any retention tendencies, even from a European perspective. This is *sr-: the cluster is alien to most European languages today, even ones that may otherwise allow sibilant+/r/: English shr-, German schr- from earlier *skr-. (The Slavic languages do have newly created examples though, generated after syncope; e.g. Polish srebro ‘silver’ < *sьrebro.) Given the wide palette of word-initial clusters of the type CR- and even sTR- tolerated in IE languages, this is a notable hole in the system.

In Greek *sr- is simplified the usual way, through *s-aspiration, yielding word-initial ῥ- /rʰ/. Elsewhere, however, special developments seem to kick in.

Germanic and Balto-Slavic share here a non-trivial isogloss: *sr (of any position) is resolved by epenthesis of *t, generating correspondences such as stream, Latvian straume, Polish strumień ~ Greek ῥεῦμα (< *srew-m-os, *srew-m-eh₂). The change has however not reached standard Lithuanian, which still has e.g. sraumuo; [1] therefore showing that this is a relatively late diffused sound change, not a data point in favor of a Germano-Balto-Slavic proto-dialect. Perhaps even one that has been innovated multiple times in parallel: homorganic stop epenthesis in clusters of continuant+glide is commonplace after all (æmyrge > *emrə > ember in English surely requires no especial connection with hominem > *homre > hombre in Spanish), and while the phonetic development is less trivial here, the prior existence of *str- has probably helped to motivate *t-epenthesis.

This sound change likely also accounts for the intrusive -t- in ‘sister’ in Germanic (sister etc.) and the relevant parts of Balto-Slavic (OCS сестра, Old Prussian swestro, but again, Lithuanian sesuo; and as I’m looking these up, I am also learning that Latvian has apparently lost this word entirely!). This was probably generalized from the genitive, *swesrés or *susrés. Some degree of analogical support from the mother, father, brother, daughter group surely has played a part as well, but I would think the fact that this only occurs in languages that also show *sr > *str as a general sound change is not a coincidence.

This development also seems to have interesting interaction with the PIE syllabic consonants. Some time ago I ran across a small article by Krzysztof Witczak (1991), “Indo-European *sr̥C in Germanic”, which proposes that this epenthesis also took place before syllabic *r̥. The evidence is scarce but looks believable. Interestingly, this then demonstrates that at some point an actual syllabic [r̩] must have indeed occurred in Germanic (contra some of my earlier suspicions that some kind of an epenthetic schwa might have been hanging around all along in here).

Also, returning to ‘sister’: while I have no ready means to see if this checks out in the other older Germanic languages, Wiktionary actually gives a PGmc genitive *swesturz > Gothic swistrs, which looks more like pre-Gmc *swesr̥s.

Even more interestingly, there seems to be some evidence for similar business also in Baltic.

The word for ‘roe deer’ in Latv. and Lith. is stirna, corresponding to Slavic *sьrna. These look like derivatives from the ‘horn’ root, *ḱer(h₂)-, or in particular the derivative *ḱr̥(h₂)nos, as reflected also in e.g. Germanic horn. Derksen’s etymological dictionary of Baltic (2015) has no comment other than that “the anlaut is problematic”… I suspect however that the Baltic words could be explained by a development *šr̥ > *str̥, taking place before the breaking *r̥ > *ir. [2] This all will also have to be later than *ḱ > *š, but this is already assured to be quite early by the evidence of loanwords in Finnic.

On the other hand, there are more than enough other words, even derivatives from this same root, that show no such epenthesis, e.g. Old Prussian sirwis ‘roe deer’ < *šr̥wis (whence also Fi. hirvi ‘elk’); Latvian sirsenis, Lithuanian širšė ‘hornet’ < *šr̥Hšō (whence also Fi. herhiläinen). To get around this issue, we would probably need to assume either dialect mixture of some kind (as will be already required to explain why we have *t-epenthesis now showing up in Lithuanian also), or an irregular shift from *šr̥nos to *sr̥nos. (Or as long as I’m fucking around with relative chronology, even the regular shift of *š to *s in Latvian?)

This is moreover complicated by how all these words must be, to some degree, analogical anyway. The reason for this is “Weise’s Law”: [3] the neutralization of *Ḱr- and *Kʷr- as *Kr-, common to all Satem languages. We would again not expect this to distinguish between syllabic *r̥ and non-syllabic *r, and apparently the Sanskrit data indeed confirms this. Thus Balto-Slavic *šr̥nas and other such derivatives (including, from Sanskrit, śiraḥ ‘top’ < *ćr̥Has) would have to be assumed to get their palatal onset by analogy with the abundant other derivatives of *ḱer(h₂)-. So… another possibility is then that stirna is the earliest word where *ḱ > *š was restored in this way, followed by epenthesis, followed by the remaining cases of analogical *š-restoration.

Or maybe this is all barking down the wrong root entirely. Something that also looks worth further investigation is if the Baltic words for ‘roe deer’ might be actually rather cognate with German Stirn?


A different angle on getting rid of *sr- is exhibited in Italo-Celtic: > *θr- > fr-, reflected at least in Brythonic (e.g. Welsh ffrwd ‘stream’) and in Latin (the best examples seem to be word-medial and have an expected further development to -br-, e.g. crābrō < *kr̥Hsrō 'hornet'). Irish has what looks like retained sr- (e.g. sruth ‘stream’). Schrijver proposes that this is a reversal from the *θr stage, [4] but given the situation in Baltic, I would not bet on it. Note that reversal in Lithuanian is clearly not possible, since inherited *str- remains. Again, it seems plausible that the first stages of the Goidelic/Brythonic split go far back enough that the latter could have still participated in common developments with Italic.

Irish also seems to have a general shift *st- > s- (ser ‘star’, sab ‘staff’, etc.), so actually even an earlier development of the Germanic-Balto-Slavic flavor is theoretically possible.

A quick scan-over of IE etymological sources at my disposal reveals no special developments of *sr̥- in Celtic or Latin. LIV has two Latin examples that seem to have retained s-: sariō ‘I hoe’ < *sr̥h₃yé-, sarciō ‘I mend’ < *sr̥kyé-. Witczak's article gives Latin fariō ‘salmon trout’, compared with the Germanic sturgeon word family and derived from *sr̥Hyón-; but this also seems to come from Old Latin sariō, thus aligning with the previous group. That these all have -ar- rather than the usual -or- as the reflex of *r̥ however probably indicates a relatively early epenthesis of *ə > *a. Schrijver reconstructs a rule *CCCC- > *CaCCC- being already common Italo-Celtic (argued in full in The Reflexes of the Proto-Indo-European Laryngeals in Latin).


At any rate, the moral is that simplifications or epentheses in consonant clusters of the shape *CR might make a more general opening for investigating the history of the PIE syllabic sonorants.

I’ve another example as well, though probably less illustrative. Sticking still to the European languages, there is perhaps something to be made of PIE *Tl-. Word-initially this was a rare cluster, but one established example is *dl̥h₁gʰos ‘long’ (> e.g. Slavic *dьlgъ, Greek δολιχός, Sanskrit dīrgha-). Now, the Baltic languages are known to have word-medially eliminated *-tl-, *-dl- by dissimilation to *-kl-, *-gl-. So would we find a similar initial development here?

We do not; but we do find something unusual: wholesale loss of the initial consonant, resulting in Lith. ilgas, Latv. ilgs! Perhaps this could be again explained by assuming word-initial *Tl-, *Tl̥- > *l-, *l̥-, already before *l̥ > *il? A previously known case with non-syllabic *Tl- is Lith. lokys, Latv. lācis ‘bear’ ~ Old Prussian clokis ‘bear’ (which would then show that this simplification is Eastern Baltic specifically). Unfortunately, there are again also several counterexamples with *Tl̥- > *Til-, e.g. Lith. tiltas, Latv. tilts ‘bridge’ < *tl̥h₂tós. Go figure…

[0] This post has been prompted by me resuming work for a little while on constructing a reference table on the fate of PIE consonant clusters on Wikipedia.
[1] Jānis Endzelīns (1973), Comparative Phonology and Morphology of the Baltic Languages: 73 informs that other dialects of Lithuanian, however, do have this change, and so we can also rule out this as a datapoint in favor of a Latvian-Slavic grouping (as has sometimes been suggested). Interestingly even Old Prussian has this epenthesis, so this all could instead testify for the Latvian-Lithuanian split, maybe even some of the inter-Lithuanian dialect splits, going quite a while back. — Most evidence I’ve seen in favor of the East Baltic group in fact looks quite easy to reinterpret as more or less areal: e.g. the sound change bundle *ai > *ei > *ē > ie is basically trivial, and has parallels in most neighboring languages (the first in Slavic, Scandinavian and core Finnic; the second in Swedish and Livonian, as well as Slavic in a different form; the last in Western Slavic and in most of Finnic).
[2] I’m not going to start probing the issue, but a sound change or two along the lines of *št > *st might also help in explaining the famously inconsistent application of RUKI in Baltic; e.g. Lith. pisti (not ˣpišti) ‘fucks’ ← PIE √peis- ‘to crush, push’.
— It also just now occurs to me that western Uralic *pisə- ‘to put, stick (in)’ (Samic, Finnic, Mordvinic, Mari) is probably derived from this last-mentioned IE root. This contrasts with widespread native Uralic counterparts: #pënə- ‘to put’ (absent only from Samic and Hungarian), #texə- (maybe *tejwä-??) ‘to push’ (F, P, Hu, Ms, Kh), *puskə- ‘to poke’ (S, F, Ms, Kh), which is usually a good indication for an innovation of some sort.
[3] An old idea, but only recently named and reviewed by Kloekhorst. — I would suggest though that his group of six counterexamples involving derivatives of the type *CeḰ-ro- should not be accounted by “phonetically regular analogy”: they might rather indicate Weise’s Law applying only to syllable-initial palatovelars (*Ḱr-, *-Ḱr̥-) but not to syllable-final ones (*-Ḱ.r-). This would also cover his three counterexamples of the shape *CeḰ-ru-, in which case there is then no need to date the law as any older than common Satemic.
[4] Schrijver, Peter (2015): “Pruners and trainers of the Celtic family tree”.

Posted in Reconstruction

Assibilation in Finnic iteratives

With the assibilation *ti > *ci > si being one of the best-known innovations in Finnic, one would think it would have been researched to exhaustion long since. But there still seem to be new discoveries available.

The best-known examples of assibilation are paradigmatic alternations in inflection, either in nominals (e.g. Fi. kaksi : stem kahte- ‘2’) or verbs (tietä- : imperfect stem tiesi- ‘to know’); and instances affecting the overall shape of a word root (sinä ‘2PS’ < *tinä, silta ‘bridge’ < *tilta, asia ‘thing’ < *atja) or a suffix (kala-si  ‘your fish’ < *kala-ti). However, cases in word derivation such that a morpheme boundary originally occurred between *-t- and *-i- seem to have been left with less attention.

One morphological category where we could suspect previously understudied examples of assibilation hanging around are iterative verbs in -i-. That assibilation can take place in these is not news per se: at least one clear example has been known for long, namely sortaa ‘to break down, oppress’ → *sorta-j- > *sorti- > *sorci- > sorsia ‘to tease’. This appears to be the only example in modern Finnish where an underived and unassibilated verb stem still clearly survives alongside an assibilated one, though.
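
For readers who like to see the stages spelled out, the two steps *ti > *ci > si can be mimicked with plain string replacement. This is only a sketch: the function name is made up, the input is assumed to be the stage where *-ti- has already arisen (i.e. after *sorta-j- > *sorti-), and no conditioning environments or exceptions are modeled at all.

```python
def assibilate(stem):
    """Return the stages of *ti > *ci > si for a stem given at the *ti stage.

    Naive on purpose: every "ti" is affected, and the second step would also
    hit any *ci that was never *ti, which is not a claim about real Finnic.
    """
    stage_ci = stem.replace("ti", "ci")
    stage_si = stage_ci.replace("ci", "si")
    return [stem, stage_ci, stage_si]


for form in ("sorti", "uuti", "tilta", "tinä"):
    print(" > ".join(assibilate(form)))
```

Fed *sorti-, this walks through *sorci- to sorsi-, matching the derivation given above; *tilta and *tinä likewise end up as silta and sinä.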

A bit more common are examples derived from nominal roots ending in -si : -te-. Here it is however possible to consider later derivation from the nominative singular or from the plural stem (uusi(-) + -i- → uusi-), instead of Proto-Finnic derivation from the oblique stem (*uutə-j- > *uuti- > *uuci- > uusi-). At least the first two verbs seem to have quite limited dialect distribution, and so are probably not independent examples of assibilation.

  • kirsi ‘frost’ → kirsiä ‘to soften when thawing (of the ground)’
  • korsi ‘culm’ → N. Krl. koršie ‘to grow longer (of grain)’
  • kynsi ‘nail’ → kynsiä ‘to scratch’
  • niisi ‘heddle’ → niisiä ‘to thread warps through the heddle’
  • uusi ‘new’ → uusia ‘to renew’

At other times, assibilation is identifiable only by comparison with distant relatives or parallel derivatives. Three likely and one further possible example are found in modern Finnish (all involved etymological connections already appear in earlier literature, though they have not necessarily been explained through *-ti- > -si-):

  • jyrsiä ‘to gnaw’: likely < *jürci- < *jürtä-j-, from unattested *jürtä-, in turn segmentable as a causative *jür-tä-. Known cognates elsewhere in Uralic (Permic *jɨrɨ-, Mansi *jär-; both likewise ‘to gnaw’) suggest that the basic root was simply *jürə-.
  • kursia ‘to stitch together’: perhaps similarly < *kurci- < *kur-ta-j-, derived from the same root as kuroa ‘to stitch together, to stretch together’; perhaps an applicative derivative = *kur-o-. The basic root *kurə- has known cognates in Samic *korë-, Samoyedic *kur-å- (where *-å- must be a derivative element, per the mismatch with Samic and the absence of the regular sound change *u-a > *ə-å). [1]
  • suosia ‘to favor’: likely < *sooci- < *soota-j- ← unattested *soo-ta- ← *soo- (> suo-) ‘to grant, to provide’.
  • talsia ‘to walk slowly’: appears to be likely related to tallata ‘to tread’. However, assuming a common root *talta- has the problem that the latter verb shows unvarying -ll-, e.g. Veps tallata (not ˣtaldata). To uphold this connection, it would seem to be necessary to assume generalization of the weak grade -ll- somewhere in the western Finnic area, followed by diffusion of the newly reformed verb to the rest of the family. Also, we would actually expect *talta-j- > **taltoi-! Some kind of analogical formation therefore seems more likely than soundlawful Proto-Finnic development.

From Karelian I can additionally find viršie (Northern) ‘to dawdle’. If from *vir-tä-j-, this might be connectable with viruo (~ Fi. virua, etc.) ‘to lie about, be sick’.

A relatively similar scenario could moreover be crafted for Krl. polzie (Southern) ‘to crawl’, which seems in theory derivable from polvi ‘knee’; a Proto-Finnic intermediate derivative *polwə-ta- > *polw-ta- > *polta- ‘to kneel’ would need to be posited. However, this is much more straightforwardly explainable as a loanword from Russian ползать ‘to crawl’… [2] and so what we gain here instead is a reminder about the unreliability of etymological connections built on multi-stage derivational assumptions.

A common thread in these examples however seems to emerge, which I think provides some extra backing for reconstructing unattested “intermediate” verb stems such as *jürtä-, *virtä- (your call if this is actually decisive). This is an avoidance of verbs of the shape **CVRi-, especially from base roots of the shape *CVRə-, [3] upheld by deriving the iterative instead from a causative or pseudo-causative extended stem, formed by the common verbal suffix *-tA-. I have no idea what motivation this constraint could have behind it, though.


I think there is also one other larger category of iteratives that show assibilation. These are verbs formed with a suffix -(e)ksi-, predominantly from basic intransitive verbs:

  • haave ‘daydream’ → haaveksia ‘to daydream’
  • imeä → imeksiä ‘to suck’
  • istua → istuksia ‘to sit (around)’
  • kantaa → kanneksia ‘to carry’
  • kulkea ‘to go’ → kuljeksia ‘to walk about’
  • kusta → kuseksia ‘to piss’
  • lukea → lueksia ‘to read’
  • niellä → nieleksiä ‘to swallow’
  • nuolla → nuoleksia ‘to lick’
  • olla ‘to be’ → oleksia ‘to stay at’
  • pierrä → piereksiä ‘to fart’
  • piillä → piileksiä ‘to hide’
  • purra ‘to bite’ → pureksia ‘to chew’
  • ripistä ‘(of rain or raindrops) to make noise’ → ripeksiä ‘to rain lightly, drizzle’
  • seisoa ‘to stand’ → seisoksia ‘to stand around’
  • surra ‘to mourn’ → sureksia ‘to be sad’
  • sylkeä → syljeksiä ‘to spit’
  • tunkea ‘to cram’ → tungeksia ‘to crowd, throng’
  • töpätä ‘to make a small mistake, hit a snaggle’ → töpeksiä ‘to make a lousy job at smth’
  • uni ‘dream’ → uneksia ‘to dream’
  • vuolla → vuoleksia ‘to whittle’

Many of these seem to have developed a more durative than iterative meaning, but at least verbs like kuseksia, nieleksiä, pureksia, syljeksiä clearly refer to iterated actions. It’s also worth noting that again, none of these verbs have simpler -i-iteratives such as ˣimiä, ˣkusia, ˣnuolia, ˣsuria.

I also think that this group needs to be separated from a distinct group of “sensive” verbs, mostly derived from adjectives, which express considering something to be like the base word. Unlike the above, these are transitive verbs coexisting with synonymous verbs ending in -(e)ksU-:

  • halpa ‘cheap’ → halveksia ~ halveksua ‘to look down on smth’
  • hylätä ‘to discard’ → hyljeksiä ~ hyljeksyä ‘to shun smth’
  • kumma ‘odd’ → kummeksia ~ kummeksua ‘to wonder, be puzzled over smth’
  • nyreä ‘grumpy’ → nyreksiä ~ nyreksyä ‘to be picky over smth, accept smth grudgingly’
  • paha ‘bad’ → paheksia ~ paheksua ‘to disapprove of smth’
  • vähä ‘few, small’ → väheksiä ~ väheksyä ‘to belittle smth’

Hakulinen in SKRK notes the difference as well, though drawing the separating line mainly on the basis of whether the verbs in question are derived from verbs or from nominals (thus placing e.g. haaveksia and uneksia instead in the 2nd group). He suggests that the second group might be built on the translative case, ending in -ksi (probably correct), while the first group might be built on the denominal suffix -s : -kse- seen in e.g. kutoa ‘to weave’ → kudos : kudokse- ‘weave, textile’.

What I find more promising is the possibility of deriving the first group’s compound suffix -ksi- instead from Proto-Finnic *-kci- < earlier *-kti- < *-ktA-j-, where *-ktA- is the preform of the common causative-transitive verb suffix -ttA-. In many cases we can indeed still locate such a derivative alongside -ksi-iteratives:

  • imeksiä < ? *imektä-j- ~ imettää < ? *imektä- ‘to suckle’
  • istuksia < ? *istukta-j- ~ istuttaa < ? *istukta- ‘to sit someone down; to plant’
  • kanneksia < ? *kandëkta-j- ~ kannattaa < ? *kandakta- ‘to support, hold up’
  • kuljeksia < ? *kulgëkta-j- ~ kuljettaa < ? *kulgëkta- ‘to transport’
  • kuseksia < ? *kusëkta-j- ~ kusettaa < ? *kusëkta- ‘to feel like peeing, cause urination’
  • lueksia < ? *lugëkta-j- ~ luettaa < ? *lugëkta- ‘to make someone read smth’
  • oleksia < ? *olëkta-j- ~ olettaa < ? *olëkta- ‘to assume’
  • piereksiä < ? *peerektä-j- ~ pierettää < ? *peerektä- ‘to feel like farting, cause flatulence’
  • seisoksia < ? *saisokta-j- ~ seisottaa < ? *saisokta- ‘to make smth stand’
  • sureksia < ? *surëkta-j- ~ surettaa < ? *surëkta- ‘to make/be sad’
  • syljeksiä < ? *sülgektä-j- ~ syljettää < ? *sülgektä- ‘to feel like spitting, cause excess salivation’
  • uneksia < ? *unëkta-j- ~ unettaa < ? *unëkta- ‘to make/be sleepy’

Since I am basically working here with the internal reconstruction of Finnish, rather than from properly comparative Finnic data, there is of course the risk that some of these verbs may have been derived secondarily, as simply root+ksi-. One particularly good candidate might be Fi. surra and its derivatives. These have taken on the meaning ‘to mourn, be sad’ secondarily from suru ~ surku ‘sadness’, which is a loan from Scandinavian (Old Norse sorg). The original meaning, preserved in e.g. Es. surema, is instead ‘to die’ — and we definitely do not expect a verb of this meaning to have had any original iterative (habitual, frequentative…) derivatives. Regardless, the existence of this general pattern at all seems like sufficient evidence to conclude that at least some examples here probably date to Proto-Finnic times already. I would bet in particular on the “secretion verb” group (kuseksia, piereksiä, syljeksiä) and the “consumption verb” group (imeksiä, nieleksiä, nuoleksia, pureksia), both of which are entirely built on common Uralic primary verb roots.

This etymology for the suffix -ksi- also has one interesting implication: it confirms that Finnic -ttA- indeed derives from earlier *-ktA- (as continued also in Samic *-ktē-, Mari *-kte-, Permic *-ektɨ-) and not from earlier *-ptA- (as continued in Khanty *-ptə-, Samoyedic *-ptA-). The representation in Mordvinic (*-ftə-), Hungarian (-t-) and perhaps Mansi (*-t-) remains ambiguous though, and hence it is unclear to me which form(s) of this suffix represent the original Proto-Uralic situation.

[1] Samic *koarō- ‘to sew’ also seems related somehow. If *kurə- were from earlier *korə-, we could consider the possibility that the sound change *oCə > *uCə was later than *-əw- > *-o- (*korə- : *korə-w- > *kurə- : *koro-), but the Samoyedic cognate with *-u- seems to render this impossible.
[2] As pointed out to me by Niklas Metsäranta.
[3] Note that iteratives or similar derivatives based on roots of the shape CVRA — e.g. Fi. kerä ‘ball of twine’ → keriä ‘to roll up’; pesä ‘nest’ → pesiä ‘to nest’ — would have still been diphthong stems such as *kerei-, *pesei- in Proto-Finnic.

Posted in Etymology

Etymology squib: Pyytää (and a tangent on Mansi velars)

The Finnic verb root *püütä- (Fi. pyytää, etc.) has two distinct senses: ‘to ask for’ on one hand, ‘to hunt’ on the other. These could plausibly be considered connected, with the former as the original sense, the latter developing as a euphemism. At least the former sense also clearly seems to derive as a loanword from Germanic *beudan- ‘to offer’; most likely relatively late from a form such as Old Swedish biūþa.

A competing etymology also exists: that ‘to hunt’ would be instead a derivative *püü-tä-. This finds immediate support within Finnic from two directions. The first is the existence of what look like parallel derivatives, e.g. Finnish pyynti (? < *püü-ntei) ‘hunt’, Estonian püük (? < *püü-kkV) ‘hunt’. The second is the fact that the sense ‘to ask’ shows a somewhat limited distribution, being found only in a number of the more Scandinavian-influenced varieties: Finnish, Karelian, Estonian and Kukkuzi Votic/Ingrian [1]. The more marginal Ludian and Veps, as well as both mainstream Ingrian and Votic, only know the sense ‘to hunt’.

Sources such as SSA actually suggest a compromise of sorts between these two approaches; according to this, *püütä- would be across the board an original Proto-Finnic verb meaning ‘to hunt’, and only the meaning ‘to ask’ would have developed by Scandinavian influence. This would allow a much earlier date of contact, though I’m not sure what exact benefits this assumption is supposed to have… Even relatively new Swedish loanwords have relatively often reached Karelian through Finnish, and loanwords homonymous with native vocabulary are by no means an unknown phenomenon.

A derivational etymology of course implies an original shorter root *püü. The meaning of this is not immediately obvious, though. SSA refers to a suggestion that this would be = *püü (Fi. pyy etc.) ‘hazelhen’; hence the verb *püü-tä- would have originally meant specifically ‘to hunt for hazelhen’, only later being generalized to ‘hunt’. On the other hand: Fi. pyynti suggests that the original root was actually a verb, since -nti regularly only forms names of actions (e.g. tuo- ‘to bring’ → tuonti ‘bringing, import’; syö- ‘to eat’ → syönti ‘eating’). I would therefore posit something like *püü- ‘to hunt (intransitive)’, *püü-tä- ‘to hunt (transitive)’.


This so far Finnic-internal reconstruction turns out to have connections in Ugric. A verb root *puŋV- has long been known, reconstructed on the basis of Hungarian fog- ‘to grasp, to catch’ ~ Mansi *puw- ‘id.’ (the lenition *ŋ > *w in the latter may be regular; there does not seem to be inherited vocabulary in Mansi with *-uŋk-). While an original back vowel *u would be troublesome, there is however a natural explanation. As explored in my previous post, several branches of Uralic show evidence for a backing development of Proto-Uralic *ü in the vicinity of velar consonants. This seems to be the case here as well. Finnic *püü-, as uncovered above, therefore suggests that a better reconstruction will be PU *püŋə-.

This yields all reflexes involved quite regularly. *püŋə- > Hungarian fog- has an exact parallel in *püŋə > fogoly ‘hazelhen’, and there is also the rather similar *piŋə > fog ‘tooth’ (although my previous reservations on not fully understanding the intermediate phonetics of this development still apply). In Mansi, only *ü seems to have been subject to this backing: contrast *päŋk ‘tooth’. *püŋə- > *puw- does not have exact equivalents, but Steinitz’ example of *pükkV-nV > *pukńi ‘navel’ remains a decent parallel. In a small article on the topic, [2] he also cites Northern Mansi /puki/ ‘belly’ ~ Khanty *pökii ‘bird’s crop’. To me it looks like these could perhaps be from a common root with ‘navel’ (*pükkV-j?). UEW gives instead Finno-Permic cognates pointing to *päkkä, but the irregular vowel correspondence leaves me doubtful. [3]

The similarity between Finnic *püü ‘hazelhen’ and *püütä- ‘to hunt’ does not have to be accidental, though. It might be worth asking if the derivational relationship has instead been the opposite: if PU *püŋə ‘hazelhen’ had rather been derived from *püŋə- ‘to hunt’? This might be further supportable by how many of the reflexes show later suffixation, e.g. Samic *pëŋkōj; Hungarian fogoly; Moksha /povńä/; Livonian pīki (= Es. püük ‘hunt’, as mentioned above?). Selkup /pee-/ ‘to look for’ : /peekä/ ‘hazelhen’ seems particularly interesting (at least as a semantic parallel; I hesitate to claim that this, together with its other Samoyedic cognates, would derive from *püŋə- at all, since the vowel developments would be highly irregular [4]). The underived appearance of Finnish pyy, Estonian püü etc. on one hand, Khanty *peŋk on the other, could then end up being a kind of a backformation from earlier compound terms, facilitated by the loss of the bare verbal root.


There is a chronological issue with the Mansi data, though. A form such as /puki/ ‘belly’ clearly cannot be taken back to conventional Proto-Mansi *puki: we would expect the usual development *k > [q] > /χ/ to kick in (compare e.g. *taŋk > /toŋχ/ ‘hoof’). For Northern Mansi in particular, it might be feasible to assume similar relatively late backing as in /puŋk/ ‘tooth’, but this then fails to explain the non-Northern reflexes (e.g. West /püxəń/ ‘navel’).

I also have already earlier argued against the traditional reconstruction of Proto-Mansi *ü. Instead of setting up here a marginal Proto-Mansi *ü after all, which occurred only in the context /p_k/, I have a different suggestion: it will be possible to reconstruct here plain *u for Proto-Mansi — if we assume that the contrast *k : *q had already been phonemicized! While many overviews of the velar backness split in Ugric assume that it was only phonemicized by the development *q > /χ/ (in Northern Mansi, most of Eastern Mansi, all of Northern and Southern Khanty, and in pre-Hungarian), the detailed field records still faithfully and consistently transcribe = /q/ for most of the other Ob-Ugric varieties as well. Actual reference grammars, as opposed to historically-minded works, often recognize the uvulars and velars as distinct phonemes as well. [5]

I would thus set up the following development:

  • Pre-Mansi (“Proto-Ugric”) *ku, *uk > *qu, *uq (> North /χu/, /uχ/)
  • Pre-Mansi *kü, *ük > *ku, *uk (> North /ku/, /uk/) (after the lowering of primary PU *ü!)

Later on, then, in Western and Eastern Mansi, a back-development *ku, *uk > /kü, ük/ takes place, completing a kind of “cheshirization cycle”, further cemented by *q > /k/ in a few Western dialects (e.g. Pelymka).
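The relative order of the two Pre-Mansi changes is what keeps *k and *q apart. A minimal sketch of this, applied to the four bare sequences from the list above: the function names and regular expressions are illustrative assumptions, and nothing beyond the two listed changes is modelled (in particular not the later Western and Eastern back-development).

```python
import re

def velar_backing(form):
    # *k > *q when adjacent to *u
    return re.sub(r"k(?=u)|(?<=u)k", "q", form)

def vowel_backing(form):
    # *ü > *u when adjacent to *k
    return re.sub(r"ü(?=k)|(?<=k)ü", "u", form)

def apply_in_order(form, changes):
    for change in changes:
        form = change(form)
    return form

SEQUENCES = ["ku", "uk", "kü", "ük"]

# Order as proposed in the text: velar backing first, vowel backing second.
proposed = [apply_in_order(s, [velar_backing, vowel_backing]) for s in SEQUENCES]
# Hypothetical reverse order, shown only for contrast.
reverse = [apply_in_order(s, [vowel_backing, velar_backing]) for s in SEQUENCES]

print(proposed)  # ['qu', 'uq', 'ku', 'uk']
print(reverse)   # ['qu', 'uq', 'qu', 'uq']
```

With the proposed order, the new *ku, *uk arise too late to be caught by velar backing, so *ku contrasts with *qu. In the reverse order all four sequences would merge, and no such contrast could exist.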

Steinitz’ Geschichte des wogulischen Vokalismus (Berlin, 1955) already lists a few examples that show what I mark here as *ku-, as distinct from *qu-. One is Northern /kurɣ-/ ~ Western /kürr-/ ~ Eastern /körɣ-/ ‘to growl’ < *kurɣ-. Further examples occur in loanwords, such as N /kuľ/ ~ E /köľ/ ‘devil’ (← Komi /kuľ/).

Most such words do not seem to have been attested in Southern Mansi, though. If we followed the usual (and also geographically reasonable) assumption that Southern has been the first dialect area to split away, it seems that “disharmonic” *ku- is in most cases only reconstructible for Core Mansi, not Proto-Mansi proper. In native vocabulary, only the marginal example of *puk- from earlier *pükk- seems to be found.

The most important benefit of this reanalysis, however, is that the marginal contrast *k : *q does not need to be limited to the root type *pükk- > *puk-. It will be possible to explore also other similar contrasts, such as *koo : *qoo (> Core Mansi *kuu : *quu). These seem likely to explain a variety of rare or seemingly irregular vowel correspondences between the Mansi dialects: e.g. N /kuur/ : W /küür/ ‘oven’, a loanword from Komi /gor/ ‘id.’ More on this later, though…

[1] Considered either Ingrianized Votic or Voticized Ingrian, depending on who you ask. I would lean on the second, but the last word on the topic has probably not been said yet. — ‘To ask’ is in here most likely a loan from Ingrian Finnish though, so the question does not matter for today’s purposes.
[2] Steinitz, Wolfgang. 1956. “Zur ob-ugrischen Vokalgeschichte”. — Ural-Altaische Jahrbücher 28: 241–247.
[3] There also seem to be comparable words in neighboring families, e.g. Evenki /hiken/ ‘sternum’; /hukēn/ ‘crop’ (the latter with further Tungusic cognates). Since these still show h- < *f- < *p-, any possible connection would have to go quite far back, though.
[4] Janhunen in Samojedischer Wortschatz fails to reconstruct a single PSmy proto-form, giving instead three variants: *pü- (Nenets), *pö- (Nganasan), *pä- (others) (in his reconstruction: *pe-).
[5] For example, a phonemic contrast /k/ : /q/ is explicitly presented for Surgut Khanty in Márta Csepregi’s recent reference grammar Szurguti osztják chrestomathia (Szeged, 1998).

Posted in Etymology

Etymology squib: Moknams

Reading old source literature is often a dreary kind of work, but it has its occasional rewards: you might find out that some problem you’ve been dwelling on has actually long since received a solution, or at least a sketch to one. Tonight comes my way an observation by Wolfgang Steinitz; originally from his Geschichte des finnisch-ugrischen Vokalismus (1943: 26–27), but properly brought to my attention by a footnote in his slightly later Geschichte des ostjakischen Vokalismus (1950). I have mostly read the former already, but I guess cursorily enough to have missed things here and there.

The point in question is a small detail on the development of vocalism of the Mordvinic languages. While the history of vocalism in the Uralic languages is complicated enough to fill a couple shelf-meters of literature, original vowel frontness is usually well retained; at least in those branches that show at least some degree of vowel harmony. However, in Mordvinic there are a number of cases where a back vowel /o/ turns up as the reflex of what looks like an original front vowel (*i, *ü, *e, *ä). What Steinitz notes at this point is that, while *i and *ü normally merge in Mordvinic (> *ɪ > /e/), before a velar consonant we instead find *ü merging with *u (> *ʊ > /o/). This would be phonetically reasonable enough, and also indeed seems to check out on closer inspection of the etymological data. Additionally worth remarking is that even Erkki Itkonen seems to accept this rule in his generally anti-Steinitzian megapaper “Zur Frage nach der Entwicklung des Vokalismus der ersten Silbe in den finnisch-ugrischen Sprachen, insbesondere im Mordwinischen” (1946: 300–301).

While there are no substantial counterexamples (see below for some comments on some possible cases), we’re still running a bit low on evidence though. Steinitz only gives four examples, of which only two cases are indisputably reconstructible PU roots:

  • /śokś/ ‘autumn’, from PU *sükśə (> Fi. syksy, Hu. ősz etc.)
  • *poŋə > Erzya /povo/, Moksha /pova/ ‘hazelhen’, from PU *püŋə (> Fi. pyy, Hu. fogoly [1], etc.)
  • Moksha /ćoŋga/ ‘hill’, from PU *ćüŋkV (> e.g. Es. süng, recorded from but one dialect; Northern Khanty /śŭŋk/.)

His fourth example is Moksha /pokəń/ ‘navel’. The reconstruction of *ü seems less certain here: the only other cognates are found in Ob-Ugric, and while Khanty *pö̆kəɳ ~ pö̆kɭəŋ points to *ü clearly enough, Northern Mansi /pukńi/ ~ Southern /püxńi/ perhaps looks the most like PMs *u. On the other hand, I suppose that the unusual correspondence between Mk. /ń/ and Khanty /ɳ/ will be more understandable if we indeed were to reconstruct *pükkVn(V), and date the merger *ü > *u (*ʏ > *ʊ?) as later than the general palatalization of *n, *l, *r in front-vocalic words in Mordvinic.

We are going on 2017 however, not the 1940s, and etymology marches on. Would we happen to have discovered any non-low-hanging fruit over the last 75 years, that could in principle support or contradict Steinitz’ mini-rule? The answer is — yes: one promising recently reconstructed word root is PU (or Finno-Permic, if you’re counting) *mükkä ‘mute, muttering’, due to Janne Saarikivi in his 2007 paper “Uusia vanhoja sanoja“; based on Finnic, Samic, Permic and Mari evidence. And firing up next Heikki Paasonen’s dialect materials on Mordvinic: yep, there we have it: Moksha /moknams/ ‘to stutter’. *mükk- > /mok-/, just as predictable from Steinitz’ suggestion and Saarikivi’s new etymology!
(/-na-/ is a derivative suffix used to form onomatopoetic(ish) verbs; compare e.g. Moksha /vakna-/ ‘to quack’; Erzya /pozna-/ ‘to fart’. And in case it’s not clear enough to non-specialist readers from context, /-ms/ is the normal Mordvinic infinitive ending.)
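Steinitz’ mini-rule is compact enough to state as a two-step lookup. What follows is a sketch only: the function name and the `VELARS` set are illustrative assumptions, just the first-syllable vowels *i, *ü and *u are handled, and any other conditioning is ignored.

```python
VELARS = {"k", "ŋ"}  # assumption: only the velars found in the cited examples

def first_syllable_reflex(vowel, next_consonant):
    """Mordvinic reflex of PU first-syllable *i, *ü or *u (nothing else is modelled)."""
    if vowel == "ü":
        # *ü merges with *u before a velar, and with *i elsewhere
        vowel = "u" if next_consonant in VELARS else "i"
    return {"i": "e", "u": "o"}[vowel]  # *i > *ɪ > /e/, *u > *ʊ > /o/

# (PU root, first-syllable vowel, following consonant), all taken from the text
EXAMPLES = [
    ("*sükśə", "ü", "k"),
    ("*püŋə", "ü", "ŋ"),
    ("*ćüŋkV", "ü", "ŋ"),
    ("*mükkä", "ü", "k"),
    ("*piŋə", "i", "ŋ"),
]

for root, vowel, consonant in EXAMPLES:
    print(root, "->", first_syllable_reflex(vowel, consonant))
```

The four *ü roots come out with /o/ and *piŋə with /e/, matching /śokś/, *poŋə, /ćoŋga/, /mok-/ and *peŋ respectively.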

Datamining a bit from my native language, from Finnish we could actually find some grounds for skepticism at this point, namely precedents for word roots of the shape √mVk- being used to signify unclear speech, or not speaking: e.g. mokeltaa ‘to splutter’, mukista ‘to whinge’, mököttää ‘to sulk’. This could be taken to weaken the etymology we have just found, as suggesting that perhaps some number of the alleged cognates are actually independently formed onomatopoetic words. On the other hand, we could just as well ask if this group of Finnish verbs might not have simply been built on the example of the primary root *mükkä itself; since this √mVk- quite clearly seems to be a phonaestheme, not strictly speaking onomatopoetic. And these examples indeed all seem to find connections to other descriptive vocabulary: e.g. jokeltaa ‘to babble (of a baby)’, mutista ‘to mutter’ ~ ulista ‘to wail’, kököttää ‘to sit in one place’, providing the possibility to explain them as kind of contaminations, along the lines of mykkä × kököttää → mököttää ‘to sit while mute = to sulk’.


I mentioned a few possible complications, though. There are still a few cases where a possible sequence *-üK- would seem to come out as /e/ and not /o/ in Mordvinic. But a closer examination shows that there is no need for worry.

  • *ükə- ‘1’. This yields Erzya /vejke/, Moksha /fkä/ (PMo. approx. *veçkə, apparently from earlier *vej-kkä [2]). The word, though, shows the Mordvinic breaking of word-initial *ü- to *wi- (>> *ve-) — itself another “small” sound change not supported by too much material to begin with. Regardless, this must have been earlier than the general merger of *ü with *i, and thus probably also earlier than the merger of *ü(K) with *u(K). — Another possibility is that *-k- > *-g- > *-ɣ- > *-j- was also early enough in its entirety, and that there therefore was no velar consonant here anymore at the time of *ü-backing.
  • *müŋä ‘backside, by’.  The reconstruction of *ü is not actually warranted in this root, despite a large number of false leads! Finnic has *möö- < *müwä-, but this can be regularly secondary from earlier *miwä- (compare e.g. *hüvä ‘good’ < *šiwä, from Indo-Iranian). Mari *mü̆ŋgə ‘at’ with *ü̆ does not provide evidence either, for reasons I’ve covered before. Komi /mɨj/ ‘after’ may involve a similar development as in *šiŋərə > /šɨr/ ‘mouse’ (where we most definitely have original *i), i.e. retraction before *ŋ. And Hungarian, while having mögött ‘behind’, also shows meg ‘and’, megint ‘again’. I suspect (though cannot crack open in full detail) some degree of dialect mixture, possibly starting from a dialect where unstressed *-ə- > ö rather than ë, followed by an umlaut of sorts: Old Hu. *mɪgət > *mëgött > mögött. — Altogether, it seems feasible to reconstruct rather *miŋä.

[1] With a similar exceptional development *ü > /o/. Steinitz has a proposal for this as well, describing it as *i, *ü >> /o/ in the environment *p_K (contrast *piŋə ‘tooth’ > Hu. fog, versus Mo. > *peŋ > standard Erzya and Moksha both /pej/). This again looks regular enough, but I am a bit more skeptical yet on assigning overly specific consonant environments like this for sound changes. I’d prefer to break this down to one labialization process (by the preceding *p-) and one backing process (by the following velar consonant), but suspiciously, they do not seem to exist independently of each other.
[2] Just about all Uralic languages have various kinds of unmotivatedly suffixed descendants for ‘one’. Finnic *ükci, Samic *ëktë and Mari *ĭktə all suggest roughly *ük-tə; Mansi *äkʷ suggests *ük-kV. This seems to be a fairly common phenomenon, as the same trend continues e.g. with Samoyedic *o- (Nganasan /ŋuʔəj/, Selkup *okər…) or with Proto-Indo-European (*oi-nos ~ *oi-wos ~ *oi-kos ~ …). Or at least we think it’s suffixation. Sometimes something even weirder comes along, e.g. Udmurt /odɨg ~ odig ~ odik/, Komi /ətɨk ~ əťɨk ~ əťik ~ əťi/: while these are usually also counted among reflexes of *ükə- or even *üktə-, I really have no idea what’s going on with them, and honestly I don’t think anyone does (they really look the most like some kind of late mutant fusions of the Uralic root with Russian один).

Posted in Etymology

Trees within trees: the Bundle Model

[Figure “treeintree”: two interleaved tree structures]

Reposting here, an illustration I whipped up a few days before Christmas, for a debate on the validity of the tree model in linguistics, held at Academia.edu in an article draft session by fellow historical linguists and linguistics bloggers Guillaume Jacques and Johann-Mattis List. They argue against recent papers by Alexandre François and Siva Kalyan, who have proposed “freeing” historical linguistics from the tree model, and moving to an updated wave-model-esque approach they call “historical glottometry”.

I will not cover the debate here in detail, especially as the comments have been made publicly available by now (see also the link above thru to Jacques’ blog for some set-up details and further links). One major observation that I think however emerges is that there are multiple different senses in which we can speak of the “splitting” of languages — and it therefore often depends on the level of analysis how the relationships between languages should be represented.


My diagram above says nothing directly about linguistics, and is simply an abstract interleaving of two disparate tree structures: a macro-level, represented by branch distances; and a micro-level, represented by the graph topology. If you look closely, you can also see that there are indeed two micro-trees in the graph, unconnected to each other. (They likely would join paths sometime further down in history, had I continued drawing.)

There are 12 leaf nodes in this “double-tree”, which we may call A, B, C, …, L. Depending on which level of analysis we are looking at, there are two possible taxonomies generated by the two tree structures:

  • a “macro classification”:
    • [[A, [B, [C, [D, E]]]], [F, [G, H]], [[[I, J], K], L]]
  • a “micro classification”:
    • {{A, {{B, C}, D}}, {{E, {{F, G}, H}}, I}}
    • {{J, K}, L}

There are not many subgroups that would occur in both structures! The only such one is the triplet {F, G, H}… and even the subgrouping of this again diverges. There is moreover an interesting chronological complication with the splitting of this group: the micro-level branching occurs in its entirety substantially earlier than the macro-level branching.
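The overlap between the two taxonomies can be checked mechanically by collecting the leaf set of every internal node in each structure and intersecting the results. A small sketch, with the nested lists transcribed from the two classifications above; the helper names `leaf_set` and `subgroups` are illustrative assumptions.

```python
def leaf_set(tree, groups):
    """Return the leaves under `tree`, recording every internal node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_set(child, groups) for child in tree))
    groups.add(leaves)
    return leaves

def subgroups(*trees):
    """All subgroups (internal-node leaf sets) found in one or more trees."""
    groups = set()
    for tree in trees:
        leaf_set(tree, groups)
    return groups

MACRO = [["A", ["B", ["C", ["D", "E"]]]], ["F", ["G", "H"]], [[["I", "J"], "K"], "L"]]
MICRO_1 = [["A", [["B", "C"], "D"]], [["E", [["F", "G"], "H"]], "I"]]
MICRO_2 = [["J", "K"], "L"]

shared = subgroups(MACRO) & subgroups(MICRO_1, MICRO_2)
print(sorted("".join(sorted(group)) for group in shared))  # ['FGH']
```

The internal pairs differ as well: {G, H} is a subgroup only in the macro structure, {F, G} only in the micro structure.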

In principle, it would be also possible to nest a third tree yet, of arbitrary structure, deeper inside the picture — so that upon zooming in, the graph representing microstructure again resolves into a set of unconnected nanostructures, branching and turning in tandem. And so on, ad libitum: fit then in an additional picostructure inside the nanostructure, or perhaps: use the current macro-division as a base for a megastructure with another geometry again entirely. (Moving from two dimensions to three or more will be required, if we wanted to fit in “non-contiguous” subgroups such as {A, C} or {E, F, J}.)

My approach here is also but one of various possibilities for “mixing” trees together. It does have one interesting constraint: in all cases, a macro-branching between two leaves takes place later, or at most at the same time (e.g. E | F), as their micro-branching. — But we could also imagine e.g. a single three-dimensional tree, whose 2D projections in a number of different directions each form a new tree of a different shape. In this case, branchings visible e.g. in the XZ-plane could be equally well earlier or later than the corresponding branchings visible in the YZ-plane.


If we imagined the above tree to indicate language relationships, perhaps linguist fieldworkers’ initial instinct would be to group the 12 varieties as 4 languages, according to the macro-structure:

  1. {A}, clearly a variety of its own;
  2. {B, C, D, E} as a set of “closely related” varieties;
  3. {F, G, H} as a more diffuse dialect continuum;
  4. {I, J, K, L} as an intermediate case.

But at some point, a closer look into the dialect diversification of these varieties might indicate e.g. that the features separating A from B-E include some traits that go quite far back, already before the B-E / F-H split. Other troubling isoglosses might also surface, where A thru I shared one value, J thru K another — and where we were regardless unable to show that the latter, “more closely related” varieties truly have innovated, and not the diverse remainder. At some point “language 2” might end up renamed a “dialect continuum” or a “linkage”, while the “more diffuse” language 3 might firmly retain its clade status. Whether “language 4” would also end up analyzed as a linkage is less obvious. Perhaps linguists would still hang on to analysing at least the split that distinguishes A-D from E-I as multiple unconnected events (one for E, one for F-H, one for I?)


Commenters in the session soon pointed out that my illustration reminds them of the concept of incomplete lineage sorting (ILS) from evolutionary biology. This is, roughly speaking (and any readers with more evobio under their belt than I have, feel free to correct me if this is inexact), the phenomenon that while speciation takes a parent species’ entire gene pool with it, some diversity may later end up being lost in daughter species. And if a species S with two alleles of a gene G splits into two daughter species, and allele G₁ eventually survives only in daughter S₁ while allele G₂ survives only in daughter S₂, we might end up wrongly concluding that the distinct alleles only developed in the daughter species. Moreover, if this kind of a situation takes place a couple of times, a gene may further seem to have split into alleles in the “wrong” order, compared to the actual family tree of the species.

This is however not quite the same phenomenon that I am attempting to point at.

The exact linguistic counterpart of ILS is levelling: if we reconstruct a morphophonological alternation pattern in a proto-language, let’s say *a ~ *b, it will be possible for descendants to analogically eliminate one or the other alternant, and to end up with unvarying *a or unvarying *b. I have many opinions on levelling (most of them critical of reconstructing alternation from non-alternating reflexes; or of projecting attested alternation patterns deeper than necessary)… but that would be an overly large tangent to go on right now. Suffice to note that yes, levelling indeed also creates counter-tree-like isogloss configurations.

We could also define “lexical levelling”, brought about by the loss of inherited vocabulary. Mechanistically, this might look like a different phenomenon from morphological levelling, [1] but in terms of isogloss patterning, it often ends up looking exactly the same. An ancient proto-word might survive only in one group of descendant languages (and end up looking like an innovation particular to it); or it might be lost in a few descendants quite early on (and end up making the other descendants look like a subgroup defined by the introduction of this word); or it might survive in a ragtag assortment of not especially closely related descendants (and make it very clear that the occurrence or non-occurrence of a given word is not a strong genetic signal).

There is however a key difference between lineage sorting and my meta-trees. The “proto-variation” I’m trying to indicate by this meta-tree is not internal to a language variety. It is instead built from variation between the idiolects (topolects, etc.) that a given language is composed of.

Genes are obviously different entities from species, and likewise allomorphs (words) are different entities from languages, so it’s not a huge surprise that their family trees might not match each other; perhaps not even resemble. Two seemingly unrelated genes could turn out to be related, once you look a couple billion instead of just a couple million years back. It is hard to tell how common the same might be for seemingly unrelated words, given that our knowledge of linguistic history remains far shallower than our knowledge of evolutionary history… but even if we assumed that no such cases exist at all (which is, by the way, demonstrably untrue), loaning still often enough suffices to generate completely opaque doublets such as wool and flannel, or atoll and esoteric.

Language contrasts, dialect contrasts and idiolect contrasts meanwhile are only qualitative variations of one and the same thing: linguistic variation between speakers. And yet we can also sketch a situation where a “language split” ends up taking place along different fault lines than an earlier “dialect split” did.

This observation is by no means my own invention. For example, my Helsinki colleague J. Häkkinen calls this phenomenon “boundary shift” in a paper published a few years ago. [2] The particular example he refers to (certain divergences in vowel history in the common West Uralic era) has by now been explained otherwise, [3] but other candidates could easily be located as well. A few that spring to mind within western Uralic would be the numerous isoglosses connecting Votic with the Eastern Finnic (Savonian-Ingrian-Karelian-Veps) language group, e.g. the innovative 1st and 2nd person plural pronouns *möö, *töö, [4] rather than with Estonian, generally considered the closest relative of Votic; or the treatment of initial *d₂- in Samic, where Southern and partly Ume Sami show a development to *θ- > /h-/, but most languages show instead a development to /t-/, which happens to be also found in Finnic. [5] It is likely that many such conflicting isoglosses simply represent secondary contacts, much after the initial separation of the language groups, or even independent developments altogether, but I indeed see no reason to assume that they must all be somehow secondary. Many examples could well have taken root already during the initial dialect divergence of the involved language groups.

We know from dialectology and sociolinguistics that linguistic innovations almost always have a “width”. Instead of taking place in a single isolated variety, with inheritance from there to a set of descendants, they rather spread across some number of related-but-distinct varieties. (This is a point that François and Kalyan justly stress in their papers, if with different terminology.) A boundary shift is, then, nothing more than a change in how far exactly isoglosses coming in from a given direction end up spreading. The conventional usage of “language area” or “language contact” mainly comes up when new innovations extend wider than older ones did, and we often speak of dialect area X extending some influence to dialect area Y. But the opposite is possible as well: if new innovations “shrink” — they stop reaching a particular group of varieties — then not only does this lead to these varieties “splitting away” as a relict area from an earlier group of related varieties: it also leads to their earlier sibling varieties now “changing course” to instead align with some other adjacent “cousin” varieties.

This is the phenomenon that I attempt to capture by the various bunched right-angle turns in my opening graphic. For example, the split between “language 1” and “language 2” involves three micro-lineages (B-C, D and E) turning away in unison from the micro-lineage of variety A — even though the micro-lineage of E has already much earlier split away from that of A-D, and also the split between A and B-D is already well enough in effect. There is therefore a boundary shift here: the macro-lineage formed by A, B-D and E is broken, and only the latter two continue on together (B-D now moreover split into B-C and D). After this, new innovations again continue to accrue across the macro-lineage for a while, as represented by the linear “branch” section.

This situation does not amount to a “unitary protolanguage”, since the three lineages are, in fact, already micro-separate. An attempt at reconstructing a unitary Proto-BCDE would have to reach much deeper than this period to be able to unify also the deepest micro-divergences.

But, just about equally importantly — a single unified Proto-BCDE regardless exists, if way back there (in this case it is, in fact, simultaneously also the proto-variety behind everything from A to I). “Boundary-shrinking” in this sense can thus only operate on closely related varieties; and it can only decrease the similarity of some varieties from their earlier siblings. It is not capable of leading to the “convergence” of unrelated languages. Whatever macro-group ends up being formed by some separate lineages is not in any way converging: it is merely maintaining its pre-existing divergences at a given level, while language varieties outside the group are free to diverge further off. (Of course other processes, such as loss of archaic vocabulary, can well lead to actual linguistic convergence.)


The distinction I draw here between micro-lineages and macro-lineages however also has a different readily applicable interpretation in linguistics: genealogy vs. typology. We find no problem in stating something to the effect that Finnish and Turkish are agglutinative vowel harmony languages, while Livonian and German are fusional vowel-reduction languages: this is taken as nothing more than a relatively superficial system of classification, separate from the “true”, i.e. genetic classification (according to which Finnish and Livonian are both Finnic, while Turkish and German are not even Uralic). But regardless, just as (proto-)languages can split into multiple descendants, language areals can similarly over time split into multiple typologies. Starting from a single point far enough back in time, we should be again able to trace a tree of diverging typologies, which is also again 1) likely to diverge in structure from any genealogical tree, and 2) likely to have all of its splits located later than the corresponding genealogical splits.

Typological divergences definitely also often involve boundary shifts of their own. If Livonian at some point in its history has taken a turn towards fusional typology, then it also has to have taken a turn away from agglutinating typology, and this quite well amounts to boundary shrinking of the “(core) Finnic macro-lineage of agglutinative typology”. Or, inversely: the relatively clean agglutinative morphology of common Finnic, still preserved in e.g. standard Finnish and Karelian, has in many later descendants been muddled by various processes of apocope and syncope: such is the case at least in Livonian, Estonian, Southwestern Finnish, Veps, and partly Ludic; more recently also in some dialects of Ingrian and Votic. This has the effect of turning inherited polysyllabic vocalic stems into “thematic stems”, arguably a step towards a more fusional typology (and at least in Livonian and Estonian, this has been a basic building block for many other innovations in morphology). Regardless, looking from the perspective of early dialect divisions in the Proto-Finnic era, the varieties involved are just about a scattershot. [6]

There also seems to be deeper similarity in here to dialect diversification, not only in the resulting tree structures, but also in the actual details of linguistic change. “Genetic macrostructural”, or “linkage-defining” wide-spreading innovations indeed have various features in common with “typological” wide-spreading ones:

  • They may ignore the microstructure of the dialect continuum;
  • They may spread in phases, taking root in different micro-lineages at different times;
  • Where independent, they may spread also over each other, forming patchwork-like rather than concentric isogloss patterns;
  • They may end up being reversed, if a counterinnovation arises;
    (I’m thinking here principally about “isomorphic” sound changes, that only affect the phonetic realization of a phoneme or a phoneme sequence, not its relation to the rest of the phonology; innovations in syntax may be applicable as well)
  • And finally, they can take the leap to “fully areal”, and spread also to “unrelated”, or at least not at all closely related language varieties.

Due to the lack of clear distinction on which linguistic innovations count as “macro” and which as “micro”, François & Kalyan have suggested roughly that we should treat them all as equally genetic. But I would claim that an opposite approach is just as well possible: since there is also no clear distinction between innovations that count as “macro” and innovations that count as “typological”, perhaps we should treat them as equally non-genetic.


So how do we reconcile these two extremes? A trivial solution would be to claim that no genetic relatedness between language varieties exists, but this obviously gets us into other conceptual problems quite fast (not to mention the troubling echoes of Marrism). Another option might be to instead deny the idea that we can speak of “the” genealogy of a language. Whenever many different and contradictory tree structures emerge, it may be worth checking if we could consider each of them to represent the descent of a different thing. A language’s nominal syntax does not have to have the same exact (areal or dialectological) origin as its vowel inventory, which does not have to have the same origin as its verb morphology, which does not have to have the same origin as its metalworking vocabulary; and perhaps it is a mistake to think that we can pick out the “One True Tree” from among the histories of these various subsystems.

But a third option yet, which I am growing increasingly fond of, would be to first grant that, yes, all usually recognized linguistic innovations are more or less “typological” or “areal” — but to then seek a deeper level yet that we could use as the rooting for the genetic origin of a language variety. My current contender for such a level is local continuity, forming what I call the bundle model.

In the absence of dialect levelling events (the introduction of expansive acrolects through e.g. migrations, mass media, or standardized schooling), a topolect specific to a given location has been primarily descending from the earlier topolect of that same village, as far back as language-level continuity gets us. A fundamental division of language varieties into topolects is also relatively unambiguous: just about any speaker either lives, or doesn’t live, in a particular village. No especially coherent division into topolects smaller than a village is possible either (at least as long as we’re talking about settled, non-urbanized, agricultural societies). [7]

A given linguistic innovation that forms an isogloss somewhere across a dialect continuum is, then, not what actually splits two topolects apart. Its existence is merely evidence that two topolects on different sides of the isogloss had already split from each other at the time. A primary splitting event instead corresponds to either the foundation of a new settlement altogether; or to the introduction of a novel language variety to a pre-existing settlement (no matter if as L2 or L1).

There is admittedly the complication that topolect monogeny is not ensured. Any new settlement could gain its speaker base from more than one pre-existing settlement; and the resulting new topolect can quite possibly end up taking on a mixture of its parents’ traits, instead of starting off as essentially a copy of one of its parents.

As for secondary splitting events, i.e. the actual language diversification, these could be instead said to form “bundles” of local micro-lineages: a category which includes as subtypes all three of “language areas”; “linkages” of related languages; and “subgroups” defined by common features. The differences between the three are, in the bundle model, considered differences in degree, not kind, with no sharp boundaries between them. However, it seems to be necessary to note that there are at least two gradual transitions here: half-a-continent-spanning language areas are still clearly different from local linkages, which in turn are also clearly different from small, tight bundles of topolects.

Also, amusingly enough, not only is it possible for a bundle to comprise language varieties of differing genetic backgrounds — it is also possible for a genetic group of languages to fail to be identified by a corresponding feature bundle. I expect many large-scale subfamilies to be indeed genetic subgroups, in addition to their unambiguous bundle status. But within any one such subfamily, it is easily possible for various smaller genetic groups to have formed, and then split up again, fast enough that no actual linguistic markers managed to establish themselves as characterizing the entire group (and only it).
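The mismatch between bundles and genetic groups can be spelled out with a toy data structure. The following Python sketch is only an illustration under loud assumptions: the settlement history, the topolect names and the isogloss bundles are all invented, and every topolect is given exactly one parent (i.e. topolect monogeny is assumed, which as noted above is not ensured in reality).

```python
# Hypothetical settlement history: each topolect maps to the topolect
# it was founded from (single parent, i.e. monogeny is assumed).
PARENT = {"X": "P", "Y": "P", "a": "X", "b": "X", "c": "Y", "d": "Y"}
LIVING = {"a", "b", "c", "d"}

# Hypothetical isogloss bundles: sets of living topolects sharing a feature.
BUNDLES = [{"b", "c"}, {"a", "b", "c", "d"}]


def ancestors(topolect):
    """Chain from a topolect up to the root, the topolect itself included."""
    chain = [topolect]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def living_descendants(node):
    """All living topolects that descend from (or are) the given node."""
    return {t for t in LIVING if node in ancestors(t)}


def is_genetic_group(varieties):
    """True if the varieties are exactly the living descendants of their
    last common ancestor in the settlement history."""
    varieties = set(varieties)
    first = next(iter(varieties))
    for node in ancestors(first):  # nearest ancestor first
        if varieties <= living_descendants(node):
            return varieties == living_descendants(node)
    return False


print(is_genetic_group({"b", "c"}))  # False: a bundle that is not a genetic group
print(is_genetic_group({"a", "b"}))  # True: a genetic group...
print({"a", "b"} in BUNDLES)         # False: ...that no bundle identifies
```

The first result corresponds to a bundle joining varieties of different genetic backgrounds, and the last two to a genetic group that formed and split up again without any feature of its own.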

What would be different for “secure” subfamilies (and “primary” language families) is moreover not their speed of formation. I would equally well expect that e.g. the main local-continuity genetic groups of Finnic had already split from each other before the vast majority of the innovations that today characterize the Finnic subfamily (no matter if one current primary branch would amount to half the Finnic language area; or to a single backwoods town somewhere in southern Estonia). It is the extinction of other early connecting varieties that allows me to be relatively sure that, yes, there was once a common genetic ancestor of the Finnic languages that was also not the genetic ancestor of e.g. any of the modern-day Samic languages. This common genetic ancestor could very well still predate various innovations that did spread to both the Finnic and Samic languages, putting it well within Proto-Uralic times, and thus looking distinctively non-Finnic. If we look for biological parallels, this “common genetic ancestor” thus functions the most like the identical ancestors point.

By contrast, reconstructible Proto-Finnic, no matter if we define this loosely by the last innovation common to all the languages (e.g. in phonology, the best candidate is *š > *h), or more strictly by the last innovation that is not predated by any innovation particular to a smaller set of varieties (in phonology I’d suggest for this something like the raising *aa > *oo, *ää > *ee), instead functions as the mere last common ancestor of the “population” of Finnic language varieties. In practice, this would mean something like the last language variety whose distinguishing linguistic characteristics were eventually uptaken by all other Finnic varieties known to us (either with or without allowing for the survival of additional earlier characteristics).
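The two ways of delimiting reconstructible Proto-Finnic can be stated quite mechanically. Below is a small Python sketch under stated assumptions: the innovations are taken to be listed in chronological order, the variety labels `V1` to `V3` and the two "local change" entries are purely hypothetical placeholders, and the relative order of the two named sound changes is assumed for the sake of the example rather than demonstrated.

```python
ALL_VARIETIES = {"V1", "V2", "V3"}  # hypothetical stand-ins for Finnic varieties

# Chronologically ordered (assumed) list of innovations and where they spread.
INNOVATIONS = [
    ("*aa > *oo, *ää > *ee", {"V1", "V2", "V3"}),
    ("local change 1 (hypothetical)", {"V1"}),
    ("*š > *h", {"V1", "V2", "V3"}),
    ("local change 2 (hypothetical)", {"V2", "V3"}),
]


def is_common(spread, varieties=ALL_VARIETIES):
    """An innovation is common if it reached every variety of the group."""
    return spread >= varieties


def loose_cutoff(innovations):
    """Loose definition: the last innovation common to all varieties."""
    common = [name for name, spread in innovations if is_common(spread)]
    return common[-1] if common else None


def strict_cutoff(innovations):
    """Strict definition: the last common innovation not predated by any
    innovation particular to a smaller set of varieties."""
    last = None
    for name, spread in innovations:
        if not is_common(spread):
            break
        last = name
    return last


print(loose_cutoff(INNOVATIONS))   # *š > *h
print(strict_cutoff(INNOVATIONS))  # *aa > *oo, *ää > *ee
```

Neither cutoff says anything about the common genetic ancestor in the local-continuity sense, which may lie well before the first entry of such a list.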

The bundle model also seems to have the benefit that we could make much closer use of archeology in determining when various micro-lineages originally split from each other. If a cultural wave that we identify as Finnic reaches Southwestern Finland already in 500 BCE — then very well, let us assume that the deepest distinctions between individual western Finnish dialects could have already taken root at the time (and not at whatever time distinctions first start turning up in phonology, or morphology, or vocabulary). After this, we expect to see the foundation of new Finnic-speaking settlements in quick gradual succession, followed by the slower bundling of linguistic innovations (and possibly isoglosses) on top. But just as dialectologists and “linkageists” have long observed, there is no reason to a priori expect these later innovations to form a clear nested tree-like structure.

I have thus ended up agreeing partly with both the Jacques-List and the François-Kalyan camps. As per the latter, yes, we should stop trying to force our analyses of linguistic innovations into a tree shape by default; but per the former, no, this does not mean that we should up-end the concept of “genetic relatedness” entirely, and start applying it also to what are obviously areal units joined only by relatively late innovations (and though I’ve barely even touched the topic in this discussion, also: no, F & K’s “historical glottometry” is not an especially illuminating way of demonstrating the historical development of language groups).

For closing, I present here another imaginary diagram, this time more heavily un-tree-like (highly dialect-continuumish), and with some specific features of the bundle model illustrated. — For credit, this is again not completely original work. My key convention of presenting isoglosses as horizontal lines connecting multiple varieties is inspired, foremost, by earlier articles by Sammallahti and Viitso. [8]

  • Solid lines indicate micro-lineages, just as before;
  • Wide-angle turns indicate spreading events;
  • Small-angle turns (mostly) indicate boundary shrinking events;
  • Dashed lines indicate (some) isoglosses, bundling micro-lineages together;
  • Dead ends in T indicate language replacement events;
  • Dead ends in X indicate abandoned settlements.

[Figure: “bundles”, the imaginary bundle-model diagram described above]

I leave it to you to explore the picture further, e.g. to figure out how many processes that I have discussed above you can find illustrated.


[1] They also do share some important mechanistic similarities. If we treat morphophonology as lexicalized rather than surface phonological — then “alternating stem variants” will be nothing more than lexically separate words altogether; and “morphological levelling” amounts to the loss of such “transparently suppletive” words from a paradigm. This is often showcased by morphophonological alternants that lose their original function, but remain in some specialized one.
— A simple example might be Finnish syöpä ‘cancer’. Originally this is simply the active present participle of syö- ‘to eat’; however, it has been ousted from this function by a newer form syövä ‘eating’. Here -vä is the most regular front-vocalic APP ending, analogically drafted in from much more common bisyllabic verb roots (e.g. elä-vä ‘living’, tietä-vä ‘knowing’, käänty-vä ‘turning’, pese-vä ‘washing’), where it is phonologically regular (due to lenition *p > *b > v between unstressed vowels). Hence, the history here involves three steps: 1) the semantic enrichment syöpä ‘eating’ > ‘eating; cancer’; 2) the introduction of the more regular form syövä into the paradigm of ‘to eat’; 3) the loss of the form syöpä ‘eating’.
[2] Häkkinen, Jaakko (2012): “After the protolanguage: Invisible convergence, fake divergence and boundary shift”. — Finnisch-Ugrische Forschungen 61: 7–28.
[3] The Erzya dialects in question seem to agree with Samic in suggesting (West) Uralic *we- in a couple of words, in contrast to forms suggesting *(w)o- in the other Mordvinic varieties. This though turns out to be merely a part of a more general late conditional sound change *u- > /vi-/ in these dialects; see Ante Aikio’s article in SUSA 95: 42.
[4] Discussed in some detail by Terho Itkonen (1983): “Välikatsaus suomen kielen juuriin”. — Virittäjä 2/87: 214–217.
[5] An example taken from the isogloss map of Finno-Ugric by Tiit-Rein Viitso (2000): “Finnic Affinity”. — Congressus Nonus Internationalis Fenno-Ugristarum I: Orationes plenariae & Orationes publicae: 153–178.
[6] This actually goes further yet. Also “Estonian” and “Finnish” have been known for long to be basically typological groupings formed in this fashion, both comprising multiple different genetic micro-lineages, some of which are not especially close in origin. Very roughly, if a Finnic variety is fully consonant-gradating, relatively archaic in its morphology otherwise, mostly nonpalatalizing and lexically Swedicized, it is “Finnish”; if it is consonant-gradating, fully syncopating and apocopating, and lexically Germanized, it is “Estonian”. Relaxing the definitions a bit might also allow us to call Karelian, Ingrian and Votic “typologically Finnish”, versus Livonian “typologically Estonian”. — Constructing a definition of “typologically Veps” as a third areal is left as an exercise for the reader.
[7] A slightly modified model, allowing for “locations” to be territories rather than settlements, as well as for more fluid transitions and exchanges between tribal units, would seem to be required for nomadic and certain hunter-gatherer societies. This might also provide some degree of explanation for, and new tools for addressing, the difficulties in reconstructing the linguistic pre-history of areas characterized by heavy diffusion between “unrelated” or not closely related languages, such as Australia and Central Asia. I do not think I am quite going into reviving the punctuated-equilibrium paradigm of linguistic history here, which likewise denies the possibility of figuring out clear tree-like linguistic histories for mobile societies… but discussing the distinctions between that model and mine would be too much to chew on right now.
[8] See e.g. Sammallahti, Pekka (1977): “Suomalaisten esihistorian kysymyksiä”. — Virittäjä 2/81: 119–136.
– Viitso, Tiit-Rein (1999): “On Classifying the Selkup Dialects”. — Europa et Sibiria. Veröffentlichungen der Societas Uralo-Altaica 51: 441–451.

Posted in Methodology

*wu > *u in Finnic

One minor phonological innovation in Finnish is mentioned in historical overviews far more often than could be expected from its lexical frequency: the loss of a palatal semivowel *j when preceding its vocalic counterpart *i. This is probably because the shift has been fossilized as a morphological alternation [1] in the word veli ‘brother’ (< *velji), stem velje-. The change also shows up in some old derivatives, e.g. nelikko ‘group of four’ (< *neljikko) from neljä ‘four’.

For phonological analysis, both synchronic and diachronic, a principle that I find valuable is back/front symmetry. This follows as a special case of what is perhaps the main result of featural phonology: phonemes are not atomic entities, but rather bundles of features. And so sound changes or phonological processes that are conditioned on vowel height tend to ignore vowel backness and roundedness. Here we would then expect to also find the corresponding shift involving labial (semi)vowels: pre-Finnic *-w- or proto-Finnic *-v- > ∅ before *u or *ü (= in shorthand: *U).

Yet it turns out that this question is barely discussed anywhere. I have e.g. found no mention of such a development in Lauri Hakulinen’s Suomen kielen rakenne ja kehitys. [2] Martti Rapola’s Suomen kielen äännehistorian luennot does not fare much better (as is perhaps predictable, though, since his focus is firmly on dialectal developments within Finnish, not on pan-Finnish innovations).

Let’s try having a look if there is any evidence to be found on this matter.

In support

Given the absence of clear evidence for *U-stems in Proto-Uralic times, there are not many words where we can reasonably assume the sequence *-wU- to have existed in pre-Finnic times. Just one clear word-initial case of loss can be found: *wülä- > PF *ülä- ‘up(per)’ — cf. Permic *vɨl-. [3] Slightly odder is *wud₂ə ‘new’ (and even this, I believe, should be regardless derived from an even earlier *wod₂ə, though this is of no direct relevance for the current topic). This turns up as PF *uuci (Fi. uusi etc.), seemingly with vocalization, rather than loss, of the initial glide. We could also e.g. assume a metathesis *wu- > *uw- as an intermediate stage.

Still, Proto-Finnic clearly had *u-stems, whatever their origin. And it seems that there is still a decent amount of evidence for a simplification *-wU- > *-U- in these. Already within Finnish I can find three clear doublets involving word derivation:

  • kalvaa ‘to gnaw’ ~ kaluta ‘id.’ (< ? *kalvuta) [4]
  • kärventää ‘to scorch’ ~ käry ‘burnt smell, rancor’ (< ? *kärvü)
  • raivo ‘fury’ ~ raju ‘fierce’ (< ? *raivu)

Comparison with Samic also turns up three likely cases.

  • Lule Sami iellvet ‘to note’ (< ? PS *ealvē-) ~ Fi. äly ‘intellect’, älytä ‘to realize’ (? < *älv-ü)
  • Proto-Samic *ocvē ‘wet snow’ (< *učwa) ~ Fi. utu ‘mist, fog’ (< ? *učw-u) [5]
  • Proto-Samic *toalvō- ‘to lead, to take somewhere’ (< *tolvo- < ? *talwəw-) ~ Fi. taluttaa ‘to lead, to walk someone’ (< ? *talvu-tta- < *talwəw-)

I hypothesize that a close scan of *U-stem roots and derivatives in the other Finnic languages would turn up further evidence as well.

Exceptions

Much as is the case with -ji-, Modern Finnish does however allow the sequence -vU-.

Many of these cases can be shown to have been formed secondarily, and could be hypothesized to have come about only after *-v-loss. E.g. some go back to earlier *-βu- < *-bu- (I give here only non-paradigmatically-alternating cases):

  • juovu-ttaa < *joobu-tta- ‘to get/make someone drunk’ (← juopua ‘to become drunk’)
  • taivu-ttaa < *taibu-tta- ‘to bend’ (← taipua ‘to bend’)
  • vaivu-ttaa < *vaibu-tta- ‘to sink (tr.), lull’ (← vaipua ‘to sink, to fall asleep’)
  • viivy-ttää < *viibü-ttä- ‘to delay’ (← viipyä ‘to be late’)
  • voivu-ttaa < *voibu-tta- ‘to tire (tr.)’ (← voipua ‘to tire (intr.)’)

some involve loaning:

  • laavu ‘lean-to’ ← Samic, cf. e.g. NS lávvu ‘id.’
  • siivu ‘slice’ ← Swedish skiv ‘id.’
  • laiv-uri ‘skipper’ (← laiva ‘ship’; -Uri is a loan suffix from Swedish)
  • päiv-yri ‘almanac’ (← päivä ‘day’)

and others yet result from a late assimilation of unstressed *-AU- to -UU-: [6]

  • arv-uuttaa < *arvautta- < *arvad-u-tta ‘to ask riddles’ (← arvata ‘to guess’)
  • raiv-uu < *raivau < *raivad-u ‘clearing’ (← raivata ‘to clear land, etc.’)
  • tavu ‘syllable’ < older †tavuu < *tavau < *tavad-u (← tavata ‘to spell’)

A few remaining derivative examples could be assumed to have been formed only after *-v-loss, or to have been reverted by analogy.

  • harv-uus < *harv-us ‘sparseness’ (← harva ‘sparse’) [7]
  • kaiv-u ‘digging, trench’ (← kaivaa ‘to dig’; this is an IMO unetymological doublet of *kajwa-w > kaivo ‘well’)
  • kasv-u ‘growth’ (← kasvaa ‘to grow’; the phonologically expected kasvo already means ‘face’)
  • kuiv-u- ‘to dry’ (← kuiva ‘dry’)

A soundlawful [8] doublet of the last one is possibly found in dialectal kujua ‘to wilt’.

Regardless, there remains a more problematic residue, which prevents me from simply assuming that *-vU- always > *-U- at some relatively early Finnic period. These are all basic noun roots with primary *-v-, where morphophonological alternation as a source of analogy cannot be possibly blamed for anything.

  • koivu ‘birch’. The only real excuse I could think up here is that in South Estonian the root is instead an o-stem, kõiv : kõivo-. So perhaps there has been here a later shift from *-vo to *-vu in North Finnic…? (The root has not been attested from North Estonian; in Votic it probably only occurs as an Ingrian loan; Livonian provides no evidence for the distinction between *-o and *-u.) This would still not be a regular sound change though, given aivo ‘brain’, arvo ‘value’, hieho < *hehvo ‘heifer’, kalvo ‘film, membrane’, etc. [9]
  • savu ‘smoke’ seems like it might actually be a positive example of the change, to an extent. On the basis of South Estonian sau ~ Votic and dialectal Olonets Karelian savvu [10] it would be possible to reconstruct PF *savvu; then, just as could be predicted, one *-v- is lost in Finnish. However, this only leads to the question: why does *-v-loss not occur in the previous three varieties as well? Its loss is still seen in e.g. ‘mist’: SEs udsu, NEs udu, Votic utu.
    An explanation may lie in the earlier history of this word. Samic *sōvë ‘smoke’ and Mordvinic *suf-ta- ‘to smoke’ indicate that the earlier form of the root was simply *sawə, not anything like **sawəw. Erkki Itkonen has supposed [11] that the Finnish word is not formed by suffixation, but rather by apocope-then-anaptyxis. In PF times, all former bisyllabic words ending in *-jə were contracted into diphthongs (e.g. *täjə > *täi ‘tick’, *wajə > *woojə > *voi ‘butter’); so in parallel, we would then expect also *sawə to have been contracted to *sau. But no nominal roots of the shape ˣCVU occur in the native lexicon of Finnish (and the scarce loanwords such as tau ‘tau’ or tiu ’20 items’ are on the recent side as well). Itkonen therefore posits a back-development *sau > savu, to better abide with the canonical bisyllabic root structure. The South Estonian form could then be considered an archaism. Perhaps likewise also the identical monosyllabic reflexes in Southwestern Finnish; although since SW Finnish clearly has had contraction in secondary cases with *-Vbu- > -Vvu- > -Vu- (papu ‘bean’ : SW plural pau ~ standard pavut), this wouldn’t really provide any additional sound change economy.
  • vävy ‘son-in-law’ is almost entirely parallel to the above. We again have North Estonian väi, South Estonian väü, Olonetsian vävvy, suggesting PF *vävvü — although, this time Votic shows shorter vävü. We could well again follow Itkonen’s solution and assume PF *väü. On the other hand, Samoyedic *weŋü suggests to me that the proto-form could this time have been something like *wEŋəwə, predicting indeed PF *vävü < *wäwəw. [12]
  • havu ‘conifer branch’. This could again come from *hau > *havvu > havu, as per Itkonen, in light of Olonetsian havvu. On the other hand, a loan etymology from Baltic (cf. Lithuanian žabas ‘branch’) and Ludian/Veps habu suggest that the proto-form was actually *hapu (with exceptional widespread levelling to the weak-grade stem), or perhaps *habu (with an exceptional unalternating *b).
  • sivu ‘side’. This word definitely does not seem to go back to **sivvu / **siu, given Olonetsian sivu. It might be possible to derive this as a Germanic loanword, in which case this could again be analyzed as a late-comer, but there are several phonological difficulties (e.g. what Old Norse actually has is síða < *sīdǭ, not the seemingly required ˣsíð < **sīdu < **sīdō; western Finnish dialects do not have forms along the lines of ˣsiru or ˣsilu that would be predicted from earlier *siðu; vowel length would be expected to remain in a sufficiently recent loan).

This leads me to suggest that the shift *-vU- > -U- has only taken place following another consonant. Most of my six initial examples are compatible with this. In case of koivu, we’d need to assume this got its -u only after the phonologization of *-oj as the diphthong /oi/; while raju and kujua might need to be analyzed as having originated in western Finnish specifically and spread from there to other varieties. Itkonen’s account of savu and vävy continues to work too, since the key forms like savvu show a geminate -vv-, not a diphthong + glide ˣsauvu (as modern Finnish prefers in cases like this, e.g. sauva ‘pole’). But we could also take a slight shortcut, supposing that these never had a geminate in most of Finnic, and that -vv- in Olonetsian (and Votic?) is indeed a late local innovation rather than an archaism.

In one broad stroke, this conditioning also takes care of just about all of the counterexamples above that could perhaps involve secondary counterfeeding (the types of juovuttaa, laavu, raivuu, kaivu). Additionally, among the positive examples, in one case the involved -v- might indeed derive earlier *-b-: kärventää ‘to scorch’ (tr.) seems like an affective/ideophonic variant of korventaa ‘id.’, which is derived from korveta (: korpeaa) ‘to scorch’ (intr.) < PU *korpə-.

As a third line of evidence in favor of this approach, let’s note that *-ji- > *-i- also seems to not take place following a vowel (laji ‘kind, species’, lujin ‘hardest’ ← luja ‘hard’, nuijia ‘to clobber’ ← nuija ‘club’, ojittaa ‘to dig ditches’ ← oja ‘ditch’) and is probably a post-Proto-Finnic change (*velji ‘brother’ > Karelian veľľi ~ velli, Votic velli). Maybe even particular to Finnish! Es. veli can be derived just as well through apocopated *velj (compare e.g. *neljä > *nelj > neli ‘4’).

Tracing the implications further, I even suspect that cases like PU *täjə > PF *täi = Fi. täi ‘tick’; PU *wajə > *woojə > PF *voi = Fi. voi, as mentioned above, have probably not developed through a stage such as *täji, *vooji — but have involved the direct apocope of PU *-ə following a glide. In principle this predicts that words of the shape *CVji would perhaps have been possible already by Proto-Northern Finnic, from PF *CVjei < earlier *CVjA-j. Suitable roots for forming derivatives of this kind were rare, though.

This may seem to create problems for accounting for words of the shape CVvi : CVve-, like PF *kivi = Fi. Es. etc. kivi ‘stone’… but by now I have, also for other reasons, ended up with the hypothesis that these involve either the levelling of earlier alternation (*kiü : *kive- → *kivi : *kive-), or a geminate in Proto-Finnic that blocked this apocope (e.g. *povvi ‘bosom’ > Fi. povi, Votic põvvi, Es. *põvv > põu).

A second group — and more?

I have not exhausted above the examples known to me where a development *-vU- > -U- could be supposed for Finnish (or elsewhere in Finnic). However, all words remaining up my sleeve show some ambiguity: they involve syllable contraction *-VvU- > -VU-, and they could be analyzed also as cases of syncope followed by vocalization: *-VvU(C…) > *-Vv(C…) > -VU(C…)-. This hypothesis also gains some support from the fact that several examples could have involved the loss of some vowel other than close rounded *-u- or *-ü-. They also commonly enough involve secondary *-v- from *-b-.

The following clearly have involved earlier *-vU-:

  • haukka ‘hawk’ < havukka (attested in eastern Fi.!) < *habukka — cf. Veps habuk
  • hius ‘(single) hair’ < *hivus < *hibus — cf. Karelian hivus, Veps hibus
  • säyseä ‘tame’ < ? *sävüseä — cf. sävyisä ‘id.’; sävy ‘tone, hue’

The following may have had *-vU-, but other possibilities are reasonable as well:

  • auttaa ‘to help’ < ? *avu-ttaa / *avi-ttaa; aulis ‘willing to help’ < ? *avu-lis
    — cf. apu ‘help’, Veps abutada ‘to help’; or Western Fi. avittaa ‘to help’ (with counterparts in southern Finnic such as Es. aitama)
  • keuhko ‘lung’ < ? *kevu-hko / *keve-hko; köykäinen < köyhkäinen ‘light, feeble’ < ? *kevü-hkäinen / *keve-hkäinen
    — cf. kevyt ‘light’; or kepeä ‘light’
  • liukas ‘slippery’ < ? *livu-kas / *live-kas — cf. lipu ‘slipperiness’; or livetä ‘to slip’, lipeä ‘lye’ (liueta : liukenee ‘to dissolve’, pro ˣlipVeta, and liukua ‘to slide’ have to be analogical; the latter’s soundlawful doublet seems to be lipua ‘to glide’)
  • soukka ‘narrow’ < ? *sovu-kka / *sovi-kka — cf. sopukka ‘nook’; or sopia ‘to fit’

The following have no evidence specifically in favor of *-vU-:

  • aukko ‘hole’ < ? *ava-kko — cf. avata ‘to open’ (or < ? *auɣekko, cf. auki ‘open’, aueta : aukenee ‘to open’ (intr.); unlikely though given Livonian ouk)
  • kiukku ‘anger’ < ? *kiiva-kku — cf. kiivas ‘quick-tempered’
  • loukko ‘nook’ < ? *love-kko — cf. lovi : love- ‘cleft’
  • reuhtoa ‘to yank around’ < ? *revihtoa / *revehtoa — cf. repiä ‘to tear’ (tr.); revetä ‘to tear’ (intr.)
  • riuska ‘brisk’ < ? *rive-ska / *riva-ska — cf. ripeä ‘id.’, rivakka ‘id.’
  • saukko ‘otter’ < ? *sava-kkoi — cf. sapa ‘tail’ (but alternately from *sagukkoi, cf. *sagarma(s) ‘otter’ > Es. saarmas, Veps sagarm)
  • tiukka ‘tight’ < ? *tiivi-kka — cf. tiivis ‘compact’
  • tyyssija ‘abode’ < ? *tyve-s- — cf. tyvi : tyve- ‘base’ (even -yy- < *-yi- might be possible!)

General syncope after -v- however clearly cannot be assumed. Some examples that do not alternate with related bisyllabic forms, even through derivation, include: havista ‘to swish’, havitella ‘to strive for’, hävitä ‘to disappear, lose’, kavala ‘treacherous’, kivahtaa ‘to snap at’, kuvottaa ‘to be/make nauseous’, navakka ‘strong (of wind)’, ovela ‘shrewd’, ravistaa ‘to shake’, ravita ‘to nourish’, sivellä ‘to brush (paint etc.)’, suvanto ‘river pool’. To these could be also added an abundance of more or less transparent derivatives such as avuton ‘helpless’, kivittää ‘to stone’, kovasin ‘whetstone’, lävitse ‘thru’, savinen ‘clay-y’, syventää ‘to deepen’, tavallinen ‘normal’, toivomus ‘wish’, but I believe the point is made without going for completeness.


I could still see some patterns in favor of reconstructing at least conditional syncope. Most of the contracted examples involve following *-kk-; most involve a short first syllable (contrast the juovuttaa and laavu types earlier); most seem to be “weak grade” formations, where the 2nd syllable would originally have been always closed (including also hius : hiukse-).

But what this is also reminding me of is the pattern of modern colloquial Finnish “clipped” or “slang” derivatives. These are not formed by agglutination, but instead by taking the initial CV(V)C or CVCC sequence of a word, shortening a long vowel if necessary [13], and appending a suffix after that. Some examples of derivation of this kind include:

  • -(t)sa: kotitalous → kotsa ‘home economics (as a school subject)’, maantieto → mantsa ‘geography (as a school subject)’
  • -(t)si(-): fundeerata → funtsia ‘to think’, kannattaa → kantsia ‘to be worth doing’, miljoona → miltsi ‘million’ (of money), parveke → partsi ‘balcony’
  • -(t)su: fantastinen → fantsu ‘fantastic’, ranta → rantsu ‘beach’; common in nicknames, e.g. Anna, Anni, Annika (etc.) → Antsu, Milla → Miltsu, Valtteri → Valtsu
  • -(t)ska: juttu → jutska ‘thing(y)’, tietokone → tietska ‘computer’
  • -(t)ski: jäätelö → jätski ‘ice cream’, nuotio → notski ‘campfire, bonfire’
  • -(t)sku: banaani → bansku ‘banana’, materiaali → matsku ‘(reading) material’

And -kka is one of the more productive suffixes of this kind. E.g.

  • harjoitus → harkka ‘training’
  • jungle → junkka ‘jungle’ (the electronic music subgenre!)
  • linja-auto → linikka ‘bus’
  • liikunta → liikka ‘physical exercise (as a school subject)’
  • maisteri ‘Master (degree)’ → maikka ‘teacher’
  • purukumi → purkka ‘chewing gum’
  • Sörnäs → Sörkka ~ Sörkkä ‘district in Helsinki’
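
The clipping recipe can be made concrete with a small Python sketch. This is only an illustration under my own simplifying assumptions, not an established analysis. The function name `clip`, the vowel inventory, and the rule that a stem-final consonant merges with an identical suffix-initial one (which is how the optional (t) is modelled here) are all hypothetical. It always takes the shortest CV(V)C option and always shortens a doubled vowel.

```python
import re

# Assumed vowel inventory of standard Finnish orthography.
VOWELS = "aeiouyäö"

# Initial onset consonants, a nucleus of one or two vowels, then one consonant.
CLIP_RE = re.compile(rf"^([^{VOWELS}]*)([{VOWELS}]{{1,2}})([^{VOWELS}])")


def clip(word: str, suffix: str) -> str:
    """Form a clipped derivative: initial CV(V)C of `word` plus `suffix`.

    Simplifications (assumptions, not claims about Finnish grammar):
    - a doubled (long) vowel is always shortened;
    - if the clipped stem ends in the consonant the suffix begins with,
      the duplicate is dropped (kot + tsa -> kotsa).
    """
    m = CLIP_RE.match(word.lower())
    if m is None:
        raise ValueError(f"no CV(V)C sequence at the start of {word!r}")
    onset, nucleus, coda = m.groups()
    # Shorten a long vowel (two identical letters); diphthongs are kept.
    if len(nucleus) == 2 and nucleus[0] == nucleus[1]:
        nucleus = nucleus[0]
    stem = onset + nucleus + coda
    # Merge identical consonants at the stem/suffix boundary.
    if suffix.startswith(coda):
        suffix = suffix[1:]
    return stem + suffix


if __name__ == "__main__":
    for word, suffix in [("kotitalous", "tsa"), ("maantieto", "tsa"),
                         ("jäätelö", "tski"), ("tietokone", "tska"),
                         ("banaani", "sku"), ("harjoitus", "kka")]:
        print(word, "->", clip(word, suffix))
```

The sketch deliberately covers only the most regular cases. It does not derive liikka (where the long vowel survives), maikka (where the clip is only CVV) or notski (where a diphthong is reduced), so the real pattern clearly involves more lexical leeway than a single deterministic rule.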

We also know some examples of this exact derivation pattern whose spread of cognates suggests fairly great age. Three good examples are the informal family terms eukko ‘woman, wife’ (< *emkko?) (cognate in Karelian), probably from emo / emä ‘mother’; ukko ‘man, husband’ (cognates in almost all Finnic languages), from uros ‘male’; veikka, veikko ‘brother, comrade’ (cognates in all Northern Finnic languages), from veli ‘brother’ (< *velji, as mentioned). I take it as probable that clipped derivation has been around for a good millennium or two in Finnic by now, even if it has never been very likely to leave lasting records.

As for examples that could bridge this handful of ancient-looking examples with 20th-century slang, I’m foremost thinking of examples of adjectives showing “suffix alternation”. At least formally, the possibility of reanalysing a stem and agglutinating -kka to that is possible. But nothing really precludes a “clipping” analysis either. E.g.:

  • jämeä ‘stiff’ ~ jämäkkä ‘sturdy’ (PU *jämä)
  • kimeä ‘high-pitched’ ~ kimakka ‘id.’ (*kima, √kima?)
  • kalpea ‘pale’, kalvas ‘id.’ ~ kalvakka ‘paleish’ (*kalpa, √kalpa?)

— But even if some of the examples above are indeed clipped derivatives (I would suggest kiukku and tiukka as probable cases, due to e.g. their proto-forms with long vowels), this is unlikely to be the full story either. In particular haukka is not a derivative of any kind, but rather a loan in its entirety (← Proto-Germanic *habukaz).


Since it seems futile to cover the remaining cases by a single rule, it is probably wise not to attempt this. I am therefore leaning towards the option that there are no fewer than three similar but distinct sound changes involved here:

  1. *V̆vU > VU, in western Finnish (the haukka and also pau, koju type)
  2. *CvU > CU, across all Finnish varieties, perhaps most of Finnic, though later than *b > *β > v (the käry, taluttaa type)
  3. *Vwə > *VU, in Proto-Finnic times under so far unclear conditions (a few e-stem derivatives such as loukko and tyys-; possibly the savu group).

Type 3 seems moreover likely to be identical to the rise of some Proto-Finnic instances of long *UU: e.g. PU *śowə > WU *śuwə > *śuw > PF *suu = Fi. Es. etc. suu ‘mouth’; PU *tiwənə > *tiwnə > *tiüni > PF *tüüni = Fi. tyyni ‘calm’. [14]

It remains to be seen how well an analysis of data also from outside Finnish will support this division. To reiterate, I would in particular predict being able to find some further examples of type 2 from the other Finnic languages, involving derivatives in -U that have no exact Finnish counterparts.

An initial blind test already turns up at least one candidate in confirmation. Taking at random one Finnic root of suitable shape: *harva ‘rare, sparse’, I could predict that a derivative *harv-u would later yield haru. A word of this shape indeed turns out to be attested from southern Karelian, in the reasonably suitable meaning ‘watered-down milk’. But a fuller derivative hunt will have to wait for later.
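
For readers who like to see the conditioning spelled out mechanically, types 1 and 2 can be sketched as two regular-expression rewrite rules in Python. This is a rough illustration only. The function names, the consonant and vowel sets, and the treatment of orthographic u/y as *U are my assumptions, and the rules operate on plain spellings rather than on properly reconstructed forms.

```python
import re

VOWELS = "aeiouyäö"
# Assumed consonant set; v itself is excluded so that geminate -vv- is untouched.
CONSONANTS = "bdghjklmnprst"

# Type 1 (western Finnish): short vowel + v + U > vowel + U.
# "Short" is approximated as: the vowel is not itself preceded by a vowel.
TYPE1 = re.compile(rf"(?<![{VOWELS}])([{VOWELS}])v([uy])")

# Type 2: consonant + v + U > consonant + U.
TYPE2 = re.compile(rf"([{CONSONANTS}])v([uy])")


def apply_type1(form: str) -> str:
    """Contract *V̆vU to VU (the haukka type)."""
    return TYPE1.sub(r"\1\2", form)


def apply_type2(form: str) -> str:
    """Drop v between a consonant and U (the *harv-u > haru prediction)."""
    return TYPE2.sub(r"\1\2", form)


if __name__ == "__main__":
    print(apply_type2("harvu"))    # predicted derivative of *harva
    print(apply_type2("savu"))     # post-vocalic v: type 2 does not apply
    print(apply_type1("havukka"))  # eastern Fi. havukka vs. haukka
    print(apply_type1("laavu"))    # long first syllable: type 1 does not apply
```

Keeping the two rules as separate functions mirrors the claim that they are distinct changes with different dialectal spread. Type 3 is left out, since its conditions are still unclear.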

[1] I was going to say “morphophonological”, but really my view is that at least some 80% of all “processes” proposed by morphophonologists educated in generative phonology are not synchronic rules of phonology at all, but merely the still-visible historical residue of former diachronic sound changes. In this particular case, too, it’d take far more mental gymnastics or morphophonological epicycles to explain why underlying /velji/ would surface as [ˈʋeli], while e.g. in the plural genitive, apparent underlying /velj-i-en/ surfaces as [ˈʋeljien]  — than to simply assume that the nominative of ‘brother’ is stored as the separate lexeme /veli/.
(To be fair, I’ve seen recent generativist work taking the stance that a level of “lexical” phonology between “deep structure” and surface realization needs to be posited after all, e.g. Kiparsky, “Formal and Empirical Issues in Phonological Typology”. This will likely go a good way towards rectifying the situation, but it may still be a while before people will be willing to consider e.g. that most allomorphy can be modelled as simply a subtype of synonymy.)
[2] So far, anyway. Any book that has ~1200 footnotes will contain much information that is not in the expected place.
[3] Even here I am actually not fully sure that breaking *ü- > *vɨ- can be ruled out (similar to Mordvinic, where *ü- > *ve-). Reconstructing instead Finno-Permic *ülä- would make it slightly easier to reconcile this with East Uralic *ilə- (> Mansi *äl-, Khanty *eeL-, Samoyedic *i-). But the zero onset in the latter could perhaps also be explained as analogy from *ëla- ‘down’.
[4] The case of kaluta is mentioned by Rapola; he however entertains also the possibility that they would not involve suffixation, but rather a “Sievertian” development -lv- > -lu- (and, presumably, the resulting trisyllabic stem *kalua- being then reanalyzed as if it were an original contraction stem *kaluda-, hence the modern infinitive kaluta and not kaluaa). There are no exact parallels for such a change; southwestern Finnish has the relatively similar -sv- > -su- (kasuaa ‘to grow’, rasua ‘fat’), but kaluta is pan-Finnish.
[5] A comparison I have previously proposed in the comments section.
[6] It would be an interesting question how these derivational cases diverged from *-Abi > *-AU > *-AA in 3rd person singular forms (as in *aja-bi > *ajau > ajaa ‘drives’), but I would presume some analogy in some direction is involved.
[7] Vowel length in this suffix is, per the usual explanations, due to complicated multi-stage analogy.
[8] To coin a translation for the useful concept expressed by German lautgesetzlich / Finnish äännelaillinen.
[9] In southwestern Finnish dialects different forms, such as koju ‘birch’ or aju ‘brain’, can also be found. Influence from Estonian is very much not ruled out though.
[10] Karjalan kielen sanakirja lists the forms savvu, vävvy and havvu from the southernmost dialects of Olonetsian, in the villages of Kotkatjärvi, Nekkula and Riipuskala.
[11] Itkonen, Erkki. “Beiträge zur Geschichte der einsilbigen Wortstämme im Finnischen”. — Finnisch-Ugrische Forschungen 30: 1–54.
[12] With Lehtinen’s Law blocked by the third-mora element, hence not *veevü. — Samic *vīvë is very difficult to account for. The apparent development *-ŋ- > *-v- has previously inspired suggestions of loaning from early Finnic, but in light of also the stem vowel mismatch, something like *wäŋəwə > *weŋəwə > *weŋwə > *wēɣwə > *vējvë > *vīvë (where the original *-ŋ- isn’t what yields *-v-) could also be within the realms of possibility.
[13] Modern Finnish still disallows overheavy syllables containing a long vowel and a coda cluster. Pointti ‘(rhetorical or score) point’ and jointti ‘marijuana joint’ are possibly first heralds of the syllable structure CVVCC making a more general entrance, but e.g. tietska is rather syllabifiable as tiet.ska, with a word-internal onset cluster, much like we need to assume also for loanwords such as ekstra (= probably eks.tra).
[14] My account of *üü in here is tentative — it would have to pre-date *ti > *ci, and it’s possible that there are grounds to exclude this ordering. I’ll have to fiddle with my poset model of Proto-Finnic relative chronology to see if this can be made fit in…
