A slice of Finno-Ugric research interests across time

A tabulation project I assembled a while ago: a topical index of the Finno-Ugrian Society’s monograph series Suomalais-Ugrilaisen Seuran Toimituksia / Mémoires de la Société Finno-Ougrienne, which by now runs to almost 300 volumes. Aside from being handy for looking up what has been done when, about what, and by whom, it also allows digging up various interesting statistical observations.

Here is a copy of the file in OpenOffice format, in case anyone wants to have a look themselves. (I also considered linking some kind of a Creative Commons licence here, but I am not sure if formatting raw data available elsewhere passes the threshold of originality at all.)

One simple observation is that early on, the series does not seem to have had a specific theme. The first fifteen volumes, released in the 19th century, include a dictionary of Lule Sami; a bibliography of Sami studies; and even a bunch of monographs on Central Asian epigraphy. In the early 20th century though, the Society ended up establishing a few other book series as well: Lexica Societatis Fenno-Ugricae (for dictionaries) and Kansatieteellisiä julkaisuja / Travaux ethnographiques de la Société Finno-Ougrienne (for ethnographic studies). This left SUST mainly for text collections and linguistic monographs, though for some reason a couple of ethnographic studies continue to be released per decade, too.

Let’s next take a look at some further chronological features.

Subjects covered over the years

I have here focused only on the linguistic monographs on the Uralic languages. Text collections have sometimes spent quite some time in editing, and their publication date does not really indicate anything about current research foci. And while a couple of volumes have focused on languages of other families (mostly various Altaic things, e.g. vol. 82, G. J. Ramstedt: A Korean Grammar), they’re sufficiently rare that there are no real patterns to be seen with respect to these.
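The filtering and counting described above can be sketched in a few lines of Python. This is only an illustration: the row layout, the function name, and above all the rows themselves are placeholders invented for the example, not data from the actual index.

```python
from collections import Counter

# Placeholder rows (volume, year, kind, group): invented for illustration only,
# NOT taken from the actual index of the series.
ROWS = [
    (1, 1901, "monograph", "Finnic"),
    (2, 1908, "text collection", "Mari"),
    (3, 1912, "monograph", "Samic"),
    (4, 1915, "monograph", "Finnic"),
    (5, 1923, "monograph", "non-Uralic"),
    (6, 1927, "monograph", "Finnic"),
]

def monographs_per_decade(rows):
    """Count linguistic monographs on Uralic languages per (decade, group)."""
    counts = Counter()
    for _volume, year, kind, group in rows:
        # Text collections and non-Uralic volumes are left out, as in the text.
        if kind != "monograph" or group == "non-Uralic":
            continue
        decade = year // 10 * 10
        counts[(decade, group)] += 1
    return counts

for (decade, group), n in sorted(monographs_per_decade(ROWS).items()):
    print(decade, group, n)
```

Grouping by decade rather than by single year is a choice made here to keep the counts from being too sparse; any other bin width would work the same way.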

[Figure “famperyear2”: linguistic monographs per language group over time]
Three monographs have a focus that covers several language groups, yet not many enough, or not with a comparative enough approach, for me to have counted them as “general Uralic” works: vol. 170, Raija Bartens: Mordvan, tseremissin ja votjakin konjugaation infiniittisten muotojen syntaksi; vol. 203, Ulla-Maija Kulonen: The Passive in Ob-Ugrian; and vol. 262, Beáta Wagner-Nagy: On the Typology of Negation in Ob-Ugric and Samoyedic Languages.

We can see that the main focus of research released in the series has remained thruout the years on the Finnic and Samic languages. Research on Samoyedic and Mordvinic has been on the rise since the 70s. Hungarian meanwhile seems to remain in perpetual limbo. To an extent this is understandable: both Finnic and Samic comprise 7–10 languages (depending on how one counts exactly), and Samoyedic half a dozen as well, while Hungarian remains, despite having more speakers than all the other Uralic languages put together, still only a single language from the comparative viewpoint. It’s however also noticeable how the Ob-Ugric languages seem to lag behind both their western and their eastern relatives. From this angle it might even seem that the Finno-Ugrian Society is, despite the name, not releasing that much work on the Ugric languages at all these days. Of course, already on the language materials side of this series, the Ob-Ugric languages are decently represented, by Matti Liimola’s seven-volume Wogulische Volksdichtung series, and eight volumes total of text collections by Heikki Paasonen and K. F. Karjalainen from Southern Khanty.

Topics covered over the years

Defining “the” topic (subdiscipline) of a monograph is not necessarily obvious. I’ve however made an attempt, this time also including the works centering on non-Uralic languages:

[Figure “topicperyear2”: monograph topics over time]
At least some broad outlines of history are evident here.

  • The emphasis on various types of historical-comparative research is clearly visible.
  • Historical phonology in particular still remains the strongest-represented subfield of linguistics in the series. This is however entirely due to a strong focus in the first half of the 20th century… after which interest in it seems to just die out completely. (And no, it is not due to having run out of topics to cover; plenty of Uralic languages from Skolt Sami to Selkup have never received a detailed study of their historical phonology, and that of many others’ has not been revisited in light of modern research in decades. The investigation of numerous issues of Proto-Uralic reconstruction could easily stretch to monograph size as well.)
  • Around the same time historical phonology takes a nose-dive, there is a peak of interest in historical morphology, although by now this is seemingly trailing off too.
  • Synchronic syntax jumps into sudden prominence around the 60s as well. Historical syntax meanwhile has remained a black sheep topic of sorts, up to the present day (the one entry in this category is also the series’ most recent volume altogether).
  • Semantics has usually been covered in conjunction with syntax, not entirely on its own. Possibly a half-point for semantics could be granted for some of the etymological works as well.
  • A couple of perhaps newly rising research directions are comprehensive language documentation and “synchronic dialectology” (an oxymoron of a concept, if you ask me).
  • Nobody cares about phonetics. Not enough to dedicate an entire monograph to the topic, at least.
  • Ethnographic research seems to have been much less affected by “fashion” — this component of the series has trudged on at a low, slowing pace. Again, I don’t really know though what exactly leads to works in this field being published in this series and not in Kansatieteellisiä julkaisuja. (Editorial work overflow, maybe?)

Language use over the years

Here I cover the entire release series, including the presentation languages of the text collections and Festschriften.

[Figure “langperyear”: publication languages over time]
This picture is relatively simple: German maintains a dominant share of about two thirds of the published material, starts going out of fashion at about 1965, and then finally takes a sharp nosedive in the 90s (its last strong-going decades are largely supported by later entries of text collection series). Finnish creeps up in prominence slowly, reaching a peak in the 90s — only to then cede dominance to English around 2002.

All other languages are essentially curiosities. The French volumes focus mostly on the aforementioned wildcard topic of Central Asian epigraphy. The one Russian entry is an ethnographic monograph. In Northern Sami there’s Pekka Sammallahti’s Festschrift, which is not only edited in the language but also has most of its articles written in it; and in Swedish, an article collection of Otto Donner’s for his 100th anniversary. Nothing in Estonian so far, although I predict that will probably happen sooner or later. More notable is the total absence of Hungarian, though of course this is also not that big of a surprise from a Helsinki-based scientific society.


More interesting results might be available on this topic by extending the analysis to similar series by other publishers. Many are probably individually still too small for meaningful statistical analysis (take for example Budapesti Finnugor Füzetek, with 22 releases spanning 16 years), but in the aggregate they could reveal further patterns. How’s the transition from German to English as the main language of publication going elsewhere? Does a historical phonology dropoff point exist in general? Are there languages or language groups whose research has been at some point in or out of fashion across Uralic studies in general? Etc.

Posted in Uncategorized

‘Swan’ in Uralic

A word group among the Uralic etymological comparative material with remarkably messy sound correspondences is that for ‘swan’. The following candidates for inclusion are usually identified (all with the same meaning):

  • Samic *ńukčë (> Southern njoktje, Northern njukča, Kildin нюххч, etc.)
  • Finnic *jouccën (> Estonian jõudsin, Finnish joutsen, Veps ďoutšin, etc.)
  • Mordvinic *lokśəj (> Erzya локсей, Moksha локсти)
  • Mari *jükćə (> Hill йӱкшӹ, Meadow йӱксӧ)
  • Permic *juśk- (> Udmurt, Komi юсь; Komi stem юськ-)
  • 18th century Mansi joschwoi

Two different reconstructions are found in the usual general-purpose sources: *joŋkće (UEW) and *ńokśi (HPUL). Neither of these can be called satisfactory, I think. In the following I’ll go over some issues that can probably be resolved, or at least worked around to some extent:


UEW’s reconstruction foremost suffers from the problem that CCC clusters are generally poorly supported for Proto-Uralic. Moreover, this particular case has only been supposed in order to explain the correspondence between an *u-diphthong in Finnic vs. syllable-final *k elsewhere (regularly metathesized in Permic). The cluster would thus have been then simplified to *ŋć in pre-Finnic, *kć elsewhere. This is a very ad hoc way of reconciling irregular reflexes — it would make just about as much sense e.g. to claim an onset cluster *nj or *lj, to account for the irregular *ń- in Samic or *l- in Mordvinic.

Earlier I had suspected that simply *joŋćə could be made to work, but getting from a nasal to *k elsewhere appears too troublesome. It is true that Samic clearly has similar denasalization in *joŋsə >> *jōksë ‘bow’ (possibly indeed mediated by epenthesis to *ŋks), but Mari *jåŋež and Mordvinic *joŋs do not support supposing this. Also, while there is some curious resemblance here with Proto-Indo-European *ǵʰans- ‘goose’, it’s probably too little to be worth pursuing, in the end. The semantic difference aside, *ǵ⁽ʰ⁾ ~ *j would most likely indicate a loanword from IE to Uralic, while *ŋć ~ *ns looks like it could only work in some kind of an Indo-Uralic framework.

I now favor a different explanation: reconstructing the cluster *xć.

Distinguishing Proto-Uralic *x and *k is challenging. They have identical direct reflexation in all branches other than Finnic (where between vowels *-x- > ∅, but *-k- > *-k-) and Mordvinic (where between back vowels *-x- > *-j-, but *-k- > *-v-). This being the case, I consider it likely that some portion of the unusually frequent *-Ck- and *-kC- consonant clusters in Proto-Uralic [1] might be actually so far unidentified instances of *-Cx- or *-xC-. This is all the more likely these days, now that the old reconstruction by Janhunen of syllable-final *x for words showing a Finnish long vowel is no longer current (so e.g. PF *meeli ‘mind’ < *mälə and not **mäxlə; PF *koole- ‘to die’ < *kalə- and not **kaxlə-).

We also happen to already know that in Finnic, the regular outcome of syllable-final *x is vocalization to *u ~ *ü, as is shown by the two Finnic verb roots *souta- ‘to row’, *nouta- ‘to fetch’, which derive from PU *suxə- ‘to row’, *ńoxə- ‘to pursue’ (with the common verbal suffix *-ta- appended for no clear reason). [2] So the expected reflex of *-xć- in Finnic will be indeed *-uc-.

Nothing on the other hand seems to majorly contradict the idea that elsewhere in Uralic, syllable-final *x has again generally merged with *k. *souta- does have a Samic formal equivalent: *suvtē- ‘to transport by boat’, but perhaps this is an old Finnic loanword (vs. the bare root in *sukë- ‘to row’ being the native reflex)?

This reconstruction also allows better odds that the Mansi word indeed belongs here. László Honti has commented [3] that the loss of the entire cluster *ŋk seems unlikely. The development *x > ∅ would surely be less troubling though, especially syllable-finally in a compound word. (The second element is clearly *wuj ‘animal’.) He also notes that joschwoi (*jōšwoj?) seems cognate with modern Northern Mansi jūswoj ‘eagle sp.’, and that the earlier meaning of the word might not even have been ‘swan’ as much as ‘large bird’. But if these meanings are connected, I could just as well imagine a development route ‘swan’ > ‘large bird’ > ‘eagle’.

Some other examples of *xC clusters can perhaps also be located. Further research will be necessary, but one case that looks promising to me is Samic *piktē- ‘to heat’, which seems analyzable as a derivative < *pix-tä-, from the same root *pixə- ‘to cook’ (traditionally reconstructed as *peje- (UEW) or *pexi- (HPUL)) as Mordvinic *pijə-, Hungarian , Samoyedic *pi-. [4] Another possible case is ‘rope’: Mordvinic *piks, Khanty *püüɣəL, older Hungarian fiu < *pexsə. In both cases the vocalism in Samic (*i > *i and not > *ë) and Mordvinic (*i, *e > *i and not > *e) appears to suggest that *k < earlier *x. The same argument applies also to PS *u in ‘swan’.


Turning to vocalism, although earlier PU *o has been generally presumed in the 1st syllable, to me this seems based more on belief in the conservativeness of Finnic than on actual comparative considerations. In particular the correspondence between Samic *u, Finnic *o and Mari *ü appears fairly irregular. A lowering *u > *o before *x has sometimes been proposed for Finnic, mainly in ‘to row’ (cf. above); and it could be assumed here as well. This approach however does not work for Mari. In UEW, *ü is instead attributed to irregular “fronting influence” of the word-initial *j… but since the regular Mari reflex of *u is the “reduced-series” *ŭ (> Meadow Mari /u/, Hill Mari /ɤ/), then if starting from something like *juxćə, we’d surely expect the result of fronting to be its front counterpart *ü̆ (> Meadow Mari /y/, Hill Mari /ə/).

A more promising solution seems to be provided by some closely related soundlaws recently proposed by Mikhail Zhivlov: PU *ë-ə before a velar consonant > pre-Samic/Mordvinic *u, Finnic *o, which would turn up in some other widespread Uralic words as well, perhaps most prominently ‘to drink’ (S. *jukë-, F. *joo-, versus Mari *jüä-, Hungarian ív-). [5] Mari *ü and Permic *u will then turn out to be simply the usual reflexes of *ë.

I thus arrive at the improved reconstruction *jëxćə.
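The Finnic side of this derivation can be sketched as a chain of ordered rewrite rules. The following is only an illustration: the function name and the rule list are assumptions made for the sketch, and it models just three of the changes discussed here (*ë > *o before a velar, syllable-final *x > *u, and the apparent *ć > *cc). The second-syllable vowel and the final *-n of *jouccën are not modelled at all.

```python
import re
import unicodedata

VOWELS = "aeiouäöüëə"

# Ordered (pattern, replacement, label) triples; an illustrative rule set only.
FINNIC_RULES = [
    (r"ë(?=[xkŋ])", "o", "*ë > *o before a velar"),
    (rf"x(?=[^{VOWELS}])", "u", "*x > *u before a consonant"),
    (r"ć", "cc", "*ć > *cc"),
]

def derive(form, rules=FINNIC_RULES):
    """Apply each rule once, in order, and return every intermediate stage."""
    # Normalize so that precomposed and decomposed diacritics compare equal.
    stages = [unicodedata.normalize("NFC", form)]
    for pattern, replacement, _label in rules:
        stages.append(re.sub(pattern, replacement, stages[-1]))
    return stages

print(" > ".join(derive("jëxćə")))
```

Keeping every intermediate stage makes it easy to see which rule is responsible for which segment, and to swap in a competing rule (say, *x > *k for the other branches) without touching the rest.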

All sorts of other issues still remain:

  • For the initial consonant, *j of course seems like the safest bet. Unlike Sammallahti, I see no reason to privilege Samic *ń-. There are no parallels for its lenition to *j- in the other Uralic languages. If anything, I wonder if the nasal could have an onomatopoetic function here, though swan calls do not strike me as especially nasal at all. Mordvinic *l- is a complete mystery too. I would suggest contamination from *lunta ‘goose’, but this word has not survived in the branch.
  • Mordvinic *o could indeed reflect earlier *u, but my tangentially covered etymology/reconstruction above of *pijə- ‘to cook’ < PU *pixə- assumes *x to block the lowering of close vowels, same as it does in Samic. But could *x > *k syllable-finally have already occurred in Mo. before this?
  • *oo in Mansi, if it belongs here, probably cannot be derived from *ë (and is also at best a rare reflex of *o as well).
  • Similarly, if sch is /š/, this cannot be derived from *ć. It’s attested as a “sporadic” reflex of *ś though.
  • The geminate affricate in Finnic is curious. Other cases of apparent *ć > *cc exist as well, e.g. PU *wäńćə > PF *väicci > Fi. veitsi ‘knife’; PF *icek > Fi. itse ‘self’ (but Es. ise). This is also in contrast to secondary affricates due to the Proto-Finnic sound change *ti > *ci coming out consistently short (and being then further reduced to plain s.)
  • The Mari dialects run the full gamut with their sibilants, showing varyingly š, s, ś, ć, and it’s not clear to me what is going on. The contrast between PU *ś and *ć is at least as difficult to reconstruct as that between *x and *k, though less due to coinciding reflexes, and more due to most subgroups being highly inconsistent on if they show affricates or fricatives (the only relatively clean cases are Samic and Samoyedic). Subgroup-internal snafus like this do not help.
  • I suspect *kć > *kś to be regular for Mordvinic, but who knows. There are very few examples of *Cć to begin with, and the most reliable-looking case (*ńëkćəm ‘gills’) has no Mordvinic reflex.

[1] For some discussion, see e.g. Mikko Korhonen (1986), On the reconstruction of Proto-Uralic and Proto-Finno-Ugrian consonant clusters, Suomalais-Ugrilaisen Seuran Aikakauskirja 80. Korhonen characterizes the issue as one of a lack of consonant clusters of the shape dental + dental (and indeed, *lt and *rt seem to have been absent from the PU lexicon) — but it is also the case that clusters with a velar member, e.g. *ŋk, *lk, *ks, appear to be much more numerous and better-established than clusters with a labial member, e.g. *mp, *lp, *ps.
[2] This result is, paradoxically enough, also due to Janhunen. He accomplishes this by assuming an early vocalization in words of the ‘mind’, ‘to die’ type, and a late vocalization in ‘to row’, ‘to fetch’. See Juha Janhunen (2007), The primary laryngeal in Uralic and beyond, Suomalais-Ugrilaisen Seuran Toimituksia 253: 203–227.
[3] László Honti (1985), Etimológiai adalékok, Nyelvtudományi Közlemények 87/2: 444.
[4] Komi pu- ‘to cook’ will have to be discarded from this etymology however, it seems. Not that this is regular either way: while *e normally yields close back /u/ in Udmurt and Jazva Komi, almost all clear cases show /o/ in mainline (Zyrian / Permyak) dialects of Komi, a correspondence that is normally also attributed to a different Proto-Permic vowel from those showing /u/-across-the-line. I would also separate Mansi *pääj- ‘to cook’, though this word will not have to end up as an etymological orphan: it seems instead derivable from the “heat-and-light” root *päjə, whose potential reflexes include e.g. Samic *peajvē ‘day, sun’ and *peajō- ‘to shine’, Finnic *päjwä ‘day’, Komi bi ‘fire’, Hungarian fehér ‘white’, Khanty *pääj ‘thunder’, Samoyedic *päjwä ‘heat’ (some of these more, some less probable — the semantics in particular would require review).
[5] Mikhail Zhivlov (2014), Studies in Uralic vocalism III, Journal of Language Relationship 12: 115–116. I am actually not entirely sold on this idea yet, since the changes seem to lack phonetic motivation, and are based on a very small set of data. But for now it seems at least worth keeping in mind.

Posted in Reconstruction

Linkday #3: Phylonetworks Dot Blogspot

A blog discovery today, that I however find a bit too tangentially related to add to my main histling blogroll: The Genealogical World of Phylogenetic Networks. They mostly discuss phylogenetics in general, with most examples drawn from biology — but for once, linguistics also has its share, covered by Johann-Mattis List (who’s also the author of some interesting papers on the same topic). A few example posts, and some comments of mine:

This highlights something I’ve noted for a while now: there seems to be no properly illustrative generally accepted way of presenting wave theory models of language history. Most variations are indeed pretty much ahistorical (the local Uralic studies iteration of this would be Salminen’s “ball graphs” [1]). I’m even aware of at least some related models that explicitly admit this fact and do not claim to represent language change, such as van Driem’s “fallen leaves” model. [2] I’m not sure though if I would call the theory itself ahistorical as much as often poorly presented … but this might not be the time to defend the concept in detail.

This focuses on the inexactness of current linguistic terminology regarding the relatedness of words. We largely make do with a single term ‘cognate’, but there does not quite seem to be consensus regarding if this means 1) words related only thru inheritance, or also words related thru loaning; and 2) only words with the same meaning, or also words that have changed their meaning along the way.

List’s proposed amendments don’t particularly resonate with me, though. I for one am happy to extend the word ‘cognate’ also to words related in a way that includes loaning. This is especially since I think it’s quite difficult to rule out entirely the possibility of loaning having occurred during a word’s transmission. Any linguistic innovation starts at a point, and diffuses across its speaker community effectively by loaning. Since the language/dialect distinction is somewhat arbitrary (more sociolinguistic than objective), it follows that distinguishing loaning from inheritance is somewhat arbitrary as well.

For example: in various other languages of the world we can easily call jazz a loanword from English. But shall we also call it (at least in the sense of the music genre) a loanword from American English in British or Australian English? Or a loanword from Chicago English in New York English? More relevantly, supposing that a 30th-century linguist examines this word’s descendants across the Anglic languages of their times, would they be justified calling the American and Australian words cognate — or should that label be reserved only for words deriving from pre-colonial Early Modern English? And finally: how well do we know that our alleged “cognates proper” deriving from prehistorical protolanguages do not have this kind of a background?

There’s also the fact that in a certain fundamental sense, all linguistic transmission from one generation to the next can be said to be comprised of loaning, though usually in a fashion faithful enough that we categorize it off as “inheritance”.

Similarly I do not think the lack of emphasis on semantic identity is a major flaw. Again, while clear-cut cases of words with dissimilar meaning exist, it’s a matter of taste where to decree that two words now do have “the same” meaning. Is the fact that English head and Finnish pää ‘head’ are used in several different idiomatic ways an obstacle to claiming that they’d have the same meaning? Or the fact that pää in some usages is better translated as e.g. ‘end’, while head in some usages is better translated as e.g. johtaja? Or, even if we supposed two words in two modern language varieties to be completely identical in meaning: would we have to again count them semantically non-homologous if we could show that their common proto-form had also some additional meaning that was in both languages later lost?

I do agree that having more detailed terminology would be helpful… but I’d perhaps start adding it from the other end. Since it’s loaning and semantic change that are visible, positively identifiable processes, it might be more productive to have terms denoting “cognate via loaning” and “cognate but semantically changed” specifically, rather than attempting to circumscribe the negative categories “appears cognate purely by inheritance” and “appears cognate without any semantic change”.

[1] As in e.g. Salminen, Tapani (1999): Euroopan kielet muinoin ja nykyisin. In: Fogelberg, Paul (ed.): Pohjan poluilla. Suomalaisten juuret nykytutkimuksen mukaan. Helsinki: Suomen Tiedeseura.
[2] Outlined in van Driem, George (2001): Languages of the Himalayas: An Ethnolinguistic Handbook of the Greater Himalayan Region. Leiden: Brill.

Posted in Links, Methodology

Errata for the Karelian dialectal atlas

A recent acquisition of mine has been the recently released dialectal atlas of Karelian: Диалектологический атлас карельского языка / Karjalan kielen murrekartasto, Helsinki 2007; based on data collected in the 1930s. Covering 209 traits (many of them with some half a dozen possible values, for on the order of 1000 data series altogether) and almost 200 individual varieties from the Republic of Karelia (thus including also Ludian), this will surely come in useful e.g. for assessing which Eastern Finnic innovations should be considered historically indicative of original dialect divisions and which have later diffused between varieties. Results of this sort will in turn also inform how to interpret conflicting isoglosses in other situations.

There however appear to be some annoying errors in the material. So far I’ve noticed one larger issue: several maps that trace the development of secondary long vowels seem to mix up the Olonetsian (Livvi) varieties and the central Karelian varieties of Paatene. The latter group is correctly shown in the early maps as characteristically reflecting Proto-Finnic and post-Proto-Finnic *aa and *ää as /oo/ and /ee/. However, in later maps tracing the development of long vowels contracted from e.g. *a(i)+e, *ä+e (after the loss of *d or *g), long mid reflexes are instead attributed to Olonetsian, while the Paatene dialect is claimed to have /ai/ or /äi/.

The map data per se seems correct though, and I guess it’s rather the map legends that are the problem, since in most cases this error comes in the exact same typographical form: the symbols square with black right half (◨) and circle with black upper left and black lower right quadrants (for which I can’t seem to find a Unicode glyph) appear to have been switched on the following maps:

  • Map 16 (reflexes of *käde-, weak oblique stem of *käci ‘hand’)
    cf. the examples for käsi @ KKS: Paatene sg.gen. kɛɛn, Nekkula/Riipuskala sg.iness. käiz
  • Map 18 (reflexes of *näge-, weak stem of *nähtäk ‘to see’)
    cf. the examples for nähä @ KKS: Paatene 1p.sg. ńɛɛn, Nekkula/Riipuskala 2p.sg. näit~näid
  • Map 19 (reflexes of *avaideldak, frequentative of ‘to open’)
    cf. the examples for availla @ KKS: Nekkula/Riipuskala 2p.sg. availed
  • Map 20 (reflexes of *lainoideldak, frequentative of ‘to swallow’)
    cf. the examples for lainoilla @ KKS: Nekkula/Riipuskala inf. lainoilla

In the last case, the “swapped” dialects are rather Livvi and Kuuďäŕv́ Ludian, with ˣuu rather than /oi/ attributed to the former, and ˣoi rather than /uu/ attributed to the latter. (At Paatene no special development applies to “Proto-Karelian” *oo, which is reflected as /uo/, same as everywhere else.)

Also incorrect is Map 17 (reflexes of *pagettak ‘to escape’), though here the swapped symbols are rather black triangle (▲) and circle with black left half (◐).

The rather similar Map 15 (reflexes of *lage-, weak oblique stem of *laki ‘top, ceiling’) and Map 21 (reflexes of *lukkudeldak, frequentative of ‘to lock’) appear to be however correct.
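For anyone working with a digitized copy of the legends, the corrections above boil down to a small swap table. The sketch below is only one way to write it down: the names are hypothetical, and since one of the two symbols has no Unicode glyph that I know of, it is represented by a placeholder string.

```python
# Placeholder label for the circle with black upper-left and lower-right
# quadrants; it is an assumption of this sketch, not a real glyph.
QUADRANT_CIRCLE = "circle-UL+LR"

# Map number -> pair of legend symbols that appear to have been switched.
LEGEND_SWAPS = {
    16: ("◨", QUADRANT_CIRCLE),
    17: ("▲", "◐"),
    18: ("◨", QUADRANT_CIRCLE),
    19: ("◨", QUADRANT_CIRCLE),
    20: ("◨", QUADRANT_CIRCLE),
}

def corrected_symbol(map_no, symbol):
    """Return the legend symbol as it should read once the swap is undone."""
    pair = LEGEND_SWAPS.get(map_no)
    if pair is None or symbol not in pair:
        return symbol  # maps and symbols without a known error stay as printed
    first, second = pair
    return second if symbol == first else first

print(corrected_symbol(17, "▲"))
```

Maps 15 and 21 are deliberately absent from the table, so their symbols pass through unchanged.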


A second, lesser problem appears on Map 24 (reflexes of *habukka ‘hawk’), where a symbol ▲ is marked for the apparently heavily divergent Vaśiľiskoi dialect of Tver Karelian — but no such symbol appears on the legend. I have no idea if this is a typographical error or an omission from the legend.

Posted in Commentary

Gradation of *st in Finnic (and related complications)

The development of consonant gradation in Finnic (and why not, also elsewhere in Uralic) is one of those topics that really needs a new monograph-scale treatment one of these days. Not just for the sake of collecting the accumulated knowledge in a single source, either. Modern understanding of linguistic theory and methodology would probably allow not only improved description of gradation as it works in the modern languages; it should also help in better tackling the historical puzzles involved.

Some overall observations are simple enough to make. Perhaps the main historical trend in Finnic has been the gradual morphologization of gradation, departing from its original phonetic roots, and being generalized in some environments, levelled in others.

One good example is the gradation of /t/ in consonant clusters: we can reconstruct *nd, *ld, *rd as the original weak grades of the sonorant-initial clusters *nt, *lt, *rt (exactly parallel to *d as the weak grade of intervocalic *t); however, after further phonetic development, most Finnic varieties now rather have /nn/, /ll/, /rr/. Given the pattern consonant+t : geminate consonant here, it is not too surprising that large swaths of Finnic varieties have also introduced the parallel alternation /st/ : /ss/, even though the weak grade clearly cannot originate from earlier **sd. [1] The innovation covers Karelian proper (but not Livvi/Olonetsian); Ingrian; and various eastern dialects of Estonian. Finnish dialects, though, have no trace of this.

— A small example for readers who are not especially familiar with how Finnic root-medial consonant gradation works in practice: the verb roots *anta- ‘to give’, *osta- ‘to buy’ yield in Karelian the 1st person singular forms annan ‘I give’, oššan ‘I buy’, with /nn/ and /šš/ as the weak-grade forms of /nt/, /št/. Finnish by contrast has similarly weak-grade annan, but a “strong-grade” form (rather: unaffected by gradation) ostan. In both languages, the underlying cluster also remains e.g. in the infinitive forms: Krl. antoa, oštoa; Fi. antaa, ostaa.

Which morphological forms show gradation can be predicted from the Proto-Finnic syllable structure. 1PS *andan, *ostan have a closed 2nd syllable, triggering lenition of *t, while the infinitives *antadak, *ostadak have an open 2nd syllable, and the original voiceless stop remains. However, as we can see, in which exact phonetic environments /t/ is affected by gradation varies by language. In this case gradation appears more heavily morphophonologized in Karelian than in Finnish; both languages however retain the original phonetic conditioning factors of gradation (a closed 2nd syllable in annan, vs. an open one in antoa/antaa), and hence these cases of gradation could be still called morphophonological, not yet purely morphological.
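The syllable-structure condition can be made explicit with a short sketch. It is a deliberately naive illustration, not a full model of gradation: the function names and the regular expression are assumptions of the example, only the sonorant-initial clusters *nt, *lt, *rt are handled, and forms are written without the asterisk.

```python
import re

VOWELS = "aeiouäöü"

# First vowel(s), medial consonant(s), second vowel(s), then whatever follows.
SHAPE = re.compile(
    rf"^[^{VOWELS}]*[{VOWELS}]+[^{VOWELS}]+[{VOWELS}]+"
    rf"(?P<after>[^{VOWELS}]*)(?P<rest>.*)$"
)

def second_syllable_closed(form):
    """True if the 2nd syllable of a Proto-Finnic form ends in a consonant."""
    m = SHAPE.match(form)
    if m is None:
        return False
    after, rest = m["after"], m["rest"]
    # Before a further vowel, the last consonant is the onset of the next
    # syllable, so two consonants are needed to close the second one.
    needed = 2 if rest else 1
    return len(after) >= needed

def proto_finnic_grade(form):
    """Lenite t to d after n, l, r when the 2nd syllable is closed."""
    if not second_syllable_closed(form):
        return form
    return re.sub(r"(?<=[nlr])t", "d", form, count=1)

for form in ["antan", "antadak", "ostan", "ostadak"]:
    print(form, "->", proto_finnic_grade(form))
```

Leaving *st untouched here corresponds to the Proto-Finnic and Finnish state of affairs; the Karelian type would need one more rule mapping /st/ to /ss/ under the same condition.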


Back on track. Interestingly, in the middle of the above-mentioned innovative area where original *st has been subjected to gradation, also another development is found: in Votic, *st is reflected as /ss/ in the strong grade, and single /s/ in the weak grade (thus e.g.: *ostada > õssaa, *ostan > õsaa). Traditionally this has been attributed to an early separate development: a general sound change *st > *ss would have taken place already before the analogical extension of *Ct-gradation. This would have been then followed instead by the analogical extension of the gradation pattern geminate voiceless stop : singleton voiceless stop also to the voiceless fricative /s/, now abundantly found as a geminate as well.

Looking at Votic in isolation, this seems like an entirely possible account. However, in an areal context this is less clear. Votic is the most persistently innovative Finnic language with respect to consonant gradation: it is applied productively even to recent loanword consonants and clusters from Russian (resulting in such unique alternations as /pk/ : /bg/). In this light, it seems like an unusual coincidence that, at the epicenter of the eventual /st/-gradating area, [2] Votic would have already early on been established as an island that opted for a different solution altogether. An alternate hypothesis would be to suppose that Votic once used to have the more widespread pattern *st : *ss as well; and that the attested pattern /ss/ : /s/ represents simply a further development of this. This is fairly easy to arrange. We can continue to assume *st > /ss/ as a regular sound law, and would have to merely combine it with also assuming *ss > /s/, as a mini-chainshift of sorts.

Not everyone will like this kind of complication just for the sake of neater geographical generalizations, I’m sure. But there’s an interesting piece of evidence that appears to be in favor of my new analysis. The inessive case ending, reconstructible as Proto-Finnic *-ssA, [3] is in fact found in Votic as -za ~ -zä. This also suggests exactly the development *ss > *s, followed by voicing between two unstressed syllables (— also known as “suffixal gradation”, though the process differs from regular gradation in a number of ways).
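The ordering involved in such a chain shift is easy to get wrong, so here is a minimal sketch of it. Everything in it is illustrative: the helper name is invented, only the clusters are modelled (no vowel changes, no voicing of the inessive), and the weak-grade input ossan is the hypothesized earlier Votic form, not an attested one.

```python
def apply_in_order(form, rules):
    """Apply plain string replacements one after another."""
    for old, new in rules:
        form = form.replace(old, new)
    return form

# Chain shift: old *ss is shortened first, and only then does *st become ss.
CHAIN_SHIFT = [("ss", "s"), ("st", "ss")]
# The opposite order, shown for contrast: the two grades would fall together.
MERGER = [("st", "ss"), ("ss", "s")]

FORMS = ["ostada", "ossan", "ssa"]  # strong grade, assumed weak grade, inessive

for form in FORMS:
    print(form, apply_in_order(form, CHAIN_SHIFT), apply_in_order(form, MERGER))
```

With the chain-shift order the strong and weak grades stay distinct as ss : s, which is the attested Votic pattern; with the opposite order both end up with single s.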

This seems to be all the evidence we can hope to get together on the development of *ss in Votic, though. As far as I know, there are no root-medial cases of *ss that could be reconstructed for Proto-Finnic, and not even any especially old loanwords. The most widespread might be the obviously recent Fi/Krl. pyssy ~ Es. püss ~ Vo. püssü ‘gun’ (from Low German).

It’s again possible to stitch together a different, Votic-internal analogical explanation for the inessive. Kettunen in Vatjan kielen äännehistoria did that already a century ago, starting from how we can find in Votic also a “strong-grade” [4] illative ending -sEE < *-sEn somewhat more widely than in the average Finnic variety. But generally I think a phonological explanation that takes care of multiple problems should be considered preferable to unrelated analogical accounts, as long as no additional problems are introduced.

Moreover, if gradation of *st entered Votic as a relatively late analogy, we should perhaps expect it to fail to take root in environments where no analogical motivation for its extension was available. Such a case can indeed be found: the adverbial ending *-stik, found in Votic as simply -ssi(g). Contrary to Kettunen, early loss of final *-k cannot be blamed, since Eastern Votic is one of the few Finnic dialect groups where it in fact remains in some individual varieties: e.g. alassig ‘naked’ (and not ˣalazig). [5] Eastern Votic fails to apply gradation also to infinitives of s-stem verbs, e.g. pessäg ‘to wash’. Infinitives of resonant-stem verbs (e.g. tulla(g) ‘to come’, mennä(g) ‘to go’) would have provided a weak source of analogy though, explaining Western Votic pesä etc.

Even this last-mentioned analogy actually makes more sense if we assume it to have been earlier in relative chronology: from strong-grade *pestä(k) to weak-grade *pessä(k), rather than from strong-grade pessä(g) to weak-grade pesä. In the latter case I would definitely expect the pattern single consonant stem : geminate consonant infinitive to be more salient than the rather abstract grade difference. [6]

On the other hand, a “relatively late” date for the new look of *st-gradation could still be fairly early in absolute chronology. Evidence from Krevinian (an enclave dialect of Votic once spoken in Latvia, separated since the 15th century) suggests that the gradation pattern /ss/ : /s/ existed already by the late Middle Ages, and Kettunen even calls this “one of the oldest changes in Votic” (presumably meaning: one of the oldest uniquely Votic changes). But given that Proto-Finnic dates to around 0 CE, this still leaves plenty of time for the development of first the usual form of, and later a uniquely Votic type of *st-gradation.

There are a couple of what might look like chronological issues with this scenario. E.g. the introduction of /rt/ : /rr/ is, IIRC, perhaps as late as the 17th century in parts of western Finland. Though I wonder if this should be taken as an indication that, contrary to traditional Finnocentric default assumptions of all influence within Northern Finnic varieties having flowed from the west to the east, this gradation pattern is in part Karelian influence in western Finnish. (Perhaps even /lt/ : /ll/ and /nt/ : /nn/?!)

I also see it now and then assumed that Votic is the sole autochthonous language of Ingria, and that all varieties of Ingrian represent later intrusions from north of the Gulf of Finland (much like we know to be the case with Ingrian Finnish). However, at least the case of the Kukkuzi dialect is problematic. Traditionally analyzed as heavily Ingrianized Votic, it is also at least equally well analyzable as a Voticized Northern Finnic variety. And a bit further west, the same problem arises with northeastern coastal Estonian as well: the dialect shows clear effects of Finnish contact, but also indications of original, more deep-reaching Northern Finnic affinity, e.g. the complete absence of õ, in whose place we find a perfectly etymological distribution of e versus o. (I wonder if anyone’s ever tried hunting for isoglosses connecting Kukkuzi and NECEs in particular.)

If Votic has continuously had Northern Finnic neighbors already since early on (early contacts are traditionally indeed assumed on lexical and morphological grounds, but this generally seems to have been considered to be a separate issue from contact with Ingrian in particular), it is again all the easier to assume that at one time, it too had the general “Central-Eastern Finnic” gradation pattern *st : *ss.

These days I even find myself wondering if all the other main dialects of Ingrian proper can really be derived from Karelia, even if we also count the Karelian Isthmus. Their differences strike me as sharper and starker than those between the individual dialects of Karelian. It seems clearly implausible to assume a single late migration, followed by rapid diversification within a minuscule geographic area. Assuming 3-4 wholly separate backmigrations would also be contrived, at least if we cannot find historical correspondences for them (like how the introduction of Ingrian Finnish has been connected to Ingria’s brief period as a part of the Kingdom of Sweden). Perhaps coastal Ingria in general was simply never Votic-speaking, and is in fact rather the original Eastern Finnic homeland…? A topic for another post, another time though.

[1] The even more widespread gradation pattern /ht/ : *hd must have a different explanation, though. There is, as far as I can tell, no evidence for a geminate **hh. I presume this pattern instead emerged very early, around the time middle Proto-Finnic *š had finished its trek backwards and settled as /h/, i.e. no longer an obstruent; and *d had in most Finnic varieties lenited to /ð/, but had not yet been lost. Around this time [ɦð] would have nicely paralleled clusters such as [lð] and also the likes of [ɦl], [rɦ]; especially if we assume that /ð/ did not phonologically hold the status of a dental fricative, but that of a dental approximant.
[2] As measured by the general patterning of Finnic isoglosses, not by raw geography. The latter method would leave Votic off at the southeastern fringe, and would probably put the center-of-mass of the innovation somewhere in southeastern Finland…
[3] South Estonian and Southern Ostrobothnian Finnish indicate the more archaic variant *-snA, but I would assume that this was established as a free or stylistic variant by Proto-Finnic, instead of being retained in an allomorphic distribution of some sort. South Estonian in particular appears to have been eager to extend consonant cluster assimilations from unstressed positions also to post-tonic positions (e.g. *koktu > kõtt ‘stomach’, *maksa > mass ‘liver’, *sakna > sann ‘sauna’), so it would be quite odd if here a “strong-grade” allomorph has instead been generalized from positions with secondary stress.
[4] I am not fully convinced that the Proto-Finnic alternation *s : *h, found in some suffixes and in *s-stem nominals, has anything to do with consonant gradation, given its total absence root-medially; not even in words like *vasikka (or *vasëikka?) ‘calf’, where there should not have been any possibility for the analogical reintroduction of /s/ from strong-grade forms. Whether the case of Fi. lähellä ~ läsnä ~ lästä ‘near’ would constitute an example does not seem clear. For one, as cognates have only been found from Mari and Samoyedic, we do not know for certain if the root is to be reconstructed as *läšə- or *läsə-; and läsnä, lästä could potentially be explained as based on the inessive and elative (pre-PF *lähe-snä, *lähe-stä?) rather than continuing the archaic locative and ablative (pre-PF *läS-nä, *läS-tä). For two, these being adverbs, *s > *h in prosodically unstressed positions seems possible (as is probably the case for the 3rd person pronouns *hän, *hek).
[5] Similarly also e.g. Karelian alašti, not ˣalašši.
[6] A possibility that however remains is that the Eastern Votic forms might be rather analogical reintroductions, again on the basis of the geminate infinitive forms of n, l, r-stems. Still this seems to be “the wrong way around”, given that these forms retain final –g as a transparent trigger for gradation, while the weak-grade Western Votic forms do not.

Posted in Reconstruction

Finnic o-umlaut, continued

I’ve often seen the Finnic languages considered to demonstrate that vowel harmony acts as a counterforce to the common tendency for second-syllable (“stem”) vowels to trigger various conditional developments (umlauts) of first-syllable (“root”) vowels. At least within the larger Uralic comparative context, this indeed appears to be the case. There is even the illustrative case of Livonian, a Finnic language which has both lost vowel harmony and innovated a process of *i-umlaut (appearing e.g. in the nominative singular forms of nouns: *käci > ke’ž ‘hand’, *tammi > täm ‘oak’, etc.)

This however does not need to imply that vowel harmony languages are somehow categorically immune to umlaut developments. I’ve already briefly examined a possible shift *ë-o > o-o for Votic. It also seems another somewhat similar case can be found as well, this time though appearing wider across Finnic. More interestingly, this involves umlaut “against the grain” of Finnic vowel harmony — in backness.


To start from the beginning, today’s observation traces its roots to Janne Saarikivi’s paper “ystävästä, uskosta ja vokaaleista”, published 2010 in the eminent Finnish etymologist Kaisa Häkkinen’s Festschrift Sanoista kirjakieliin (SUST 259). This treats the Finnic word group for ‘friend’, whose representatives include Finnish ystävä, Estonian ustav, Livonian ustõb (and which has also been borrowed to Samic; e.g. Northern Sami ustit). The words clearly resemble fossilized participles, but various competing ideas have been suggested on what the original root would be, exactly. Saarikivi argues convincingly that the best option appears to be connecting the words to the same root as e.g. Fi. usko(-) ‘belief, to believe’: i.e. *uskV-(t)ta- > *usta- ‘to be true/reliable, to consider a friend’ > *usta-ba ‘(one who is considered a) friend’.

His explanation for the phonetic development is, however, slightly awkward. Drawing in some known parallels from Permic and Khanty, and bringing in some new Samic evidence, he suggests that this word group could be traced back to a Proto-Uralic (transitive) verb root *iskə- ‘to believe in’. From this a reflexive (intransitive) derivative *iskə-w > *isk-o- would have been created, which would have been backed to *usko- in Proto-Finnic; followed by re-fronting in Finnish, for “similar unclear reasons” (a very hazardous form of argument IMO) as appears in some other words, e.g. Fi. muhku ~ myhky ‘clump’.

Petri Kallio has instead proposed in passing what seems to me like a clearer explanation; again in his paper “Jälkitavujen diftongit kantasuomessa” that I seem to have brought up a couple of times by now. [1] According to him (and I agree), a more expected initial development in Finnic would be *iskə-w- > *iskü- > *üskü-. The labiality assimilation here is a known sound development — and incidentally presents another minor example of umlaut in Finnic. Following this he suggests resuffixation: *üsk-o(-) > usko(-). This latter step involves what seems like a previously unproposed sound development: *ü > u by the influence of second-syllable *o.

Phonologically, this sounds reasonable: it’s generally accepted that *o in unstressed syllables remained outside vowel harmony in Proto-Finnic, and o/ö-harmony as found in modern Finnish/Ingrian/Karelian (and partly Veps) only emerged later. In other words, PF second-syllable *o was indeed specified as [+back], and could pass this feature also to a first-syllable vowel.

(Also, though Kallio does not say as much, this two-tier scenario seems to even explain Fi. ystävä. Back when *UstA- was still around as a separate verb ‘to consider reliable/a friend’, we could consider *üskü-(t)tä- > *üstä- the regular development, attested in Finnish; competing with a variant *usta-, attested in southern Finnic and Samic, which would have gained its /u/ by analogy with *usko(-).)


Getting to the point though, this idea has drawn my attention to what looks like a phonotactic gap in Proto-Finnic. Although we can reconstruct PF *-o following most first-syllable vowels (e.g. *ilo ‘joy’, *veto ‘pulling’, *käko ‘cuckoo’, *pato ‘dam’, *pëlto ‘field’, *kolo ‘hole’, *puno- ‘to weave’ [2]), there do not seem to be any recognized cases of the vowel structure *ü-o. Even in modern Finnish, cases of y-ö are fairly rare. This seems like grounds to formulate a hypothesis. I suggest that Kallio’s proposed “o-umlaut” development is not merely an isolated sporadic example, but a full-fledged sound law: Proto-Finnic *ü-o has regularly yielded later u-o.

Investigating this possibility is going to be a bit difficult, though. PF *-o continues to have no firmly established regular origin (other than the dissimilation *ai > *oi in unstressed syllables after *a, *e, *i, which is not relevant here), and is mainly concentrated in derivatives and loanwords. Some particular morphological groups’ cognates in Mordvinic suggest the development *Aw > *o, but in others there seems to be no indication of this.

Regardless, here are a couple of doublets & such I’ve identified in Finnish that might be indicative of this same “o-umlaut” as usko(-):

  1. ulotta- ~ ylettä- ‘to reach smth’. The former’s been considered to derive from the postposition root ulko- ‘out’, the latter from ylä- ‘up’. While these are fairly close-by concepts, and while there are even particular expressions that appear to support this derivation (e.g. ulottaa kätensä ‘to extend / reach out a hand’, ylettää kattoon ‘to reach (up ’til) the ceiling’), these could regardless be due to semantic contamination. This seems to be confirmed by Veps /uluta-/, Livonian ulātõ, which both suggest roughly PF *ulotta-, not *ulgotta-. Alas, the 2nd-syllable vowels in these are aberrant, which suggests that the history here has probably been somewhat more complex.
    SSA proposes an alternate analysis: deriving ulotta- and ulko- both from a root *ula-, allegedly also present in ulappa ‘open sea’ and ullakko ‘attic’, but this seems to run contrary to all regular patterns of Finnic word derivation. [3]
  2. luppo ‘beard lichen’. No Finnic variants pointing to *ü are known, but the word’s Samic cognates interestingly enough uniformly indicate an original front vowel. Although there are various known cases of *ë/*o vacillation in Samic, both PS *lëppō and the Finnish form could also simply derive from earlier *lüppo.
    A possible problem though is that the Finnic word is only found in Finnish and northern Karelian, and perhaps is rather to be explained as a Samic loan. It would be possible to speculate with late retention of *ü in early Samic, or an umlaut development of *ë-ō in the loaning Samic variety, but I’ve nothing solid to go on with that line of thought.
  3. pursto ~ pyrstö ‘tail’. This is one of the examples of “frontness alternation” that Saarikivi mentions. Supposing *pürsto as a starting point would allow some Finnic dialects to evolve pursto, others pyrstö. On the other hand, it might be a problem that the words likely derive from Germanic *burstō- ‘bristle’ (which may seem semantically distant, but Karelian and some Finnish dialects retain an intermediate sense ‘dorsal fin’). [4] SSA suggests that pyrstö, the variant with a narrower distribution, could be due to contamination with pyrise- ‘to shake’ (intr.), pyristä- ‘to shake’ (tr.), which seems equally possible.
  4. ruho ‘body’ ~ ryhä ‘hump’. A comparison that would not have struck me as obvious, but SSA analyzes the latter as a “variant” of the former. Normally we’d expect an A-stem to be more original than an o-stem though. This is also suggested by the loan etymology of the words from Germanic *xruza- ‘corpse, pile’. Or, given the y-vocalism, perhaps rather from Old Norse *hryRa-? Thus *rühä → *rüh-o > ruho seems like a possibility.
  5. runno- ‘to cram, mangle’ ~ ryntää- ‘to rush (into)’. The interference of various other words is possible (e.g. ruhjo- ‘to injure, mangle’, säntää- ‘to rush’) but what makes me suspect indeed common origin is the irregular variation nn ~ nt, appearing in both groups here. In standard Finnish the two verbs themselves have ended up in different “grades” (not quite in accordance with regular consonant gradation), but further derivatives include e.g. runtu ‘dent’, rynniä ‘to rush suddenly’ (punctual).
  6. rusto ‘cartilage’ ~ rysty ‘knuckles’. SSA connects the two as being of “similar descriptive origin”, but they could be connected as straightforward parallel derivatives (*rüst-o, *rüst-ü). Less clear is if rusikka ‘fist’, also mentioned by SSA, is a part of the same cluster. This does not seem necessary, but if it does, *u will probably have to be more original (and we’re back to square one).
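The hypothesized sound law can be written out as a small rule and run over the starting points suggested in the list above. This is a minimal sketch, not an established formalization: the function name `o_umlaut` is invented here, the input forms are the tentative pre-forms proposed in this post (with ü standing for orthographic y), and the rule is assumed to affect only short *ü separated from a following *o by at least one consonant.

```python
import re

VOWELS = "aeiouüäöë"

# Short first-syllable ü, then one or more consonants, then o.
O_UMLAUT = re.compile(rf"^([^{VOWELS}]*)ü([^{VOWELS}]+)o")


def o_umlaut(form: str) -> str:
    """Apply the hypothetical change *ü-o > u-o to a single word form."""
    return O_UMLAUT.sub(r"\1u\2o", form)


candidates = ["üsko", "lüppo", "pürsto", "rüho", "rüsto"]
controls = ["rüstü", "üskü", "püütö"]  # no second-syllable o, or long üü

for form in candidates + controls:
    print(f"*{form} > {o_umlaut(form)}")
```

The control forms are left untouched, which matches the idea that parallel derivatives in *-ü (rysty) and words with long yy (pyytö) would have escaped the change.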

None of the cases appear crystal clear, but being able to get six examples together still suggests to me that this is probably on to something. OTOH also chronology seems slightly problematic here. The Saarikivi–Kallio scenario for usko would require that *ü-o > u-o was later than the assimilation *i-ü > *ü-ü; yet this has also been quite late, being e.g. found in (standard) Finnish (*pisü- > pysy- ‘to stay’, *pistü > pysty ‘erect’) but not in several dialects of Karelian (pisy-, pisty). [5] My proposed new derivation of ryhä~ruho from Proto-Norse rather than Proto-Germanic would also require a date well after Proto-Finnic for this change. But if it rather dates to the common (Western) Finnish era, then even counterexamples with narrower distribution in Finnic become relevant. Some troubling cases might be the following:

  • kylvö ‘sowing’, kyntö ‘plowing’ (← kylvä- ‘to sow’, kyntä- ‘to plow’). These are a part of the wider pattern of deverbal -O/U-nouns. The pattern -tA- : -tO is particularly productive though (kääntö ‘turn(ing)’ ← kääntä- ‘to turn’, säätö ‘adjustment’ ← säätä- ‘to adjust’, ääntö ‘articulation’ ← ääntä- ‘to articulate’, etc.), which might have motivated the creation of kyntö in place of expected kunto, followed by semantic analogy to produce also kylvö? Dialectally, a variant kylvy is fairly widespread (and even kynty is attested).
  • pyytö ‘plea’ (← pyytä- ‘to ask’). A similar derivative as the above two, but here we could additionally suppose that long yy was perhaps unaffected by this umlaut.
  • kytö ‘slash-and-burnt field’. Likely also a derivative of the above type, from kyte- ‘to smoulder’. However, from an e-stem verb I’d expect kyty, which is indeed attested in a few dialects (cf. also e.g. kylpy ‘bath’ ← kylpe- ‘to bathe’, käsky ‘order’ ← käske- ‘to order’, sylky ‘spittle’ ← sylke- ‘to spit’). Could kytö be a late re-suffixation to avoid homophony with kyty ‘brother-in-law’?
  • tyttö ‘girl’; fairly widespread, with cognates found in most of Northern Finnic, as well as in Votic. However, the variant tytti seems to be older yet, with cognates extending also to Estonian and Livonian. This seems like a similar innovation as the replacement of Fi. isä ‘father’ with iso in Karelian.

It appears that a clearer picture of the development of 2nd-syllable labial vowel suffixes in Finnic will be needed for making progress here.

[1] Footnote 13, to be specific. For the further ref details, see e.g. my previous post. Perhaps I should establish a policy to add anything I cite at least twice to my Literature page?
[2] The idea of a contrast between *e-o and *ë-o (cf. Estonian vedu vs. põld : põldu-) is provisional and not particularly crucial to the point.
[3] I wonder rather what’s exactly the relationship between ullakko and lakka ‘roof’. SSA suggests irregular contraction from *ula-lakka, but perhaps there is rather some kind of a prefix *ul- in here. In that case, it would be also possible to analyze ulappa as *ul-appa, where the 2nd component would probably derive in some fashion from Proto-Norse *haba- ‘sea’ — probably thru Samic *āpē ‘open space’, in light of the developments *h > ∅ and *b > pp (similarly to what we see in northern Finnish aapa ‘open bog’). While very hypothetical, this approach still seems more promising to me than the notorious proposal that ulappa would be one of the no more than two or three Finnic words to have allegedly retained the Proto-Uralic derivational suffix *-ppa. — It would even be formally possible to derive ulotta- as *ul-otta-, i.e. based on the verb otta- ‘to take’?… probably not a good idea though.
[4] E.g. Wiktionary appears to have *bursti- though, which could allow deriving an i-umlauted reflex in Finnic after all; but this looks like a reconstruction mainly based on English. Hellquist’s old Svensk etymologisk ordbok suggests *burstiō-. I am not able to assess offhand which of these stem type variants is best supported.
[5] Or could these be late analogical reversals, on the basis of related formations such as *pistä- ‘to stick’?

Posted in Commentary, Etymology

Proto-Finnic *c in Karelian

During some casual investigation of Karjalan kielen sanakirja, I appear to have stumbled on something interesting.

One of the more distinctive innovations among the Karelian dialects is the reflexation of Proto-Finnic *s. In Northern Karelian, and in the northernmost dialects of Southern Karelian (including the Tver, Tihvin and Valdai dialects, spoken by Karelians displaced much further south), this is by default retracted to postalveolar š: see e.g. kešä ‘summer’, oštoa ‘to buy’, šilmä ‘eye’. [1] The development, though, is blocked by a preceding *i, as in e.g. aisa ‘shaft (in harness)’, pisteä ‘to stick’, muistoa ‘to remember’, viisi ‘5’.

In Livvi aka Olonets Karelian (likewise also in Ludian and Veps), funnily enough *s is inversely shifted to š only after *i. This probably indicates that Old Karelian & Old Veps had no **š, and this sibilant split initially produced an allophonic contrast of palatalized [ś] versus unpalatalized [s]; in the northern part of the dialect area, the latter was eventually retracted, and in the southeastern part, the former.

There are several complications to this textbook picture, though. First, as is common, dialect loaning etc. seems to have generated a rather messy border for the sound changes… The Paatene, Mäntyselkä and Porajärvi dialects appear to be particularly inconsistent cases. (Later palatalizations like *-si > *-śi could maybe explain some cases.)

But more interestingly, there are also some other cases where s is found thruout almost all of the Karelian varieties. And it appears that sometimes these can be archaisms of a sort: the distribution goes regularly back to Proto-Finnic *c, rather than *s. So far I have located the following cases:

  • *acja > asie ‘matter, thing, errand’
  • *acrajin > asrain ‘trident’
  • *kecrädä- > kesrätä ‘to spin thread’
  • *käci > käsi ‘hand’
  • *ocra > osra ‘barley’
  • *suci > šusi ‘wolf’
  • *toci > tosi ‘true’
  • ? *vacara > vasara ‘hammer’ (South Estonian vassar does not suggest *c, but the word’s origin from Indo-Iranian *vadźra does.)
  • *veci > vesi ‘water’
  • *ükci > yksi ‘1’

Possibly also *vooci > vuosi ‘year’, which has š (< ? *s) in “central Karelian” (Repola, Rukajärvi, Paatene, Mäntyselkä, Porajärvi), but s (< ? *c) in “Northern Karelian proper” (Uhtua, Vuokkiniemi, Kontokki).

It’s notable that the cases here do not only include cases of *c resulting from the Proto-Finnic assimilation *ti > *ci; they also include all cases known from Karelian of the somewhat anomalous PF cluster *cr — which therefore appears to confirm Petri Kallio’s recent proposal that it should indeed be reconstructed as *cr, not as *str.


How to understand this correspondence? It does not seem to be possible to assume that the development has been *c > *ś. Not only would this be phonetically awkward (especially since the geminate *cc is reflected as non-palatal čč across all of Karelian), it also seems to be the case that the Tunkua dialect has *c >> š, in contrast to *ś > s. On the other hand, assuming the retention of PF *c as an affricate all the way ’til the full disintegration of Karelian would also be problematic, for starters e.g. because the weak grade of PF *cc would most likely have already also become a short affricate *c by this time. Yet a gradation pattern ttš : s has not been attested from anywhere in Karelian (nor elsewhere in Finnic), AFAIK.

So I would suppose that the reflex of *c in Old Karelian must have been a second sibilant phoneme; which might be most simply denoted *s₂. Precedents from elsewhere (e.g. Castilian Spanish, Old French, High German) suggest though that the most likely phonetic value for this would be a laminal sibilant [s̻], contrasting with *s₁ < *s as an apical sibilant [s̺]. This would additionally explain why the shift *s₁ > š exists in the first place: to enhance the contrast between this and *s₂ (just as also occurred in German).
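The two-sibilant hypothesis for the northern dialects can be sketched as a character-by-character mapping. This is a toy model under loud assumptions: inputs are schematic Proto-Finnic shapes in this post's notation, the function name is invented, geminate *cc is not handled, and the words where *c instead behaves like *s₁ are deliberately left out.

```python
def northern_karelian(proto: str) -> str:
    """Map PF *s (= s1) and *c (= s2) to their proposed northern reflexes."""
    out = []
    prev = ""
    for ch in proto:
        if ch == "s" and prev != "i":
            out.append("š")   # s1 retracts, unless *i immediately precedes
        elif ch == "c":
            out.append("s")   # s2 surfaces as plain s
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


for proto in ["kesä", "silmä", "aisa", "käci", "veci", "suci", "ocra"]:
    print(f"*{proto} > {northern_karelian(proto)}")
```

Note how *suci comes out as šusi: the initial *s₁ is retracted while the medial *s₂ is not, which is the kind of word-internal contrast that motivates assuming two phonemes.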

A hypothesis of *[s̻] in Old Karelian also seems to offer some new possibilities for explaining the later development of PF *cr. I have already for quite some time wondered about its development to tr in Eastern Finnish dialects on one hand, in Ingrian on the other; which seems to be an odd contrast to sr in Karelian (and also Livvi/Ludian/Veps). In Ingrian it could be easily attributed to contact with Estonian/Votic, but this is further complicated by *sr > zr appearing after all in the Soikkola dialect. Moreover, Western Finnish has *sr > hr. All this makes the Eastern Finnic tr-area look more like a secondary innovation than an isogloss shared with Southern Finnic. Perhaps a development along the lines of *sr > *θr > tr could be assumed? — And at this point it becomes quite handy that *[s̻] > [θ] is also a fairly common innovation (cf. Castilian Spanish again; or, Old Persian). If we only had to assume the fronting of this phone in particular, the scenario here seems to become a little bit better-grounded.


Not every instance of PF *c yields this Karelian *s₂, though. The usual reflexation of *s₁ appears in e.g. the following cases:

  • (intervocalic:) *kaca > kaša ‘corner (of ax)’, *keüci > keyši ‘rope’, *kuuci > kuuši ‘six’, *täüci > täyši ‘full’, *uuci > uuši ‘new’
  • (postconsonantal:) *kakci > kakši ‘2’, *küpci > kypši ‘cooked’, *lapci > lapši ‘child’; *künci > kynši ‘nail’; *hirci > hirši ‘log’, *orci > orši ‘perch’, *varci > varši ‘shaft’, *virci > virši ‘hymn’

There are also three cases yielding *ś after *i (going per the Tunkua reflexes): *niici > niisi ‘heddle’, *raici > reisi ‘thigh’, and the abovementioned *viici > viisi, which suggest that *c > *s in this set of words is altogether probably quite old, and they’ve after this simply developed as any inherited PF *s. At least *rc > *rs and *Uc > *Us could be simply regular.

Another question is word-initial *ci-, *si- (e.g. *cilta ‘bridge’, *silmä ‘eye’). A brief scan suggests that these seem to be reflected identically; though the Paatene dialect now consistently seems to have s (likewise word-initially before any other front vowels).

Whether any additional traces of this *s₁ / *s₂ contrast can be found elsewhere in Eastern Finnic (e.g. in Upper Luga Ingrian, which also has the change *s > š) will have to be left for further study.

[1] Though I could ask if this isogloss might make a better boundary between Southern and Northern Karelian than medial voicing; while pretty easy to locate, the latter is also a relatively trivial feature that seems likely to have just rubbed off on Southern Karelian as Russian influence. — KKS actually includes the dialect of Jyskyjärvi as a part of Northern Karelian even though it does have medial voicing. I also wonder what’s the motivation behind this choice.

Posted in Reconstruction

Some observations on Votic õ versus o

One of the bigger open problems of Finnic historical phonology is the shift *o > õ in Southern Finnic.

The non-front non-open illabial vowel õ found across Southern Finnic (the exact realization varies from /ɤ/ to /ɨ/) most regularly corresponds to Northern Finnic e in words of back harmony; e.g. Estonian mõla ‘paddle’, põld ‘field’ ~ Finnish mela, pelto. Whether these cases should be reconstructed with front *e or back *ë in Proto-Finnic remains disputed, but the correspondence pattern is fairly unambiguous. [1]

Frequently, though, õ is also found in correspondence to NF o, e.g. in Es. õlg ‘shoulder’, õlg ‘straw’, hõbe ‘silver’, kõrv ‘ear’, lõhi ‘salmon’, sõrm ‘finger’ ~ Fi. olka, olki, hopea, korva, lohi, sormi. Cognates from elsewhere in Uralic, or in loangiving languages, fairly consistently indicate that Northern Finnic retains here the original state of affairs, and anything along the lines of a Proto-Finnic central rounded vowel **ȯ probably should not be reconstructed. [2] However, o also frequently remains; e.g. Es. oja ‘brook’, ots ‘forehead’, kolm ‘three’, tohtima ‘to dare’ ~ Fi. oja, otsa, kolme, tohtia.

The task, then, is finding the conditioning for the delabialization *o > õ. It’s been observed already long ago that no easy solutions are available. Which words show this change and which do not varies greatly already depending on the language variety in question. Previous analyses by e.g. Viitso and Raun have distinguished some 4-5 main distribution patterns. It seems probable that the issue cannot be fully answered without detailed analysis of the dialectal diversification of Estonian. Interdialectal loans, especially to and from the literary standards, [3] have probably also muddied the original distribution quite a bit.

This all regardless does not prevent partial progress being made. In particular, Livonian is known to be a clear-cut case: delabialization is here found effectively solely in the diphthong *ou and in the sequence *ovV, which entirely regularly yield õu / õvV.

With this in mind, a look at how the change has played out in its other geographic extremum, Votic, might also be fruitful. This does not seem to have been done, though. Lauri Kettunen’s Vatjan kielen äännehistoria (1915) does not even attempt an analysis, and if anyone since him has examined the historical phonology of vowels in Votic in similar detail, I am not aware of it.

I will not be presenting a full analysis here, either, only a couple of hypotheses, based on a relatively quick look-over of the lexicon of the Mahu dialect. A look into data from other, fuller-documented dialects of Votic (these days well-summarized in Vadja keele sõnaraamat) will be necessary to confirm or deny the following ideas.


The main impression that emerges when examining Votic on its own is that attempting to determine conditions for *o > õ is probably the wrong approach. Delabialization occurs much more often than not, in almost any phonetic environment imaginable. Only two sources of modern o are immediately clear: firstly, long oo is entirely unaffected by delabialization (as also elsewhere in Southern Finnic); and secondly, recent loanwords, from Ingrian, Finnish, and Russian, consistently retain their short o. There appear to be a number of other word shapes where short stressed o is probably inherited — but they are quite few, and what seems like an approach worth exploring is that this development has been conditional, while delabialization would be simply the default reflexation.

One particularly promising environment is words with original *o also in the 2nd syllable. Clear cases are the words koto ‘home’, roho ‘grass’. Loaning is not an option for either of these: the former was in Eastern Finnic (and standard Finnish) replaced by the form koti, while the latter shows vowel shortening before h, a sound change absent from Ingrian and Finnish. [4] The same rule could be used to account for some other words as well: kokottaa ‘to bawk’, mokoma ‘such’, orko ‘valley’.

Two very interesting words moreover turn up: opënë ‘horse’, toro ‘acorn’. These have not had original *o at all, and they rather derive from Proto-Finnic *hëpoinën, *tërho (or *tëroh?); cf. Finnish hevonen, terho. This suggests that the history here was actually not the retention of stressed *o before a subsequent *o, but instead a later back-assimilation *ë-o > *o-o (and in ‘horse’, this was presumably followed by an even later assimilation *-o-ë > /-ë-ë/). [5]
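The working hypotheses so far can be put together as a rough predictor of the Votic first-syllable vowel. This is only a sketch under stated assumptions: the function name is invented, only the first-syllable vowel is tracked (consonant changes such as the loss of *h are ignored), diphthongs are passed through unchanged, and the input forms are schematic reconstructions.

```python
import re

VOWELS = "aeiouüäöëõ"


def first_vowel_reflex(proto: str) -> str:
    """Predict the Votic reflex of first-syllable *o / *ë for a proto-form."""
    nuclei = re.findall(f"[{VOWELS}]+", proto)
    first = nuclei[0]
    second = nuclei[1] if len(nuclei) > 1 else ""
    if first == "oo":
        return "oo"  # long oo is unaffected by delabialization
    if first in ("o", "ë") and second.startswith("o"):
        return "o"   # *o-o retained and *ë-o > o-o (the two are indistinguishable)
    if first in ("o", "ë"):
        return "õ"   # default: delabialized *o, or inherited *ë
    return first     # anything else is outside the scope of this sketch


for proto in ["koto", "roho", "hëpoinën", "tërho", "oksa", "pëlto"]:
    print(f"*{proto}: {first_vowel_reflex(proto)}")
```

For *pëlto the sketch predicts o, against attested põlto, so a mechanical version of the rule overgenerates and would need further conditions.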

This opens also a new possibility for etymologizing koko ‘pile’. Given Finnish keko, perhaps this similarly derives from earlier *këko; in which case the synonymous Fi. koko, and its other Finnic cognates, would seem to turn out to be loanwords from Votic? (Whether the same would go also for koko ‘whole’, koota ‘to gather’ is not clear. I could also imagine these being derived from *kokë- ‘to check, e.g. traps’, perhaps via a meaning ‘to collect’.)

But on the contrary: no assimilation of this sort is seen in e.g. nõvvoa ‘to advise’, põlto ‘field’, sõtkoag ‘to mix dough’, võrkko ‘net’, võso ‘sprout, young tree’ (all five inherited words of the *ë-o type). The first could be maybe explained as being due to the general ban on ˣ/-ouC-/, and the last two similarly due to the ban on ˣ/voC-/. But ‘field’ and ‘to mix’ resist easy explanations. I wonder if both of them being loosely agricultural vocabulary has any relevance.

Another environment where o could perhaps be regular is the context C_(C)kka. In my sample, no examples with õ are found in this position, but instead there’s a full handful of examples of o: hoikka ‘thin’, kokka ‘hook’, kolkka ‘corner’, nokka ‘beak’, rokka ‘cabbage soup’. At least the first is evidently an Ingrian loan, on account of retained /h/, but given the average proportion of such loans, I am not sure if we should expect the same to be the case for all of these.

It’s unclear what phonetic motivation could exist here, though, since a lone coda /k/ is not enough to block delabialization; e.g. *oksa > õhsa ‘branch’.

A third case where o fairly often remains is the diphthong *oi. From Mahu there are five positive inherited examples: koira ‘dog’, koivu ‘birch’, moisio ‘manor’, poika ‘son’, poiz ‘away’; and three negatives: õikõa ‘right’, nõissag ‘to rise’, sõittaag ‘to scold’. The sixth could suggest a shift *oi-ë > õi-õ; and the seventh may have something to do with how the rest of Finnic rather indicates earlier *ou (Es. tõusma, Fi. nousta), perhaps suggesting something like *novise-? [6] But no especially clear pattern seems to emerge here.
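The counts in the environments discussed above can be put side by side with a small tally. The following Python sketch simply hand-codes the Mahu examples cited in this post. The environment labels are ad hoc names made up for this illustration, koko is left out since its etymology is open, and hoikka is counted under the -kka group only, although it also has oi.

```python
from collections import Counter, defaultdict

# (word, environment, first-syllable vowel), hand-coded from the examples above.
# The environment labels are ad hoc and only serve this illustration.
SAMPLE = [
    ("koto", "*o-o", "o"), ("roho", "*o-o", "o"), ("kokottaa", "*o-o", "o"),
    ("mokoma", "*o-o", "o"), ("orko", "*o-o", "o"),
    ("opënë", "*ë-o", "o"), ("toro", "*ë-o", "o"),
    ("nõvvoa", "*ë-o", "õ"), ("põlto", "*ë-o", "õ"), ("sõtkoag", "*ë-o", "õ"),
    ("võrkko", "*ë-o", "õ"), ("võso", "*ë-o", "õ"),
    ("hoikka", "_(C)kka", "o"), ("kokka", "_(C)kka", "o"),
    ("kolkka", "_(C)kka", "o"), ("nokka", "_(C)kka", "o"),
    ("rokka", "_(C)kka", "o"),
    ("koira", "*oi", "o"), ("koivu", "*oi", "o"), ("moisio", "*oi", "o"),
    ("poika", "*oi", "o"), ("poiz", "*oi", "o"),
    ("õikõa", "*oi", "õ"), ("nõissag", "*oi", "õ"), ("sõittaag", "*oi", "õ"),
]

def tally(sample):
    """Count first-syllable o versus õ separately for each environment."""
    counts = defaultdict(Counter)
    for _word, env, vowel in sample:
        counts[env][vowel] += 1
    return counts

for env, vowels in tally(SAMPLE).items():
    print(env, dict(vowels))
```

With a fuller wordlist from other Votic dialects, the same tally would show quickly whether any of these environments holds up better than the rest.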

The issue of õ will for now still remain unsolved, but at least it is clear that some clues will emerge as long as one is willing to look for them.

[1] Given that this correspondence mostly occurs in loanwords from Indo-European that once had *e-a, *e-o or the like (e.g. pelto derives from Proto-Germanic *felθō), it’s clear that there has been a retraction from earlier *e at some point, but what is not clear is if we should assume this to have been an areal Southern Finnic innovation, or a common Proto-Finnic innovation followed by a backshift in Northern Finnic. The question is out of the scope of this post, but suffice to say that I narrowly lean in favor of the latter (mainly per some indirect arguments from relative chronology, which however rely on a few so far unreleased ideas of mine).
[2] In principle I am however open to the possibility that in some cases Southern Finnic might regardless retain the more original state of affairs, especially since by now we know that Proto-Uralic also featured a non-front non-open illabial vowel *ë. This is normally continued by Finnic *a, but it’s imaginable that some conditional exception developments exist. One particularly interesting gap to observe is that while PU seems to have allowed a fairly wide variety of vowel + semivowel combinations, no cases of the sequences *-ëw-, *-ëj- have been reconstructed so far.
[3] That is, both the modern North Estonian-centric standard, mainly based on the Tallinn dialect, and the old South Estonian-centric standard, mainly based on the Tartu dialect.
[4] Somewhat mysteriously this feature is however found also in Veps. — In principle it’s of course also possible that shortening before h is simply later than *o > õ.
[5] Interestingly enough, also Standard North Estonian seems to have this change in hobune ‘horse’. Yet ‘acorn’ remains tõru. Could early loaning from Votic to Estonian be involved?
[6] This reconstruction would, then, rather look like a derivative *nov-ise-, and we could ask if the root is somehow connected with PU *ńoxə- ‘to chase, follow’ (whence also PF *nou-ta- ‘to fetch’). But the semantics do not seem to work even elementarily, especially since *-ise- is well-attested only as deriving onomatopoetic verbs of all things.

Posted in Reconstruction

Inheritance in Phonology

It occurred to me that there’s one concept I have never seen anyone else define or use, although I’ve been working with it in my own research for a while now: that of an inheritance phoneme.

This is in effect the polar opposite of the well-known case of the loanword phoneme. As the audience of this blog probably mostly knows, a loanword phoneme refers to a sound that is absent from the native lexicon of a language, but occurs in one or more of its contact languages, and has been taken on from there into the language itself. Clear examples include /b g f ʃ/ in modern Finnish.

But sometimes, we can by contrast find in a language a phoneme that is absent from its contact languages, and is only found in the native-enough lexicon. [1] In Finnish a recent example might be the labial opening diphthongs /uo/, /yö/. Although found as reflexes of earlier *oo, *öö even in some not especially old loanwords from e.g. Swedish (including tuoli ‘chair’, kyöpeli ‘kobold’; yet more recently also fluori ‘fluorine’), they appear to have within about the last 200 years become a “closed class” that, for now, is no longer acquiring new members. [2] Of course, this is not “closed” in the same sense as a morphological word class might be: the diphthongs remain entirely possible in new ideophones and onomatopoeia (blyögh ‘barf!’), blends (Suomalia ‘an area in Finland with a relatively large current or predicted Somali population’), and derivatives based on pre-existing roots.

Better examples can probably be found, from languages having some more strongly marked phonemes. For example, I’d expect Czech ř or German pf to be not very common in current loanwords, and to have been so for a good while; or the nasal vowels in French to be absent from most modern loanwords, with the exception of those from Portuguese or sub-Saharan African languages.

Even then, this concept seems less clearly defined than the loanword phoneme. While a loanword phoneme is established by its one-time inadmissibility in the language altogether, there is nothing in a language’s internal structure at any given time that could prevent a given phoneme from appearing in loans. This situation can only be an incidental fact about its contact languages — and if the contact situation changes, anything’s possible again. (Put a Czech speaker community in regular contact with speakers of Toda, and I for one would bet that ř would then start regularly turning up in some loanwords.) A phoneme could also be only “partially inherited”, in being found in some loan strata but not in others — as I hypothesized to be the case with French nasal vowels.

On the other hand, what is interesting here is that while words containing loanword phonemes allow setting up a terminus post quem for their acquisition into the language (if we know that Finnish circa 1600 had no /ʃ/, then all modern Finnish words with the consonant must be more recent, even if their etymology were unknown), inheritance phonemes may allow establishing a terminus ante quem. This seems like a fairly powerful tool; usually we can backdate a word only by the comparative method, and even then not watertightly. But, given a word like Fi. tuoksua ‘to smell’ (of unknown origin, not attested before the end of the 17th century, and in contrast to the more widespread native Finnic synonym haista), we can regardless consider it probable from its diphthong that this is not an especially young word, perhaps dating at least to the Middle Ages. The absence of known loan etymologies from any obvious candidates for a loangiver (Swedish, Russian etc.) would furthermore suggest that we can with slightly lower confidence add a couple of centuries more yet. [3]


We can also define similar concepts such as loan cluster and inheritance cluster. The former, although to my knowledge never explicitly named, is again a known phenomenon. Finnish continues to work as an example: while Modern Finnish clearly allows e.g. word-initial consonant clusters, it is not too hard to find phonological analyses that dismiss them as non-native and proceed to posit a “basic” syllable structure (C)V(V/C)(C). Jorma Koivulehto has also made good use of this approach in research of early loanwords, having e.g. shown that all Finnic word roots with the medial cluster *-rt- are ultimately Indo-European loans, and not of Uralic inheritance. [4] (This, however, is not to be confused with the occurrence of *rt in word stems, where it can well result from inherited *r + a suffix such as causative *-ta-; as in Fi. vieri ‘side’ → vier-tä- ‘to be or go beside smth.’)

It seems similarly possible to consider e.g. Finnish tk for the most part an inheritance cluster that indicates relatively native vocabulary. No examples of this cluster in old loans are known; and given that already in Late Proto-Indo-European, the inherited “thorn” clusters of dental + velar were metathesized or otherwise reduced, it seems likely that none will be found anytime soon either, at least not from an Indo-European direction. (Much newer examples can be found though, e.g. Atkinsin dieetti, votka; and in far-northern dialects, e.g. vietka ‘adze’, from Sami.)

I could explore various further examples here, but for now, this post should do for a point of reference for later use.

[1] “Nativeness” is a relative concept, of course, not an absolute one. E.g. Finnish kauppa ‘store’ can be considered a “native” counterpart of the more recent loans puoti (← Swedish), lafka (← Russian), basaari (ultimately ← Persian) etc., but ultimately it is a Germanic loanword as well. Similarly, even words reconstructible back to Proto-Uralic can in principle be loans at some deeper time-level yet (e.g. we can suspect on semantic grounds that pata < *pata ‘pot’ might be one).
[2] The illabial opening diphthong /ie/ remains possible in loans, e.g. fiesta, siesta, DJ Tiësto.
[3] For some speculation though, something could be perhaps made of the similarity to Swedish doft, German Duft ‘smell’. If these could be analyzed as earlier *duf-t-, perhaps in turn some kind of a labial-stop extension of PIE *dʰewh₂- ‘to smoke’ (PG *dup-?? Svensk Etymologisk Ordbok connects here also Greek τυφος ‘smoke’), then we might be able to assume that the Finnish word derives from pseudo-PF *tupa/*tupo ‘smell’ → *tuβa-ks-u-/*tuβo-ks-u- ‘to put out smell’ > *tu.aksu-/*tu.oksu-, with a similar late contracted diphthong as in words like siellä < *si.ällä < *siɣällä < *sigä-llä ‘there’, or haukka < havukka (attested dialectally) < *haβukka < *habukka ‘hawk’.
[4] See in particular: Koivulehto, Jorma (1979): Baltisches und Germanisches im Finnischen: die finn. Stämme auf -rte und die finn. Sequenz VrtV. In: Schiefer, Erhard F. (ed.), Explanationes et tractationes Fenno-Ugricae in honorem Hans Fromm, pp. 129–164. München.

Posted in Methodology

Weighing etymological distributions

I’ve sometimes remarked (but until now, not on this blog) that one interesting difference between Uralic and Indo-European studies is their radically different approaches to lexical reconstruction. Uralic studies have for long hung on to the idea of a deeply stratified family tree, and accordingly, word roots dating to the same, nearly identical stage of phonological reconstruction have been varyingly separated as “Proto-Finno-Samic”, “Proto-Finno-Volgaic”, “Proto-Finno-Permic”, “Proto-Ugric”, “Proto-Finno-Ugric” or “Proto-Uralic”, depending simply on the branches of Uralic in which descendants have survived. On the IE side, by contrast, all available reconstructions are generally treated under the title “Proto-Indo-European”, no matter if we’re dealing with a word root with a narrow distribution covering only e.g. Germanic and Balto-Slavic, or one found everywhere from Irish to Bengali and from Hittite to Tocharian. (Fairly often also quite different reconstruction stages are equated, at least in name; mostly in connection to laryngeal theory, which I find to be in mostly poor shape when it comes to distinguishing between comparative and internal reconstruction.)

Ironically enough, both sides appear to have been wrong. The evidence for most of the traditional intermediate groupings of Uralic has either evaporated long since, or has turned out to have been illusory all along; while studies on the dialectification of Indo-European fairly consistently keep suggesting the status of Anatolian and possibly Tocharian as early splits.

Focusing more on the IE side for once: there do not, yet, seem to be general-purpose sources that would examine how many of the numerous typological and allegedly synchronic analyses of Proto-Indo-European would hold even if we restricted our view to just the oldest material. (There are individual papers out there somewhere I’m sure, but admittedly I have not been looking especially heavily for them.) But in order to get some kind of a rough idea, I’ve started a small project: taking Wiktionary’s list of Proto-Indo-European roots as a starting point and indexing them according to their distribution across the better-documented IE languages (i.e. no Phrygians or Messapics). You can check on the work in progress over here. Sure enough, while convenient, this is probably also a fairly unsystematic sample of data. I might want to follow up on this by taking at some point a look at some more comprehensive modern rootlists, such as the LIV.

This anyway comes out as a type of dataset I have some practice with by now: a distribution matrix, recording the lack or presence of a root in a subgroup. [1] There are some interesting things you can do with such data, although I think a generally applicable theory remains undeveloped. I already have several similar projects involving Uralic data in preparation — of these, the two in the best shape are a spreadsheet database of the common Samoyedic lexicon (about 780 entries, mostly from Janhunen’s Samojedischer Wortschatz; currently not missing much else than finishing translating the German glosses into English), and another one listing the best-preserved common Uralic lexicon (with reflexes in six or more of the nine main Uralic subgroups, which comes out at about 200 entries; currently not missing much else than finishing adding the intermediate Proto-Samic/Proto-Finnic/Proto-Samoyedic forms). [2]
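As a rough illustration of what such a distribution matrix looks like in practice, here is a minimal Python sketch. The root labels and their attestation patterns are invented placeholders, not entries from any of the datasets mentioned above. The list of nine subgroups is an assumption following one common division of Uralic, and the function names are only for illustration.

```python
# Nine subgroups, assuming one common division of Uralic; the exact list is an assumption.
BRANCHES = ["Samic", "Finnic", "Mordvinic", "Mari", "Permic",
            "Hungarian", "Mansi", "Khanty", "Samoyedic"]

# Invented placeholder rows, one column per branch in the order above:
# 1 = reflex present, 0 = no reflex known.
MATRIX = {
    "root_A": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "root_B": [1, 1, 1, 0, 1, 0, 1, 1, 0],
    "root_C": [1, 1, 0, 0, 0, 0, 0, 0, 0],
}

def branch_count(row):
    """Flat sum-of-branches score: the number of subgroups with a reflex."""
    return sum(row)

def well_preserved(matrix, threshold=6):
    """Return the roots attested in at least `threshold` subgroups."""
    return [root for root, row in matrix.items()
            if branch_count(row) >= threshold]

print(well_preserved(MATRIX))
```

The default threshold of six mirrors the "six or more of the nine" cutoff mentioned above; any other cutoff is a one-argument change.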

With PIE and the Indo-Hittite question, one followup could be similar filtering of the evidently abundant “Common IE” lexicon (= everything not attested from Anatolian and Tocharian). It’s after all probable that a lot of vocabulary that once occurred in Anatolian and/or Tocharian remains simply undocumented in the literary records of the languages; and, other things being equal, a word root attested widely across the modern IE languages is more likely to be an archaism (or an erroneous comparison) than one reconstructed on the basis of more fragmented data.


But at this point I run into the question: what kind of a metric should I use for assessing how well a given proto-root has been retained? A flat sum-of-branches function seems to still work decently for Uralic, but for IE, not so much. The fundamentally underdocumented Anatolian and Tocharian are one type of problem, while another is the “family-isolates” Albanian and Armenian, where an order of magnitude less inherited vocabulary is found than in the old major groups like Greek or Indo-Iranian. [3] It seems clear that if a Common IE root is only lost from Alb.+Arm., this is not as big a deal as if it were instead lost from Gr.+II. But how much so exactly? And suppose I were to treat II reflexation as worth e.g. one point, but Albanian reflexation as worth one half: should I then also treat e.g. Slavic reflexation as worth something like 0.8, given that the group is also clearly younger (and has had more opportunities for renewal of vocabulary)?
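To make the weighting question concrete, here is a hedged Python sketch of a weighted alternative to the flat sum. The weights 1, 0.5 and 0.8 are only the example values floated above. The Armenian weight is an assumption by analogy with Albanian, Greek defaults to 1 for lack of anything better, and both attestation patterns are invented. Nothing here is a proposal for what the weights should actually be.

```python
# Example weights only: Indo-Iranian 1, Albanian 0.5, Slavic 0.8 as floated above.
# Armenian 0.5 is an assumption by analogy with Albanian; Greek defaults to 1.
WEIGHTS = {"Indo-Iranian": 1.0, "Greek": 1.0, "Slavic": 0.8,
           "Albanian": 0.5, "Armenian": 0.5}

def retention_score(attested, weights=WEIGHTS):
    """Weighted share of branches retaining a root, between 0 and 1.

    `attested` maps a branch name to 1 or 0, or to a probability when a
    reflex is uncertain; branches missing from the mapping count as 0.
    """
    total = sum(weights.values())
    return sum(w * attested.get(branch, 0) for branch, w in weights.items()) / total

# Invented patterns: one root lost from Alb.+Arm., one lost from Gr.+II.
lost_alb_arm = {"Indo-Iranian": 1, "Greek": 1, "Slavic": 1}
lost_gr_ii = {"Slavic": 1, "Albanian": 1, "Armenian": 1}

print(round(retention_score(lost_alb_arm), 3))
print(round(retention_score(lost_gr_ii), 3))
```

Under a flat sum both roots would score three branches out of five. With these weights the root lost only from Albanian and Armenian comes out clearly higher, which is the intuition stated above. How the weights themselves should be chosen is a separate question.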

Initially it may seem that just noting the overall rate of lexical retention should work. Let’s say Albanian has lost 70% of the Common IE lexicon, while Germanic has lost 10%; does this mean that loss in Albanian is therefore seven times less valuable as evidence?

This approach however would seem to conflate lexical archaicity and lexical diversity. Even if, say, Germanic and Indo-Iranian are both subfamilies that retain 90% of the Common IE vocabulary, this does not imply that their histories have been essentially identical. As far as we know from history and archeology, this “symmetry” would be due to the former having for long been hanging out in the margins of Northwest Europe, without as many opportunities for renewing its lexicon; while the latter has split into further subgroups already early on, including several languages first attested soon afterwards, and so the odds are good that any given IE root could have been retained in at least a few descendants somewhere.

Another variable to take into account thus might be the amount of lexical diversity within a language group. But I also have yet to work out how to formulate a metric for this, exactly. And the question kind of iterates… determining the lexical diversity within e.g. Indo-Iranian is probably going to require a way to assess the lexical distribution between its main branches; and then likewise for determining the lexical diversity within e.g. the Persid languages; and then finally the same also for varieties of modern Persian. Ultimately this then reduces to a question of how well individual language varieties have been documented in the first place.

I might simply need a clearer theory of what I am trying to assess about etymological distributions in the first place. In principle, there seem to be at least two somewhat distinct issues involved:

  • attempting to determine the “internal rate of loss/innovation” for a particular lexeme (which, contrary to even the more sophisticated lexicostatistic theories out there, is in all likelihood not a constant, but rather something further depending on a language’s sociolinguistic situation and other such external variables); and from this approximate how much further back from its oldest strictly reconstructible stage it is likely to date
    (e.g. if we can reconstruct the Common IE roots *kakka- ‘poop’ and *pléwmon- ‘lung’, we could perhaps assume from just the semantics, already before any sound-symbolic or similar considerations, that the former is younger than the latter)
  • attempting to determine how likely it is that a particular widespread word root is actually a later areal innovation rather than common inheritance
    (e.g. all other things being equal, a putative PIE root that has not been attested from any Celtic language is more likely to be a lexical innovation that never reached the westernmost Late PIE dialects, than one that does extend there; or, for that matter, a word root attested only from Latin, Greek and Anatolian carries a bigger risk of involving serial loaning than a word root attested only from Umbrian, Slavic and Anatolian)

Both of these approaches would provide evidence on how likely it is that e.g. some Common IE root was or wasn’t already present in Proto-Indo-Hittite. But they regardless involve distinct historical processes.

[1] Technically these should be considered probabilities, not boolean variables. If a reflex is uncertain or has unclear features, we can mark this uncertainty as a 0.8 or 0.5 or 0.1, instead of a plain 1 or 0. And even the zeros and ones should perhaps be actually considered to be shorthand for ε and 1-ε, for some minuscule ε approximating the probabilities that we’re in fact wrong about how the history of e.g. Greek works, or how historical linguistics and etymology works in general.
[2] Further information on these and my other similar projects available on inquiry.
[3] Although it is interesting to note that, so far, almost all vocabulary with Anatolian parallels seems to be fairly well-retained even in Alb. and Arm. compared with poor retention otherwise. Perhaps this indicates the greater resilience, and attestability, of core vocabulary compared to peripheral vocabulary? But already the “Indo-Tocharian” layer seems to fare worse. We’ll see if this pattern carries thru.

Posted in Methodology
