Inheritance in Phonology

It occurred to me that there’s one concept I have never seen anyone else define or use, although I’ve been working with it in my own research for a while now: that of an inheritance phoneme.

This is in effect the polar opposite of the well-known case of the loanword phoneme. As the audience of this blog probably mostly knows, a loanword phoneme refers to a sound that is absent from the native lexicon of a language, but occurs in one or more of its contact languages, and has been taken on from there into the language itself. Clear examples include /b g f ʃ/ in modern Finnish.

But sometimes, we can by contrast find in a language a phoneme that is absent from its contact languages, and is only found in the native-enough lexicon. [1] In Finnish a recent example might be the labial opening diphthongs /uo/, /yö/. Although found as reflexes of earlier *oo, *öö even in some not especially old loanwords from e.g. Swedish (including tuoli ‘chair’, kyöpeli ‘kobold’; yet more recently also fluori ‘fluorine’), they appear to have within the last about 200 years become a “closed class” that, for now, is no longer acquiring new members. [2] Of course, this is not “closed” in the same sense as a morphological word class might be — the diphthongs remain entirely possible in new ideophones and onomatopoeia (blyögh ‘barf!’), blends (Suomalia ‘an area in Finland with a relatively large current or predicted Somali population’), and derivatives based on pre-existing roots.

Better examples can probably be found, from languages having some more strongly marked phonemes. For example, I’d expect Czech ř or German pf to be not very common in current loanwords, and to have been so for a good while; or the nasal vowels in French to be absent from most modern loanwords, with the exception of those from Portuguese or sub-Saharan African languages.

Even then, this concept seems less clearly defined than the loanword phoneme. While a loanword phoneme is established by its one-time inadmissibility in the language altogether, there is nothing in a language’s internal structure at any given time that could prevent a given phoneme from appearing in loans. This situation can only be an incidental fact about its contact languages — and if the contact situation changes, anything’s possible again. (Put a Czech speaker community in regular contact with speakers of Toda, and I for one would bet that ř would then start regularly turning up in some loanwords.) A phoneme could also be only “partially inherited”, in being found in some loan strata but not in others — as I hypothesized to be the case with French nasal vowels.

On the other hand, what is interesting here is that while words containing loanword phonemes allow setting up a terminus post quem for their acquisition into the language (if we know that Finnish circa 1600 had no /ʃ/, then all modern Finnish words with the consonant must be more recent, even if their etymology were unknown), inheritance phonemes may allow establishing a terminus ante quem. This seems like a fairly powerful tool; usually we can backdate a word only by the comparative method, and even then not watertightly. But, given a word like Fi. tuoksua ‘to smell’ (of unknown origin, not attested before the end of the 17th century, and in contrast to the more widespread native Finnic synonym haista), we can regardless consider it probable from its diphthong that this is not an especially young word, perhaps dating at least to the Middle Ages. The absence of known loan etymologies from any obvious candidates for a loangiver (Swedish, Russian etc.) would furthermore suggest that we can, with slightly lower confidence, add a couple of centuries more yet. [3]
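The dating logic above can be sketched in code. This is only a minimal illustration: the function name `date_word`, the pre-segmented input, and both year tables are my own assumptions (the year for /ʃ/ is the hypothetical figure from the text, and the closure year for the diphthongs is a rough placeholder for "about 200 years ago"). Loanword phonemes push the earliest possible date up; inheritance phonemes pull the latest possible date down.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class DatingBounds:
    earliest: Optional[int] = None  # terminus post quem (year)
    latest: Optional[int] = None    # terminus ante quem (year)


def date_word(segments: Iterable[str],
              loan_entry: Dict[str, int],
              inheritance_closure: Dict[str, int]) -> DatingBounds:
    """Combine the bounds implied by each diagnostic segment of a word."""
    bounds = DatingBounds()
    for seg in segments:
        if seg in loan_entry:
            # A loanword phoneme: the word cannot predate the phoneme's arrival.
            year = loan_entry[seg]
            bounds.earliest = year if bounds.earliest is None else max(bounds.earliest, year)
        if seg in inheritance_closure:
            # An inheritance phoneme: the word cannot postdate the class closing.
            year = inheritance_closure[seg]
            bounds.latest = year if bounds.latest is None else min(bounds.latest, year)
    return bounds


# Placeholder years, assumed for illustration only.
LOAN_ENTRY = {"ʃ": 1600}
INHERITANCE_CLOSURE = {"uo": 1800, "yö": 1800}

tuoksua = date_word(["t", "uo", "k", "s", "u", "a"], LOAN_ENTRY, INHERITANCE_CLOSURE)
```

With these placeholder tables, tuoksua gets only an upper bound. The stronger estimate in the text (Middle Ages or earlier) rests on further philological reasoning that this sketch does not model.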

We can also define similar concepts such as loan cluster and inheritance cluster. The former, although to my knowledge never explicitly named, is again a known phenomenon. Finnish continues to work as an example: while Modern Finnish clearly allows e.g. word-initial consonant clusters, it is not too hard to find phonological analyses that dismiss them as non-native and proceed to posit a “basic” syllable structure (C)V(V/C)(C). Jorma Koivulehto has also made good use of this approach in research of early loanwords, having e.g. shown that all Finnic word roots with the medial cluster *-rt- are ultimately Indo-European loans, and not of Uralic inheritance. [4] (This, however, is not to be confused with the occurrence of *rt in word stems, where it can well result from inherited *r + a suffix such as causative *-ta-; as in Fi. vieri ‘side’ → vier-tä- ‘to be or go beside smth.’)
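The "basic" syllable template (C)V(V/C)(C) and the test for word-initial clusters can be sketched as a check over a consonant/vowel skeleton. This is a rough illustration, assuming that standard Finnish orthography is close enough to phonemic for the purpose and that syllables are supplied already segmented. All function names here are my own.

```python
import re

FINNISH_VOWELS = set("aeiouyäö")


def cv_skeleton(word: str) -> str:
    """Map each letter to V (vowel) or C (consonant), ignoring non-letters."""
    return "".join("V" if ch in FINNISH_VOWELS else "C"
                   for ch in word.lower() if ch.isalpha())


def has_initial_cluster(word: str) -> bool:
    """True if the word begins with two or more consonants."""
    return cv_skeleton(word).startswith("CC")


def fits_basic_syllable(syllable: str) -> bool:
    """True if a single syllable matches the template (C)V(V/C)(C)."""
    return re.fullmatch(r"C?V[VC]?C?", cv_skeleton(syllable)) is not None
```

For example, `has_initial_cluster("fluori")` holds while `has_initial_cluster("tuoli")` does not, and the first syllable of tuoli fits the template while that of fluori does not.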

It seems similarly possible to consider e.g. Finnish tk for the most part an inheritance cluster that indicates relatively native vocabulary. No examples of this cluster in old loans are known; and given that already in Late Proto-Indo-European, the inherited “thorn” clusters of dental + velar were metathesized or otherwise reduced, it seems likely that none will be found anytime soon either, at least not from an Indo-European direction. (Much newer examples can be found though, e.g. Atkinsin dieetti, votka; and in far-northern dialects, e.g. vietka ‘adze’, from Sami.)

I could explore various further examples here, but for now, this post should do for a point of reference for later use.

[1] “Nativeness” is a relative concept, of course, not an absolute one. E.g. Finnish kauppa ‘store’ can be considered a “native” counterpart of the more recent loans puoti (← Swedish), lafka (← Russian), basaari (ultimately ← Persian) etc., but ultimately it is a Germanic loanword as well. Similarly, even words reconstructible back to Proto-Uralic can in principle be loans at some deeper time-level yet (e.g. we can suspect on semantic grounds that pata < *pata ‘pot’ might be one).
[2] The illabial opening diphthong /ie/ remains possible in loans, e.g. fiesta, siesta, DJ Tiësto.
[3] For some speculation though, something could be perhaps made of the similarity to Swedish doft, German Duft ‘smell’. If these could be analyzed as earlier *duf-t-, perhaps in turn some kind of a labial-stop extension of PIE *dʰewh₂- ‘to smoke’ (PG *dup-?? Svensk Etymologisk Ordbok connects here also Greek τυφος ‘smoke’), then we might be able to assume that the Finnish word derives from pseudo-PF *tupa/*tupo ‘smell’ → *tuβa-ks-u-/*tuβo-ks-u- ‘to put out smell’ > *tu.aksu-/*tu.oksu-, with a similar late contracted diphthong as in words like siellä < *si.ällä < *siɣällä < *sigä-llä ‘there’, or haukka < havukka (attested dialectally) < *haβukka < *habukka ‘hawk’.
[4] See in particular: Koivulehto, Jorma (1979): Baltisches und Germanisches im Finnischen: die finn. Stämme auf -rte und die finn. Sequenz VrtV. In: Schiefer, Erhard F. (ed.), Explanationes und tractationes Fenno-Ugricae in honorem Hans Fromm, pp. 129–164. München.

Posted in Methodology

Weighing etymological distributions

I’ve sometimes remarked (but until now, not on this blog) that one interesting difference between Uralic and Indo-European studies is radically different approaches to lexical reconstruction. Uralic studies have for long hung on to the idea of a deeply stratified family tree, and accordingly, word roots dating to the same, nearly identical stage of phonological reconstruction have been varyingly separated as “Proto-Finno-Samic”, “Proto-Finno-Volgaic”, “Proto-Finno-Permic”, “Proto-Ugric”, “Proto-Finno-Ugric” or “Proto-Uralic” — depending simply on in which branches of Uralic have descendants survived. While on the IE side, all available reconstructions are generally treated under the title “Proto-Indo-European”, no matter if we’re dealing with a word root with a narrow distribution covering only e.g. Germanic and Balto-Slavic, or one found everywhere from Irish to Bengali and from Hittite to Tocharian. (Fairly often also quite different reconstruction stages are equated, at least in name; mostly in connection to laryngeal theory, which I find to be in mostly poor shape when it comes to distinguishing between comparative and internal reconstruction.)

Ironically enough, both sides appear to have been wrong. The evidence for most of the traditional intermediate groupings of Uralic has either evaporated long since, or has turned out to have been illusory all along; while studies on the dialectification of Indo-European fairly consistently keep suggesting the status of Anatolian and possibly Tocharian as early splits.

Focusing more on the IE side for once: there do not, yet, seem to be general-purpose sources that would examine how many of the numerous typological and allegedly synchronic analyses of Proto-Indo-European would hold even if we restricted our view to just the oldest material. (There are individual papers out there somewhere I’m sure, but admittedly I have not been looking especially heavily for them.) But in order to get some kind of a rough idea, I’ve started a small project: taking Wiktionary’s list of Proto-Indo-European roots as a starting point and indexing them according to their distribution across the better-documented IE languages (i.e. no Phrygians or Messapics). You can check on the work in progress over here. Sure enough, while convenient, this is probably also a fairly unsystematic sample of data. I might want to follow up on this by taking at some point a look at some more comprehensive modern rootlists, such as the LIV.

This anyway comes out as a type of dataset I have some practice with by now: a distribution matrix, recording the lack or presence of a root in a subgroup. [1] There are some interesting things you can do with such data, although I think a generally applicable theory remains undeveloped. I already have several similar projects involving Uralic data in preparation — of these, the two in the best shape are a spreadsheet database of the common Samoyedic lexicon (about 780 entries, mostly from Janhunen’s Samojedischer Wortschatz; currently not missing much else than finishing translating the German glosses into English), and another one listing the best-preserved common Uralic lexicon (with reflexes in six or more of the nine main Uralic subgroups, which comes out at about 200 entries; currently not missing much else than finishing adding the intermediate Proto-Samic/Proto-Finnic/Proto-Samoyedic forms). [2]
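As a rough illustration of the data type, here is a minimal sketch of such a distribution matrix together with the flat sum-of-branches score. The branch inventory, the root labels and every entry are invented placeholders, not data from my spreadsheets. Entries are floats between 0 and 1, so that an uncertain reflex can be recorded as e.g. 0.5.

```python
from typing import Dict, Iterable, List

# Assumed branch inventory, for illustration only.
BRANCHES: List[str] = [
    "Anatolian", "Tocharian", "Indo-Iranian", "Greek",
    "Albanian", "Armenian", "Germanic", "Balto-Slavic",
]

# Hypothetical roots; a missing branch means no known reflex (0.0).
matrix: Dict[str, Dict[str, float]] = {
    "root_A": {"Anatolian": 1.0, "Indo-Iranian": 1.0, "Greek": 1.0, "Germanic": 0.5},
    "root_B": {"Germanic": 1.0, "Balto-Slavic": 1.0},
}


def flat_score(root: str) -> float:
    """Flat sum-of-branches: every branch counts equally."""
    return sum(matrix[root].get(branch, 0.0) for branch in BRANCHES)


def roots_absent_from(excluded: Iterable[str]) -> List[str]:
    """Roots with no recorded reflex in any of the given branches."""
    excluded = list(excluded)
    return [root for root, row in matrix.items()
            if all(row.get(branch, 0.0) == 0.0 for branch in excluded)]
```

Calling `roots_absent_from(["Anatolian", "Tocharian"])` picks out the roots whose recorded distribution excludes those two branches.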

With PIE and the Indo-Hittite question, one followup could be similar filtering of the evidently abundant “Common IE” lexicon (= everything not attested from Anatolian and Tocharian). It’s after all probable that a lot of vocabulary that once occurred in Anatolian and/or Tocharian remains simply undocumented in the literary records of the languages; and, other things being equal, a word root attested widely across the modern IE languages is more likely to be an archaism (or an erroneous comparison) than one reconstructed on the basis of more fragmented data.

But at this point I run into the question: what kind of a metric should I use for assessing how well a given proto-root has been retained? A flat sum-of-branches function seems to still work decently for Uralic, but for IE, not so much. The fundamentally underdocumented Anatolian and Tocharian are one type of problem, while another is the “family-isolates” Albanian and Armenian, where an order of magnitude less inherited vocabulary is found than in the old major groups like Greek or Indo-Iranian. [3] It seems clear that if a Common IE root is only lost from Alb.+Arm., this is not as big a deal as if it were instead lost from Gr.+II. But how much so exactly? And suppose I were to treat II reflexation as worth e.g. one point, but Albanian reflexation as worth one half: should I then also treat e.g. Slavic reflexation as worth something like 0.8, given that the group is also clearly younger (and has had more opportunities for renewal of vocabulary)?

Initially it may seem that just noting the overall rate of lexical retention should work. Let’s say Albanian has lost 70% of the Common IE lexicon, while Germanic has lost 10%; does this mean that loss in Albanian is therefore seven times less valuable as evidence?
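To make the arithmetic concrete, here is a small sketch of two candidate ways to turn a branch's loss rate into the evidential weight of an absence. Neither is a worked-out proposal: the inverse-loss weighting is just one reading of "seven times less valuable", and the surprisal weighting is an alternative I add here as an assumption. The loss rates are the hypothetical figures from the thought experiment above.

```python
import math

# Hypothetical loss rates from the thought experiment, not measured values.
LOSS_RATE = {"Albanian": 0.7, "Germanic": 0.1}


def inverse_loss_weight(branch: str) -> float:
    """Weight of an absence as the reciprocal of the branch's loss rate."""
    return 1.0 / LOSS_RATE[branch]


def surprisal_weight(branch: str) -> float:
    """Weight of an absence as its surprisal, -ln(loss rate)."""
    return -math.log(LOSS_RATE[branch])


# How much more an absence in Germanic would count than one in Albanian.
ratio_inverse = inverse_loss_weight("Germanic") / inverse_loss_weight("Albanian")
ratio_surprisal = surprisal_weight("Germanic") / surprisal_weight("Albanian")
```

The first ratio reproduces the factor of seven; the second comes out somewhat lower, which already shows that the answer depends on the weighting chosen.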

This approach however would seem to conflate lexical archaicity and lexical diversity. Even if, say, Germanic and Indo-Iranian are both subfamilies that retain 90% of the common IE vocabulary, this does not imply that their histories have been essentially identical. As far as we know from history and archeology, this “symmetry” would be due to the former having for long been hanging out in the margins of Northwest Europe, and so not having had as many opportunities for renewing its lexicon; while the latter split into further subgroups already early on, including several languages first attested soon afterwards, and so the odds are good that any given IE root could have been retained in at least a few descendants somewhere.

Another variable to take into account thus might be the amount of lexical diversity within a language group. But I also have yet to work out how to formulate a metric for this, exactly. And the question kind of iterates… determining the lexical diversity within e.g. Indo-Iranian is probably going to require a way to assess the lexical distribution between its main branches; and then likewise for determining the lexical diversity within e.g. the Persid languages; and then finally the same also for varieties of modern Persian. Ultimately this then reduces to a question of how well individual language varieties have been documented in the first place.

I might simply need a clearer theory of what I am trying to assess about etymological distributions in the first place. In principle, there seem to be at least two somewhat distinct issues involved:

  • attempting to determine the “internal rate of loss/innovation” for a particular lexeme (which, contrary to even the more sophisticated lexicostatistic theories out there, is in all likelihood not a constant, but rather something further depending on a language’s sociolinguistic situation and other such external variables); and from this approximate how much further back from its oldest strictly reconstructible stage is it likely to date
    (e.g. if we can reconstruct the Common IE roots *kakka- ‘poop’ and *pléwmon- ‘lung’, we could perhaps assume from just the semantics, already before any sound-symbolic or similar considerations, that the former is younger than the latter)
  • attempting to determine how likely it is that a particular widespread word root is actually a later areal innovation rather than common inheritance
    (e.g. all other things being equal, a putative PIE root that has not been attested from any Celtic language is more likely to be a lexical innovation that never reached the westernmost Late PIE dialects, than one that does extend there; or, for that matter, a word root attested only from Latin, Greek and Anatolian carries a bigger risk of involving serial loaning than a word root attested only from Umbrian, Slavic and Anatolian)

Both of these approaches would provide evidence on how likely it is that e.g. some Common IE root was or wasn’t already present in Proto-Indo-Hittite. But they regardless involve distinct historical processes.

[1] Technically these should be considered probabilities, not boolean variables. If a reflex is uncertain or has unclear features, we can mark this uncertainty as a 0.8 or 0.5 or 0.1, instead of a plain 1 or 0. And even the zeros and ones should perhaps be actually considered to be shorthand for ε and 1−ε, for some minuscule ε approximating the probabilities that we’re in fact wrong about how the history of e.g. Greek works, or how historical linguistics and etymology works in general.
[2] Further information on these and my other similar projects available on inquiry.
[3] Although it is interesting to note that, so far, almost all vocabulary with Anatolian parallels seems to be fairly well-retained even in Alb. and Arm. compared with poor retention otherwise. Perhaps this indicates the greater resilience, and attestability, of core vocabulary compared to peripheral vocabulary? But already the “Indo-Tocharian” layer seems to fare worse. We’ll see if this pattern carries thru.

Posted in Methodology

Etymology squib: Huoma & co.

An interesting paper I’ve recently found, by Kirill Reshetnikov from 2011: “Новые этимологии для прибалтийско-финских слов”, Урало-алтайские исследования 2 (5): 109–112. A Russian-only journal is a slightly odd location for publishing research on Finnic etymology, but I suppose technically still fair.

Two of his four comparisons I was already aware of, and I have no major complaints about them:

  • Finnic *kasvot (plurale tantum) ‘face’ ~ Samoyedic *kat ‘face’ < PU *kas-
    I’ve noticed on my own the possibility of connecting these words as well. The previous suggestions that I have seen for deriving the Finnic word from *kasva- ‘to grow’ have struck me as semantically forced. What I am not sure of, though, is reconstructing *kaswV as the common Proto-Uralic form. While loss of *w in consonant clusters would be regular in Samoyedic, there are no precedents for clusters of obstruent + semivowel in PU, [1] and also the loss of a stem vowel in Samoyedic after an original consonant cluster would be exceptional. Perhaps the root is simply *kasə, and *-vo- in Finnic is a suffix or two. [2]
  • Finnic *oh-ut, *oh-kainen ‘thin’ ~ Ob-Ugric *waaɣəɬ ‘thin’ [3] < PU *wokšə
    Seemingly independently also proposed by Ante Aikio in the 2nd volume of his recent Studies in Uralic Etymology article series. Multiple non-trivial developments are involved, but the etymology still appears to be entirely regular.

while one comparison I remain hesitant on: North Finnic *turkki ‘fur coat’ ~ Samoyedic *tər ‘fur’. A three-consonant cluster *-rkk- would not be possible for PU; yet an analysis of the Finnic form as something like *tur-kka-j or *turk-ka-j, as Reshetnikov proposes, would seem unusual as well. There are some Finnic roots analyzable as *CVC-ka, I think a few might be analyzable as *CVw-kka or *CVj-kka, and there’s even *po(n)č-ka ‘shank’, derived from PU *pončə- ‘tail’; but most of the time “heavy” consonant-stem formations still seem to only involve dental suffixes. Another morphological reason to be suspicious is that the Finnic word is an unalternating *i-stem, a root type usually found in loans. True, some words of this type are old *j-derivatives (e.g. *kota ‘house’ → *koti ‘home’), but in those cases the underived root seems to be almost always still around as well.

Phonologically, I also do not think the development of PU *u to Samoyedic *ə can be considered normal in a consonant-stem root.

Interestingly however, no root for ‘fur’ or ‘fur coat’ seems to have been reconstructed for PU previously (despite a wide number of roots reconstructed with the meaning ‘skin’ [4]), nor even for Proto-Finnic proper; so perhaps that counts as a small point in favor of the etymology.

But moving on to Reshetnikov’s last, and most interesting, proposal: Finnish/Karelian huoma- ‘possession, care’ (mostly in adverbs) ~ Hungarian óv ‘to protect, guard’. This seems like a good catch. Pre-Hungarian *-w has generally been vocalized word-finally (in words like *köw(ɛ-) > kő : köve- ‘stone’, *low(a-) > ló : lova- ‘horse’, *saw(a-) > szó : szava- ‘word’, *taw(a-) > tó : tava- ‘lake’), so a word-final -v like this would probably come from earlier *-β; which, in turn, is in inherited vocabulary perhaps more often than not from an earlier *m (in words like PU *nimə > név ‘name’, PU *ńälmä > nyelv ‘tongue’, Ugric #äľmV > enyv ‘glue’). [5]

Previously a different etymology for huoma has been proposed: borrowing from Germanic *sōmijan- ‘to fit; to honor’. The common Finnic verb *hoomat- ‘to notice’, and some derivatives such as Fi. huomio ‘attention’ probably do come from this source — but I agree that ‘possession’ is not an obvious development from this meaning at all.

Alas, at this point Reshetnikov seems to skip over some work. Observing that both Finnic and Hungarian have a long mid back vowel *oo / ó, he simply proceeds to reconstruct PU *šoma/*šooma. This was a paper released back when the idea of long vowels in Proto-Finno-Ugric was still mostly unchallenged — but even then, as far as I know, Hungarian ó has never been considered to descend from earlier *oo or *o! Most cases rather have í (e.g. ín ~ PF *sooni ‘vein, sinew’; nyíl ~ PF *nooli ‘arrow’) or a, á (e.g. nyal ~ PF *noolë- ‘to lick’; három ~ PF *kolmë(t) ‘3’; ház ~ PF *kota ‘house’). A short o in possible correspondence to Finnic *oo does appear in orr ‘nose’ ~ Finnic *voori ‘mountain’, and some examples of o ~ *o are known too.

But when it comes to Hungarian long ó, it seems that this is in all cases a secondary vowel resulting from vocalization of coda *w (when following pre-Hungarian *a and *o). The v-stems such as ló, szó, tó mentioned above demonstrate the process well. Other examples where the same can be assumed within a root include PU *kuŋə > hó ‘moon’. Hence the common PU root should be rather reconstructed as something like *šowə-ma, *šoxə-ma, *šoŋə-ma or *šuwa-ma, with the Finnic and Hungarian long vowels both resulting from contraction. Of these, I think the last option looks the best: as the *ma-derivative seems to date to PU already, the *ə-stem variants would have been at a risk of reduction to a consonant-stem formation *šoGma, from which we’d rather expect Finnic **houma or **huuma.

Finnic-internal evidence supports an analysis as a derivative as well. Beside huoma-, there exists also a largely parallel huosta- ‘possession, custody’; again attested only from Finnish and Karelian. The two are analyzable as derivatives from a common root *hoo- (although -sta- is a rare formant; it usually derives local nouns, such as Fi. alusta ‘base’), which would probably have been a verb meaning approx. ‘to take care of’.

I suspect there’s even a third derivative of this hypothetical verbal root *hoo- to be found in Finnic, hidden in plain sight: *hoolë- ‘to take into custody, to take care of’ (not retained as an underived verb, only as several parallel derivatives, e.g. Fi. huolehtia, huolia, huolita). This has normally been analyzed as identical to the homophonic noun root *hooli ‘worry, bother’ (probably ← Baltic), but the semantics of this derivation seem to be a bit off. In that case, the morphology would also point to the bare root being treated both as a noun and a verb, which is attested in some old inherited words (e.g. *tuuli ‘wind’ : *tuulë- ‘to be windy’) — but I don’t think I’ve ever seen a case of this in loanwords, where instead dummy verbal suffixes are often piled on even for no reason whatsoever (e.g. *he-i-ttä- ‘to throw’, and not **hee-, from Germanic *sēa- ‘to sow’). Thus, an analysis as an old frequentative *hoo-lë- seems to work better.

Later on, this and ‘worry’ have probably semantically bled into each other, leading to e.g. Fi. huolehtia meaning both ‘to take care of’ and ‘to worry about’; or huolia meaning both ‘to take into custody’ and ‘to bother taking’.

If we can therefore establish a PU verb root *šuwa- ‘to take care of’, this also opens one interesting possibility. The common Samoyedic verb for ‘to give’ is reconstructed as *tə- (compare Nganasan ta-, tə-, Selkup *tatə-; but Tundra Nenets tā-). Unsurprisingly, it has been considered a reflex of PU *toxə- ‘to give’; but the development *o > *ə would be quite irregular. [6] On the other hand: the regular PU source of *ə is *u-a; and we expect *-w- to be lost in Proto-Samoyedic, so *šuwa- would seem to be a potential proto-form for this verb. An open stem vowel *-a- is, again, not expected to be lost, but if we reconstructed *təå- or *təə-, with a later contraction to *a or *å (in some forms?) in some languages, this might even explain the irregular long ā in Nenets [7] and open a in Nganasan.

The semantics, I admit, do not match very well at all. But an interesting further formation is the synonymous PSmy *tə-tå- ‘to give’. For this, a derivation from *šuwa- ‘to take care of’ would not be too odd; a causative derivative ‘to give in someone’s care’ could perhaps develop into a neutral word for ‘to give’. Whether it would make sense for this meaning to then propagate back to the base root is, on the other hand, a more difficult question.

— One last question to explore might be if this PU root has anything to do with PIE *h₁su- ‘good’, but that would run into a bit too many questions and tangents to be fruitful to get into right now.

[1] The “spirants” *d₁, *d₂ that can be found in a few roots like *käd₂wä ‘female animal’ do not count, I think; phonotactically they seem to behave more like liquids.
[2] I have recently developed a suspicion that there might exist an old obsolete noun-deriving suffix *-wa in a couple other Finnic words as well. One case that seems clear is *päivä ‘day, sun’, in light of an observation due to Janne Saarikivi: only *päj(ə)- should be considered a part of the original root, given several other words that appear to be related. In the UEW we can find some listed under the roots *päjä ‘fire’, *päjV ‘white, to gleam’; also relevant is Estonian päike ‘sun’, which cannot be derived from anything like **päiväkkä.
— As a further aside, this analysis moreover seems to show that the word is not actually an exception to the sound law *ä-ä > Finnic *a-e, which as of recently has been subject to ongoing discussion by at least Kallio, Zhivlov, and Aikio.
[3] *wooɣəθ according to László Honti; however, as I might have remarked before, I am skeptical of Honti’s *aa, which almost never seems to appear in Uralic vocabulary; and of his phonetically backwards sound change *oo > Mansi *aa. It seems more probable to me that if a separate Proto-Ob-Ugric stage existed, then its *aa was simply retained in Mansi, raised to *oo in Khanty. Commenting also on the reconstruction scheme of Eugene Helimski and Mikhail Zhivlov, where this vowel correspondence is reconstructed as short *a, would take me too far off the track here though.
[4] Reasonably well-reconstructible cases include: *čomčə ‘skin layer’ (S, Kh) | *iša ‘skin’ (S, F, P, ?Mo, ?Ma) | *ketə ‘skin’ (S, F, Mo, Smy) | *kopa ‘skin, bark’ (F, Ma, P, Smy) | *küpsɜ ‘skin on paws’ (P, Ms, Kh) | *perə ‘skin’ (Kh, Smy) | *śuka ‘bark / skin?’ (F, Ms, Kh, ?H)
[5] Though there are a couple of odd exceptions, where either expected *β is also vocalized (PU *lämə ‘broth’ > ? *lɛβ > lé : leve- ‘juice’), or *w / *ɣ is not (PU *jekä ‘year’ > ? *ēw / *ēɣ > év).
[6] Given that PU *toxə- has been considered a loan from PIE *doh₃- ‘id.’, it’s also not clear if this verb ever existed in pre-Samoyedic to begin with. The other widespread PU word for ‘to give’, *a/ëmta-, has also not been attested from Samoyedic or even Ob-Ugric, and it looks likely to be a causative derivative from a simpler root.
[7] The contrast between Tundra Nenets ā and ă has often been analyzed phonologically as /a/ vs. /ə/, but since TN in fact also has a separate reduced vowel (usually transcribed °), and contrasts long í, ú with short i, u, I consider this analysis untenable; as far as I know, no other language in the world contrasts reduced and unreduced mid central vowels. (In general, the common habit in Uralic linguistics to treat vowel reduction and vowel centralization as separate phenomena seems troublesome to me.) Instead, I would propose simply analyzing ā as long /aː/, and ă as short /a/. The fact that ā comes from earlier full *a, and ă from earlier reduced *ə, should not be considered relevant; especially since ° is historically largely derived from PSmy *ə just as well, and since *ə is regularly reflected as a plain full vowel /a/ also in the Southern Samoyedic languages.

Posted in Commentary, Etymology

*je-: A Reprise

Summer’s wrapping up, a new academic year’s about to roll in, and if all goes well, I might be returning to more active blogging around here.

I have also returned, about a week ago, from the 12th International Congress for Finno-Ugric Studies. You can check out my presentation online, too: Semivowel losses and assimilations, in Finnic and beyond. Longtime readers may recall me having first explored the ideas within many blog posts (and one blog platform) ago. Evidence has continued to turn up, and I’m by now quite convinced that my newfound soundlaw *je- > Finnic *i- indeed exists.

The title is admittedly a bit more general than what might be warranted per the presentation’s contents. For space concerns, I was not able to treat the topic of initial semivowels in Uralic languages more generally.

I could mention one fairly simple addendum here, though — while *wo- has been already traditionally well-established, and I attempt to show that abundant evidence of *je- can be found as well, by contrast it seems to me that **wu- and **ji- were not possible sequences in Proto-Uralic (and they remain impossible in most descendant languages as well).

Only one widely accepted instance of *wu- has been proposed: the word for ‘new’ (> Fi. uusi, Hu. új etc.), traditionally reconstructed (modulo notation) as *wud₂ə. I however suspect that this should be instead reconstructed with *o, and that the evidence suggesting *u is due to the shift *o-ə > *u-ə in open syllables, as first proposed by Janhunen (1981) for the Finno-Permic end of the family. (Seen also in e.g. *lomə > *lumə > Fi. lumi ‘snow’. Sammallahti has later suggested that the change also affected Ugric; I am skeptical, however. More on this at some point in the future.) For *ji- there are a few more potential examples, but the best-looking cases (? *jikä ‘age’, ? *jitɜ ‘night’) fall among those where I believe *je- should be rather reconstructed.

If so, then it seems to me that we can likely apply for Proto-Uralic a phonological analysis also known from various other languages of the world: to unite *j *w on one hand and *i *u on the other as allophones of each other.

Posted in News, Reconstruction

On comparison in Proto-Uralic

Here is a somewhat speculative idea that recently occurred to me. I don’t think I will be able to deliberate on all the comparative implications just now, but it wouldn’t surprise me too much if something similar had already been proposed.

A relatively well-known suffix element usually reconstructed for Proto-Finno-Ugric (implicitly Proto-Uralic, in wake of Finno-Ugric turning out to be probably not a genetic grouping) is the comparative suffix *-mpa, reflected quite well in the best-known Uralic languages: Finnish -mpi : -mpa-, Estonian -m, Hungarian -(a)bb. [1] The Samic languages also have clear cognates, e.g. the Northern Sami bisyllabic adjectives’ comparative ending -t : -bu- (from slightly earlier *-b : -bu-).

No trace of such a comparative suffix though is known elsewhere in Uralic. This smells slightly suspicious. Hungarian is, overall, quite innovative, and usually whatever clear old Uralic features have been retained there can also be traced in at least some of its relatives in Russia, especially Ob-Ugric and Permic. There’s a proposed Samoyedic cognate, but as far as I recall seeing, it is found only in Nenets, and only used in an approximative sense.

It’s also the case that there is a quite well-established PU participle ending *-pa. These two suffixes share the privilege of being just about the only places in comparative Uralic inflectional morphology where *p occurs; and both of them have very roughly adjectival semantics. Might it be possible to thus segment *-mpa as *-m-pa? We’d like to know if this can be made to make semantic sense; and if we can find a reasonable candidate for what the nasal element comes from.

The former question can be roughly reformulated as: “if a thing is greater (than another one), what is it doing?” To me it seems the answer would be “exceeding, being greater”. A PU “comparative” form such as *wod₂ə-mpa (> Fi. uude-mpi, Hu. új-abb ‘newer’) could then be instead analyzable as *wod₂əm-pa, meaning ‘that which *wod₂əN-s’; and which could have independently developed into an IE-style nominal comparative in Hungarian and Finno-Samic. Originally it’d have been instead the verb stem *wod₂əN- that captured the “comparativeness”, meaning something like ‘to be newer’. The approximative sense in Nenets seems well-derivable from this as well: we can easily imagine the base meaning as just ‘to be new’, and derive from this on the one hand an amplification ‘to be newer’, on the other hand a mitigation ‘to be newish’.

Above I’ve written the nasal of my internally reconstructed verb stem as just -N-. While Proto-Uralic allowed heterorganic nasal+stop consonant clusters with a coronal stop, [2] there seem to be no examples of this with a peripheral stop. Only *mp and *ŋk can be reconstructed stem-medially, while there are no **np, **nk, **ŋp, **mk. So I suppose all nasal consonants are fair game here. (And, of course, in Finnic and Hungarian all such heterorganic clusters assimilate anyway.)

— Now consider Finnish verbs derived with the suffix -ne-: e.g. iso ‘big’ → isone- ‘to become bigger’; mätä ‘rotten’ → mätäne- ‘to rot’; pimeä ‘dark’ → pimene- ‘to become dark(er)’. The suffix is used almost exclusively on adjectives, and typically forms verbs meaning indeed increase in quality. This seems to provide a great candidate for a Proto-Uralic derivative class, on which the nominal-type comparatives of “Western European Uralic” could have been based. Altogether, originally a word like Fi. isompi ‘bigger’ would have been a consonant-stem participle, equal to modern Fi. isoneva ‘that which increases, becomes bigger’ (pseudo-PU *ićäwmpä ~ *ićäwnəpä).

A chief remaining problem would be whether we really can reconstruct this verbal suffix all the way to PU though. SKRK reports similar usage across Finnic, as well as possible cognates in Ob-Ugric and Samoyedic; these however indicating original *-m-! Another hypothesis mentioned would be comparison to a momentane suffix -n in Hungarian, found in some fossilized forms such as villan ‘to flash’ (seemingly related in some way to világ ‘world; (archaic) light’ < PU *wëlkə). This sounds a bit better with respect to my reconstruction, but I’d like having some more supporting evidence. And of course, I’d also have to check how well the development of comparison constructions elsewhere in Uralic can be lined up with this scenario.

[1] For clarity, I’m ignoring vowel harmony in this post.
[2] Perhaps the clearest case is *tumtə- ‘to know’, whence e.g. Fi. tuntea, NS dovdat, Hu. tud, Tundra Nenets tumtă-.

Posted in Etymology, Reconstruction

Linkday #2: FUF online

A small discovery to report: looks like someone from the University of Toronto has kindly digitized a few back issues of Finnisch-Ugrische Forschungen, old enough to be out of copyright, and uploaded them online; they are findable e.g. under the keyword “Finno-Ugric languages — Periodicals”. Currently available are issues 1, 5, 6, 7, 9, 10, 11, 14, 15, from between (straightforwardly enough!) 1901 and 1915.

(Though the last one actually includes an article from a young Y. H. Toivonen (1890-1956), which would according to Finnish copyright law remain under copyright for a while still… but one hopes the Toivonen estate will not consider this a terrible injustice.)

Posted in Uncategorized

Love, pity and morphology

Finnish armas ‘dear’ has a somewhat interesting etymology: the word is considered to derive by borrowing followed by semantic amelioration from Germanic *armaz ‘pitiful’.

If we were given no other data, this argument would have to remain rather hypothetical. The shape of the word does suggest an Indo-European loan, but allowing major semantic drift as a free assumption is an easy excuse for finding Germanic or Baltic etc. loan etymologies for almost everything in the Finnish lexicon (and given some effort, I’m sure the method could be also stretched to prove that Finnish is actually, say, a highly divergent dialect of Chinese). Luckily, according to the data reported in SSA [1], there are several additional pieces of evidence that point to a meaning ‘pitiful’ having existed in Proto-Finnic as well.

  • A “dialectal” meaning ‘pity’ is attested for the Estonian cognate (equivalent? [2]) armas.
  • The derivative *armas-ta- (> Fi. armastaa) has pity-related meanings somewhat more widely, in Karelian, Estonian and Livonian.
  • Finally, an exclusively pity-related parallel derivative from the same root appears to exist: armahtaa ‘to pardon, have mercy on’ — though attested slightly less widely: this appears in most northern Finnic varieties other than Veps, but in southern Finnic in only Votic, and it’s possible though not strictly required to consider it an Ingrian loanword in there.

The last one of these I actually find more interesting yet, though for a different reason. Namely, why does -h- appear here? It is true that the stem of armas in inflected forms is *armaha-, as in the genitive singular *armahan > Fi. armaan (~ armhan in Veps or Kven); but a stem vowel *a is not normally lost before the verbalizer *-ta-, and no consonant stem **armah- exists for this declension class.

One explanation might be haplology. Lauri Hakulinen seems to implicitly suggest this solution in SKRK, [1] listing the word under verbs derived by the momentane suffix *-ahta-. I.e. ‘to suddenly pity’ = ‘to pardon’? After this we’d have to assume contraction of the somewhat awkward stem *armahahta- to the attested armahta-.

This approach however suffers from the problem that Finnish momentane verbs are productively derived only from verb stems, not from nominal stems. Hakulinen only reports five other verbs formed in this fashion. Two of them actually derive from original *-eh-stems and they might be simple *-ta-derivatives after all (repalehtaa, roikalehtaa [3]), and for the other two, derivation from a verbal stem does not seem possible to rule out (riemahtaa ‘to rejoice suddenly, erupt in celebration’, tipahtaa ‘to drop suddenly’ [4]). This leaves vapahtaa ‘to redeem, liberate’ (← vapaa ‘free’) as the only clear parallel.

Now vapahtaa is of course semantically very close to armahtaa, and this seems like a good reason to suspect that they may have affected one another’s formation in some way. Comparative examination however suggests that it’s probably armahtaa that is the model, and vapahtaa the remodelled verb. As mentioned, the former has cognates in multiple Finnic varieties; meanwhile the latter is restricted to Finnish. The root *armas also seems to be a relatively old Germanic loan, being found everywhere across Finnic, while vapaa < *vapada is a more recent Slavic loan, absent from marginal varieties such as Veps and Ludian. [5] So we have no clear solution here for armahtaa.

I have a different hypothesis in the works, though, that seems to fit in here quite well.

An interesting gap of general Finnic morphophonology is that no words ending in *-ah can be reconstructed for Proto-Finnic, and to my knowledge no corresponding declension can be observed in the modern languages either. This contrasts with a large number of words of the *armas type, ending in *-as : *-aha-; and an equally large number ending in *-eh : *-ehe- (directly attestable in Karelian: hameh ‘dress’, veneh ‘boat’, etc.). A couple of examples of *-es : *-ehe- exist as well (Karel. kirves : kirvehe- ‘axe’), and one or two cases of *-ih (Karel. orih ‘stallion’). Frequently we can also find among these “sibilant-final” words [6] discrepancies between the Finnic languages in the stem type: e.g. Finnish helmi ‘pearl’, a bare *-e-stem, corresponds to an *-es-stem helmes in Estonian, and an *-äs-stem ēļmaz in Livonian.

This makes me suspect that at some point in Finnic prehistory, general morphological levelling may have taken place here; that at one point, stems with a nominative *-ah existed as well, but these were later all reassigned as either *-as-stems or as *-eh-stems.

This is structural speculation so far. But I think there is at least one good reason to suspect the former existence of a class of *-ah-stems: in old enough Germanic loanwords, *s/*z are quite regularly substituted by pre-Finnic *š > Late Proto-Finnic *h. In the case of *-eh-stems, we can indeed find some direct correspondences of this stem type with the Indo-European masculine nominative singular ending *-s (> Germanic *-z): e.g. the above-mentioned *hameh ‘dress’ from PGmc *hamaz, or *padeh ‘path’ (> Karel. pajeh : patehe-) from PGmc *paθaz. [7] But *-as-stems arrive on the scene seemingly quite early, sometimes even in parallel with an *š-substitution: Fi. keihäs ‘spear’, hidas ‘slow’, from PGmc *gaizaz, *sīθaz! This would all surely be easier to understand if we assumed for early Finnic declension patterns such as *keišäš : *keišäšä-, *šitaš : *šitaša- > ? *keihäh : *keihähä-, *hidah : *hitaha-, later levelled to the directly reconstructible *keihäs : *keihähä-, *hidas : *hitaha-.
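
The two-step scenario for ‘spear’ can be spelled out as a toy derivation. This is only a sketch under simplifying assumptions: the function names are hypothetical, the change *š > *h is applied blindly to every *š in a form, and the levelling is modelled as a bare replacement of nominative-final *-ah / *-äh by *-as / *-äs. Consonant gradation (as in hidas : hitaha-) is deliberately left out.

```python
import re

def s_to_h(form: str) -> str:
    """Pre-Finnic *š > Late Proto-Finnic *h (applied to every *š in the form)."""
    return form.replace("š", "h")

def level_nominative(nominative: str) -> str:
    """Hypothesized levelling: a nominative in *-ah / *-äh is reassigned to *-as / *-äs.
    Nominatives in *-eh are left untouched."""
    return re.sub(r"([aä])h$", r"\1s", nominative)

def derive(nominative: str, stem: str):
    """Return the (nominative, oblique stem) pair after each of the two stages."""
    after_sound_change = (s_to_h(nominative), s_to_h(stem))
    after_levelling = (level_nominative(after_sound_change[0]), after_sound_change[1])
    return after_sound_change, after_levelling

# 'spear', following the forms cited in the text.
stage1, stage2 = derive("keišäš", "keišäšä")
print(f"*keišäš : *keišäšä-  >  *{stage1[0]} : *{stage1[1]}-  >  *{stage2[0]} : *{stage2[1]}-")

# An *-eh-stem such as 'dress' is not affected by the levelling step.
print(level_nominative("hameh"))
```

Only the nominative is touched by the second step; the oblique stem keeps its *h throughout, which is what produces the attested *-as : *-aha- type of alternation.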

I do not yet have a clear enough grasp of the overall picture though to say if the levelling process might have been regular in its output anywhere in the Finnic area — or even, if this should be assumed to have been a pre-Proto-Finnic or a post-Proto-Finnic process. [8]

But armahtaa regardless seems to fit into the framework quite nicely: the word would turn out to be after all a simple *-ta-causative, only one based on a now-lost consonant (nominative) stem *armah! The semantics also fit this picture: as noted above, armahtaa is an exclusively pity-related verb, with no associations of love. Noting again the semantic trajectory of the basic root word (Germanic ‘pitiful’ → presumably earlier Proto-Finnic ‘pitiful; dear’ > later Proto-Finnic ‘dear; pitiful’ > modern Finnic ‘dear’), this verb was thus probably formed at an earlier time than armastaa, perhaps before the development of the meaning ‘dear’ entirely.

[1] “SSA” and “SKRK”, two indispensable sources in the study of Finnish etymology and morphology, have now been added to my Bibliography page.
[2] I sometimes feel that a pair of words in closely related languages that have the exact same meaning and shape should perhaps be described in stronger terms than “being cognate”. Given that we are usually comfortable saying that a given word “exists”, as a single entity, in several distinct dialects — and that the language/dialect distinction is arbitrary — it might be useful in an etymological context to claim that e.g. English mouse and German Maus are not merely “related”, but in fact the exact same word, just spelled in two different ways. This issue comes up the most often in etymological dictionaries, where a traditional “every language has distinct words” approach will sometimes lead to heavy repetition: “Finnish armas is cognate to Ingrian armas, Karelian armas, Estonian armas, Votic armas…”
[3] Though neither of these is familiar enough to me that I could do a closer semantic assessment of this solution.
[4] These seem like they would likely be derived from riemuita ‘to rejoice’, tippua ‘to drop’ rather than the bare roots riemu ‘joy’, tippa ‘drop’.
[5] Of course, currently Veps and Ludian are anything but marginal when it comes to Slavic contacts; but the oldest Slavic loans in Finnic appear to predate late Proto-Slavic proper (this one as well: reflecting early PSl *svabadā rather than late PSl *svoboda), and they were probably adopted from the archaic Old Novgorod dialect, with a main contact area close to Ingria and the Pskov region.
[6] Recall that Finnic *h < *š.
[7] I do not recall offhand what the standard explanation of the 2nd-syllable *e of these is, though.
[8] This might even have a few repercussions for Finnic historical phonology, but I will refrain from going into the topic for now.

Posted in Reconstruction

Five Shortcuts to Writing a Heavyweight Etymological Dictionary

Minor apologies for the clickbait-satire title (I do not actually enumerate any shortcuts in this post), but the arriving summer is making me jocular I guess. :)

My current stop on what seems to be turning into an unofficial world tour of etymological research is Dravidian. I was prompted to look into this after noticing a copy of Elli Marlow’s PhD thesis A Comparison of Uralic and Dravidian Etymological Vocabularies (1974, University of Texas) at the University of Helsinki library, and picking it up for examination. This proposed relationship is a curious one in that it always seems to have generated more interest on the Dravidologist than on the Uralicist side. Marlow, too, is a Dravidologist who seems to have little other experience in Uralic studies aside from having easy access to specialist literature by virtue of her Finnish roots.

Her work at least appears to dodge one of the worse pitfalls of comparison between language families, i.e. ignoring pre-existing etymological research and instead data-stripmining individual languages of a family for lookalikes. She refers strictly only to the most up-to-date sources of her day on the old vocabulary in Uralic and Dravidian: Collinder’s Fenno-Ugric Vocabulary (1st edition 1955) and Comparative Grammar of the Uralic Languages (1960); Toivonen, Joki & Itkonen’s Suomen kielen etymologinen sanakirja (1st volume 1958, 4th 1969); and Burrow & Emeneau’s Dravidian Etymological Dictionary (1st edition 1961, followed by a “supplement” in 1968).

The end result is regardless that working mainly on about 1000 etymologies from Collinder, she finds Dravidian parallels for no less than 786 of them. Most of them retain decent semantic and phonetic resemblance, too. I have considered doing a full count, but e.g. taking items #200-#205 at random, we have the following comparisons (with approximate Dravidian reconstructions by me, to save some space; Uralic reconstructions Collinder’s):

  • Uralic *ńärpä- ‘thin, sparse’ ~ Dravidian #ñēr- ‘(to grow) thin’
  • U *jäpće ‘roasting spit’ ~ D #cappū ‘thorn’
  • U *ńola- ‘to crawl’ ~ D #neḷi- ‘to crawl’ / #nur̤ai- ‘to creep’
  • U *ńoma- ‘to creep’ ~ D #nāmpu ‘climbing plant’
  • U *ńowŋa ‘a salmonid fish’ ~ D #naŋku ‘a fish species’
  • U *ńorɜ ‘type of moss’ ~ D #ñir ‘water’

The second of these I would say is entirely unconvincing, and the sixth seems like a stretch as well, but the other comparisons are tolerable at face value. I could further criticize several of them on inter-Uralic grounds, of course. But this altogether still seems to leave a situation where, if not quite in the ballpark of 75%, then still some 30-50% of the best-established Uralic material would have fair-looking parallels in the DED.

The plot twist though is that the DED and its addenda include a bit over 5500 etymologies — a rather bigger pool of data, and it’s obvious that this significantly increases the odds of finding something that appears similar by pure chance.
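
To get a feel for how much the size of the comparison pool matters, here is a minimal back-of-the-envelope sketch. It assumes, unrealistically, that every dictionary entry has the same small and independent probability of accidentally resembling a given word; the function name and the probability values are purely hypothetical placeholders, not estimates of anything.

```python
def chance_of_spurious_match(p_single: float, pool_size: int) -> float:
    """Probability that at least one of pool_size independent candidates
    resembles a given word by accident, each with probability p_single."""
    return 1.0 - (1.0 - p_single) ** pool_size

# Hypothetical per-entry lookalike probabilities (assumptions, not measured values).
for p_single in (0.0001, 0.0005, 0.001):
    small_pool = chance_of_spurious_match(p_single, 1000)  # a pool of about 1000 etymologies
    large_pool = chance_of_spurious_match(p_single, 5500)  # a pool of about 5500 etymologies
    print(f"p = {p_single}: pool of 1000 -> {small_pool:.1%}, pool of 5500 -> {large_pool:.1%}")
```

Whatever per-entry probability one plugs in, the larger pool always yields the higher chance of at least one accidental lookalike; the sketch says nothing about what a realistic value would be.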

But instead of continuing to a detailed assessment of the Uralo-Dravidian hypothesis, I have a different question I’d like to ask. Namely: how does one manage to assemble that many Proto-Dravidian etymologies? The family is supposed to be relatively old and diverse, on the same order as Uralic (though clearly less so than Indo-European). Should I be afraid that the dictionary is an “anything goes” hodgepodge? Or that they are including younger Indo-Iranian loanwords all over the place?

Burrow & Emeneau’s work is currently available online, so this matter is relatively simple to investigate. Again, rather than going in for the full grind, I’ve taken a small sample: the etymologies implicitly reconstructed as beginning with word-initial short *a-, 338 of them altogether. And yes, it turns out that there are a few issues.

The foremost problem appears to be that Proto-Dravidian vocabulary and vocabulary of more limited distribution are mixed at will. Of the 338 words, only 199 appear in at least two main branches of Dravidian; 139 of them are restricted to a single sub-branch (in 118 cases South Dravidian), and even of these a significant portion are found nowhere else but in Tamil and its close relative Malayalam! It would be easy to get the impression that one is reading not a “Dravidian Etymological Dictionary” but an “Etymological Dictionary of the Native Tamil Lexicon”. And if that were the case, 5500 words starts sounding entirely normal.

Or perhaps slightly more generally: “Tamil, Kannada and Telugu”. Within the wider-ranging comparative material, almost all of the data is represented in two or three of these old major literary languages of southern India. I moreover count 83 etymons that seem to also feature Telugu as the only non-South-Dravidian language to have a reflex, which makes me rather suspicious about whether they all should be assumed to descend from the common ancestor of SD and Telugu; especially when the only lesser non-SD language to reach a vocabulary retention rate of more than 20% is Gondi. Even this is apparently actually a highly dialectally diverse grouping, as the DED and also e.g. Glottolog separate it into more than half a dozen sub-varieties.

Now, sure, we could attribute this terrible track record of all the unwritten central and northern Dravidian varieties to simply underdocumentation + massive Indo-Aryan influence. But whenever a DED entry does turn up with a reflex in one of these, it usually appears in quite a few others just as well. The distribution appears to be a bit too neatly divided between “poorly attested” and “well-attested” cases. So my suspicion would be that additionally quite a few of the DED entries represent old cultural loanwords that have diffused between the written Dravidian languages, and which do not go back to Proto-Dravidian (possibly not even Proto-South Dravidian)…

Another possible issue yet is that the known main Dravidian subgroups might still form some intermediate units. In doing some related research, I’ve run into some interesting claims from David W. McAlpin (perhaps best known for the Elamo-Dravidian hypothesis) along these lines, e.g. a suggestion that “Northern Dravidian” does not exist, and instead Brahui (the northwesternmost Dravidian language, spoken in southern Pakistan) and Kurukh-Malto (the northeasternmost Dravidian languages, spoken in East India) are primary branches of Dravidian, with everything else forming a third branch. Geographically, this would certainly make sense; we know that the Indo-Aryan languages have entered Southern Asia from the north, and surely they must have in this process taken over at least some amount of territory that had formerly been Dravidian-speaking. McAlpin even has a paper where he suggests that Brahui might be more closely related to Elamite than to Dravidian proper; though I do not know if he has held on to this idea.

In any case it looks like distilling some kind of a core Dravidian lexicon from the DED would be advisable for any wider comparative purposes. Reaching all the way down to e.g. vocab found just in Tamil and its closest neighbors, or Telugu and its closest neighbors, is exactly the “data-stripmining” problem I mentioned Marlow having seemingly dodged. Apparently, though, one of her main sources did not dodge it.

Posted in Commentary, Methodology

Finno-Ugric đ

A note for my Finnish-proficient readers and other interested people: I’ve transcribed and uploaded Arvid Genetz’ presentation Suomalais-ugrilainen đ ensimmäisen ja toisen tavuun vokaalien välissä (published 1896) on Wikisource.

This is one of the three works to have come from Finland in the mid-1890s where people seemed to discover just about in parallel that no, original *t does not become Hungarian /l/, and that an entirely different proto-consonant needs to be set up instead for words like Hu. elő ~ Finnish ete- ‘fore, front’; Hu. tele ~ Fi. täyte- ‘full’. A nice demonstration of the internal consistency of the comparative method.

Genetz here does not manage to distinguish the plain and palatal variants *d₁, *d₂ though; that detail would be instead proposed by his more famous student E. N. Setälä.

It’s an interesting read from a 21st century perspective. Plenty of standard etymologies are already in place. There however remain several etymological comparisons that I can only describe as pre-scientific (e.g. Hungarian bűz ‘to smell’ ~ Finnish mätä ‘rotten’). Others yet look like they might be onto something, but seem to have remained without further investigation over the last dozen decades, e.g. an analysis of Fi. hiukka ‘small amount’ as a diminutive of hitu ‘thin bit, flake’.

The Finnish of the times is interesting to read, too. Dated enough to be obviously old, but still perfectly understandable. Perhaps the strangest thing might be a detail of Genetz’ phonetic terminology: he describes /ð/ as alveolaarinen vs. Hungarian /z/ as dentaalinen, the exact opposite of what we’d expect today.

Posted in Uncategorized

Geography-constrained family trees?

A complaint that often comes up in introductions to studies on computational phylogenetics is that the number of possible binary trees grows quite fast (loosely factorialesquely) as a function of the number of entities we are attempting to relate. This means that regardless of what method is being used, it is for larger datasets usually not possible to investigate all possible trees on how well they would fit the data.

I wonder though if there might be a shortcut that would allow simplifying linguistic analyses, in particular, a fair bit: geographical restrictions on branching. Languages, for a rough generalization, exist on territories within the 2D surface of the Earth; they have a number of neighbors, and usually have a neighboring language as their closest relative. Languages far separated from their relatives tend to be exceptional cases, often confirmable as recent intrusions by history.

So, instead of investigating arbitrary binary trees, we perhaps ought to only investigate binary trees where each division leaves two currently or historically contiguous groups of daughter languages. This restriction seems likely to be quite powerful in paring down the seemingly intractably vast hypothesis space.

Consider the simplest possible class of examples, the linear dialect continuum:

  • If given a western dialect A, a central dialect B, and an eastern dialect C, the only sensible phylogenetic hypotheses are ((A B) C) and (A (B C)); the tree with B as an outgroup is rejectable.
  • if given four dialects A B C D, we will accept five trees (A (B (C D))), (A ((B C) D)), ((A B)(C D)), ((A (B C)) D), (((A B) C) D), and reject ten.
  • if given five dialects, we will accept 14 trees and reject 91.
  • if given six dialects, we will accept 42 trees and reject 903.
  • if given seven dialects, we will accept 132 trees and reject 10 263.
  • if given eight dialects, we will accept 429 trees and reject 134 706.
  • etc; as can be seen, the efficiency of this pruning method grows quite fast.
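
The figures in this list can be reproduced with two standard formulas. The sketch below assumes that “all trees” means rooted binary trees on n labelled leaves, counted by the double factorial (2n - 3)!!, and that the accepted trees are the bracketings of n dialects kept in a fixed linear order, counted by the Catalan number C(n - 1). The function names are my own.

```python
from math import comb

def all_rooted_binary_trees(n: int) -> int:
    """Number of rooted binary trees on n labelled leaves: (2n - 3)!! = 3 * 5 * ... * (2n - 3)."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count

def contiguous_trees(n: int) -> int:
    """Number of bracketings of n dialects in a fixed linear order: Catalan number C(n - 1)."""
    return comb(2 * (n - 1), n - 1) // n

print(" n  accepted  rejected")
for n in range(3, 9):
    accepted = contiguous_trees(n)
    rejected = all_rooted_binary_trees(n) - accepted
    print(f"{n:2d}  {accepted:8d}  {rejected:8d}")

# The same functions also work for larger cases, since Python integers are unbounded.
print(f"50 dialects: {contiguous_trees(50):.1e} of {all_rooted_binary_trees(50):.1e} trees accepted")
```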

As I’ve seen recently remarked, 50 languages suffice to generate about 3·10⁷⁶ binary trees, almost as many as there are atoms in the universe (not actually more, but close enough). Meanwhile string bracketings grow rather more slowly, and with 50 dialects we only reach about 5·10²⁶.

This is still huge enough (i.e. about 500 000 000 000 000 000 000 000 000 trees) that I don’t think my method here will help much on studies on Austronesian phylogeny in particular. But in smaller cases, with on the order of 10-20 language varieties under investigation, the difference will be quite significant.

In a real-world situation, dialect geography rarely works quite this easily though. I suppose a slightly better test model might be polyhex connectivity graphs — these allow up to 6 neighbors for each individual language, something that ought to suffice for most cases. I might be tinkering with these in the near future.
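
As a starting point for such tinkering, the linear case generalizes to any neighborhood graph: count only those rooted binary trees in which every split divides a group into two parts that are each connected in the graph. The following is a brute-force sketch under that reading of the constraint. The function names and the seven-language “flower” layout (one hex cell surrounded by six others) are illustrative assumptions, and the run time grows exponentially, so it is only usable for roughly a dozen languages.

```python
from functools import lru_cache

def count_contiguous_trees(neighbors: dict) -> int:
    """Count rooted binary trees in which every split yields two groups that are
    each connected in the given neighborhood graph (symmetric adjacency lists)."""
    nodes = sorted(neighbors)
    index = {name: i for i, name in enumerate(nodes)}
    # Bitmask of neighbors for each language.
    adjacency = [sum(1 << index[other] for other in neighbors[name]) for name in nodes]

    def is_connected(group: int) -> bool:
        """Flood fill from the lowest member of the group, staying inside the group."""
        seen = frontier = group & -group
        while frontier:
            bit = frontier & -frontier
            frontier ^= bit
            reachable = adjacency[bit.bit_length() - 1] & group & ~seen
            seen |= reachable
            frontier |= reachable
        return seen == group

    @lru_cache(maxsize=None)
    def count(group: int) -> int:
        if group & (group - 1) == 0:   # a single language: one trivial tree
            return 1
        lowest = group & -group        # pin one member to part A so each split is counted once
        rest = group ^ lowest
        total = 0
        subset = (rest - 1) & rest     # largest proper subset of the remaining members
        while True:
            part_a = lowest | subset
            part_b = group ^ part_a
            if is_connected(part_a) and is_connected(part_b):
                total += count(part_a) * count(part_b)
            if subset == 0:
                break
            subset = (subset - 1) & rest
        return total

    return count((1 << len(nodes)) - 1)

def chain(n: int) -> dict:
    """A linear dialect continuum of n varieties."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def hex_flower() -> dict:
    """One central language (0) surrounded by a ring of six neighbors (1..6)."""
    graph = {0: list(range(1, 7))}
    for i in range(1, 7):
        graph[i] = [0, (i % 6) + 1, ((i - 2) % 6) + 1]
    return graph

print([count_contiguous_trees(chain(n)) for n in range(3, 7)])
print(count_contiguous_trees(hex_flower()))
```

For a chain this reproduces the counts of the linear continuum, while a graph in which every language neighbors every other imposes no restriction at all and gives back the full number of binary trees.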

Obviously this constraint can still be violated by occasional majorly expansive languages, e.g. Persian. I suspect we do not have sufficient information to tell right away which of the dozens of minority Iranian languages were its original neighbors. Though in these cases a solution that’s in principle available is to divide such a language itself into a large number of tiny dialects, and treat each of them separately…

Posted in Methodology
