CIFU 13 announced

The 13th International Congress for Finno-Ugric Studies, to take place in Vienna in August 2020, is now fully announced: the symposia have been settled and paper submission is open. Most people interested in participating have likely also received the usual email circulars, but perhaps this post will serve as a reminder for some readers, or even as an invitation to just drop by and listen to some presentations.

I will be participating too, of course. Perhaps even with more than one presentation this time (but no promises just yet).

Posted in News

The treatment of /f/ in Finnic

Loanwords from Germanic and, more recently, Russian have been feeding *f into Finnic for a good while. Today /f/ has been established as a loanword phoneme in most Finnic varieties (including, I think, all of the literary standards), but for most of the last 2000 years, the consonant has been adapted into native Finnic phonology in various shapes.

Five substitutions are usually recognized:

  1. *f → /p/
    Mostly in oldest loans from Proto-Germanic or Proto-Scandinavian. The oldest examples could feasibly even precede Grimm’s Law, and therefore actually involve *pʰ → *p (the likes of *pëlto ‘field’). Others can be dated as slightly later, e.g. *pasto ~ *paasto ‘fast’ ← Gmc. *fastōn-, showing the probably relatively late *ā > *ō. A few examples are found even in much more recent loanwords such as Fi. porstua ‘porch’ ← Sw. förstuga or förstuva; Fi. upseeri ‘officer’ (perhaps since expected **vo and **hs are or were not phonotactically possible).
  2. /f/ → ∅
    In initial consonant clusters, e.g. Fi. läski ‘(pork) fat’, riski ‘strong’ ← Sw. fläsk, frisk.
  3. /f/ → /v/
    This is found initially (Fi. vaari ‘grandfather, old man’ ← Sw. far ‘father’) and after a consonant (Fi. konvehti ‘confectionery’). I suspect the switch from the first substitution pattern to this one marks the onset of *w > [ʋ]. This may have been completed only after Proto-Finnic, since several dialects of Finnish have been recorded even in the 20th century with [w] adjacent to rounded vowels: kuva [kuwa] ‘picture’, vuosi [wuosi] ‘year’, vyö [wyø] ‘belt’ etc. Dialectal variants such as kasva- ~ kasua- ‘to grow’, kivi ~ [kiw] ~ [kiu] ‘stone’ could also speak in favor of *kaswa-, *kiwi and not **kasva-, **kivi as the starting points. Likewise the Estonian metathesis *Vuh > /Vhv/, more simply formulated as *wh > *hw, points in the same direction.
  4. /f/ → /h/
    Found initially preceding a labial vowel (Fi. huotra < *hootra ‘scabbard’ ← Gmc. *fōdra-) and word-internally preceding a consonant (Fi. luhti ‘loft’, sahrami ‘saffron’, uhri ‘sacrifice, offer’).
  5. /f/ → /hv/
    Found between vowels, e.g. Es. Fi. sohva ‘sofa’, Es. kohv ~ Fi. kahvi ‘coffee’ (contrast though Livonian and dialectal Fi. kaffe, Karelian koffi ~ koofi ~ koufi etc.)

Altogether we have, in the newer layers, /h/-substitutions for preserving voicelessness, /v/-substitutions for preserving labiality and continuancy, and /hv/ for covering both.
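The feature bookkeeping in the paragraph above can be sketched as a toy set comparison. The feature labels are a simplified assumption for illustration (only voicing, labiality and continuancy are tracked), not a full phonological analysis, and the names `SUBSTITUTES` and `preserved` are hypothetical:

```python
# Simplified, assumed feature sets: only the three properties of /f/ relevant
# to the substitution patterns are tracked; this is not a full feature analysis.
F = {"voiceless", "labial", "continuant"}

SUBSTITUTES = {
    "p": {"voiceless", "labial"},          # stop: loses continuancy
    "v": {"labial", "continuant"},         # voiced: loses voicelessness
    "h": {"voiceless", "continuant"},      # glottal: loses labiality
}


def preserved(segments):
    """Features of /f/ kept by a substitute made of one or more segments."""
    kept = set()
    for seg in segments:  # a string like "hv" is iterated segment by segment
        kept |= SUBSTITUTES[seg] & F
    return kept


for sub in ["p", "v", "h", "hv"]:
    print(f"/{sub}/: {sorted(preserved(sub))}")
```

Under these assumptions only the two-segment /hv/ keeps all three properties, which is the sense in which it covers both of the single-consonant strategies.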

There is, however, also a sixth substitution that I have usually not seen mentioned: /uh/ (~ /yh/), unpacking the consonant in the opposite order from the kahvi type. At least two examples of this appear to be already known. One is the Russian loan ‘kaftan’: Fi. Krl. Izh. Vot. kauhtana, Ludian–Veps kauhtan. (Estonian has rather kahvtan.) The other is Fi. Krl. Izh. tiuhta ‘reed; awn’ ← Gmc. *stifta- (> Sw. stift), with the sound development remarked on in LÄGLOS in the word’s entry (in the 3rd volume), but not in the foreword overview of sound substitutions (in the 1st volume). I think a few additional examples could be adduced too:

  • Fi. Izh. vyyhti ‘weft’ (← Gmc. *wifti-), whose vowel length is usually attributed to sporadic lengthening before coda /h/ and labialization to irregular influence from /v/. But Karelian shows viyhti; I think this is likely to be more original. In Finnish and Ingrian, evidently *iü > yy. Finnish and Karelian dialects plus Ludian show also viihti ~ viihť, which could be instead the real example of secondary lengthening (but also maybe a parallel development of *iü).
  • *riuhto-, *riuhtat- ‘to rip, tug’ (Fi. Krl. Izh. Lu.), with a variant reuhto- in Finnish. Maybe a derivative from Germanic *rīfan- (> Sw. riva) or *reufan (> Eng. reave) ‘to tear’? Among *t-suffixed forms, though, I only know of the noun rift.
  • Fi. töyhtö ‘tuft, crest’, Krl. töyhäkkä ‘fluffy’, [1] töyhistyö ‘to puff, bristle up’ (probably ← Fi., per öy). This bears an immediate resemblance to English tuft, though Scandinavian only seems to have an s-affixed variant tofs (→ Fi. tupsu ‘tuft, tassel’). Looking at Low German could maybe turn up a suitable loan original?

It can be noted that all of these examples occur in the context *-Vft-. This is probably not an accident: **-fk- does not occur in Germanic (possible enough in Russian though, and giving e.g. colloq. Fi. lafka ‘store’ ← Ru. лaвка), while **-VUhR- does not seem to occur in Finnic. A few rare examples of -VihR- can be found, but usually with simplified variants alongside: in Finnish e.g. (standard form first) kaisla ~ kaihla ~ kahila ‘reed’, laina ~ laihna ‘loan’, raihnas ~ raina ‘decrepit, geriatric’, saiho ~ saihvo ‘corral, pen’.

[1] Krl. töyhäkkä also has a second sense ‘haughty’, which, together with töyhteä ‘to fuss about’, is probably better compared with Fi. touhuta ‘id.’, touhu ‘fuss, bustle’ (with typical affective/diminutive fronting).

Posted in Etymology

Dravidian etymostatistics: a rough look

Burrow & Emeneau’s classic Dravidian Etymological Dictionary (DED) has been conveniently available online for a while.

I find the online version a bit too spartan though, at least for browsing purposes: when a dictionary has 500+ pages and 5500+ etyma, one would want to be able to find things more effectively than by leafing through at random. The preface lists page numbers for the sections by initial letter, but these are, unfortunately, unlinked. The print version has by-language indices, but these have been forgone in favor of a search function in the online version. A search function, however, only works for finding things one already knows of. Also, the “real” lemma of each entry, according to which the entries are alphabetized, is not actually printed anywhere! This is a virtual Proto-Dravidian form (I suspect not necessarily valid for the relatively poorly known Northern and Central Dravidian, so maybe more like Proto-Southern Dravidian). If there is a Tamil descendant, it is usually reasonably close to the virtual PD form used, but often enough there isn’t one.

For some added convenience, I’ve thus put together a page-by-page index of my own, recording the first lemma form occurring on each page, and the first few phonemes of what the underlying reconstruction appears to be (though some of these might well be incorrect).

These 503 page-leading entries (I’m ignoring the Appendix in the analysis below) also work as a random sample of sorts of Dravidian reconstructions, and they allow a rough look at the statistical properties of the data.

For strong results on Proto-Dravidian, full by-language stats on the reflexes would be needed, e.g. to filter out data restricted to particular subgroups. This would be quite a bit of work however. But I have recorded the “lemma language” — the language from which the first reflex is given. DED uses a stable order of languages, and the lemma forms run through this list in preferential order: Tamil if available, if not then Kolami, if not that either then Malayalam, etc. This means it’s possible to get accurate reflexation rates for Tamil from just this single sample.
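Why a single sample suffices for Tamil can be shown with a small sketch. The toy entries and every language name other than Tamil are hypothetical placeholders; the only thing taken from the text is that Tamil heads the priority order, and `lemma_language` is an illustrative helper, not anything from DED itself:

```python
# Hypothetical placeholders: only "Tamil first" is taken from the text;
# the remaining names and their order are invented for illustration.
PRIORITY = ["Tamil", "LangB", "LangC", "LangD"]


def lemma_language(reflex_langs, priority=PRIORITY):
    """Return the first language in priority order that has a reflex, else None."""
    for lang in priority:
        if lang in reflex_langs:
            return lang
    return None


# Toy etyma, each given as the set of languages attesting a reflex.
entries = [
    {"Tamil", "LangC"},
    {"LangB", "LangD"},
    {"Tamil"},
    {"LangC"},
    {"Tamil", "LangB", "LangD"},
]

lemma_langs = [lemma_language(e) for e in entries]

# With Tamil at the head of the list, "the lemma language is Tamil" holds
# exactly when the etymon has a Tamil reflex, so the two counts coincide.
tamil_lemmas = lemma_langs.count("Tamil")
tamil_reflexes = sum(1 for e in entries if "Tamil" in e)
assert tamil_lemmas == tamil_reflexes
print(lemma_langs)
print(f"Tamil reflexation rate: {tamil_reflexes / len(entries):.0%}")
```

For any other language the lemma count only gives a lower bound on its reflexation rate, since its reflexes are hidden whenever a higher-priority language also has one.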

We can also take a look at the distribution of the lemma languages:

  • Tamil: 353 (≈ 70.2%)
  • Kannada: 49
  • Malayalam, Telugu: both 13
  • Kota: 11
  • Kolami, Kuṛux: both 10
  • Kui: 8
  • Koṇḍa, Parji: both 7
  • Gondi: 6
  • Pengo, Tulu: both 4
  • Toda: 2
  • Ālu Kuṟumba, Iṛula, Koḍagu, Kuwi, Maṇḍa, Naiki: 1 each
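The Tamil percentage, and the corresponding shares for the other lemma languages, can be recomputed from the counts above. The counts are copied from the list; the script itself is only an illustration:

```python
# Lemma-language counts for the 503 page-leading DED entries, from the list above.
lemma_counts = {
    "Tamil": 353, "Kannada": 49, "Malayalam": 13, "Telugu": 13, "Kota": 11,
    "Kolami": 10, "Kuṛux": 10, "Kui": 8, "Koṇḍa": 7, "Parji": 7, "Gondi": 6,
    "Pengo": 4, "Tulu": 4, "Toda": 2, "Ālu Kuṟumba": 1, "Iṛula": 1,
    "Koḍagu": 1, "Kuwi": 1, "Maṇḍa": 1, "Naiki": 1,
}

total = sum(lemma_counts.values())

# Sort by descending count; ties keep the order of the list above.
for lang, n in sorted(lemma_counts.items(), key=lambda kv: -kv[1]):
    print(f"{lang:<12} {n:>4} {n / total:6.1%}")
```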

The three other big literary languages unsurprisingly come out on top after Tamil. Otherwise, though, the order is probably due to factors other than size and degree of documentation. Kolami, as mentioned, is the #2 go-to variety after Tamil, and indeed scores 10 lemma appearances, not far from the larger Malayalam. However Kuṛux, quite far down the priority list, also reaches the same! This is probably because Kuṛux is one half of the distinctive Northeastern Dravidian group (together with Malto, which does not appear here), which seems to have a decent amount of unique vocabulary without parallels elsewhere in Dravidian. Similar cases are Kolami, Kui and Koṇḍa: the first as the largest Central Dravidian language, the latter two in their own distinctive sub-branches of South-Central (or “South II”) Dravidian.

Initial consonants number as follows:

  • k: 100 (Tamil: 72 = 72%)
  • ∅: 99 (Tamil: 73 ≈ 74%)
  • p: 67 (Tamil: 45 ≈ 67%)
  • m: 56 (Tamil: 40 ≈ 71%)
  • t: 54 (Tamil: 35 ≈ 65%)
  • c: 50 (Tamil: 33 = 66%)
  • v: 37 (Tamil: 26 ≈ 70%)
  • n: 27 (Tamil: 20 ≈ 74%)
  • ñ: 5 (Tamil: 5)
  • y: 3 (Tamil: 3)
  • : 3 (Tamil: 0)
  • l: 1 (Tamil: 1)
  • r: 1 (Tamil: 0)

We see here a fairly even representation in Tamil, hovering around 70% as could be expected, as well as the familiar pattern that only a rather limited selection of consonants was originally possible word-initially in Dravidian.
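A quick way to check the “around 70%” observation is to compare each consonant’s Tamil share against the overall Tamil share of the sample. In this sketch the key "?" stands for the row whose consonant symbol is missing from the list, and the cutoff of 20 entries for printing is an arbitrary choice for illustration:

```python
# Initial consonant -> (all entries, entries with a Tamil lemma), from the list above.
# "?" marks the row whose consonant symbol is missing from the list.
initials = {
    "k": (100, 72), "∅": (99, 73), "p": (67, 45), "m": (56, 40),
    "t": (54, 35), "c": (50, 33), "v": (37, 26), "n": (27, 20),
    "ñ": (5, 5), "y": (3, 3), "?": (3, 0), "l": (1, 1), "r": (1, 0),
}

total = sum(n for n, _ in initials.values())
tamil = sum(t for _, t in initials.values())
baseline = tamil / total
print(f"overall Tamil share: {tamil}/{total} = {baseline:.1%}")

for cons, (n, t) in initials.items():
    if n >= 20:  # arbitrary cutoff: shares for the rare initials are too noisy
        print(f"{cons}: {t}/{n} = {t / n:.0%} (deviation {t / n - baseline:+.1%})")
```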

For vowels I have counted not just word-initial cases, but rather all first-syllable vowels (so a includes a-, ka-, ca- etc.):

  • a: 152 (Tamil: 107 ≈ 70%)
  • u: 83 (Tamil: 64 ≈ 77%)
  • i: 61 (Tamil: 47 ≈ 77%)
  • e: 46 (Tamil: 24 ≈ 52%)
  • o: 43 (Tamil: 28 ≈ 65%)
  • ā: 44 (Tamil: 33 ≈ 75%)
  • ō: 20 (Tamil: 11 = 55%)
  • ū: 20 (Tamil: 14 = 70%)
  • ē: 19 (Tamil: 13 ≈ 68%)
  • ī: 15 (Tamil: 8 ≈ 53%)
  • (ai: 3, au: 0 — from Tamil only, included here in the a counts)

These now show a fairly different distribution. The cardinal vowels a ā i u ū are represented at about 74% altogether, slightly above the counts from the previous section. The mid vowels as well as ī are by contrast left at only 59% altogether. This difference probably reflects some real historical development. Several possibilities come to mind:

  • maybe Tamil is more archaic, and in the other Dravidian languages, several instances of mid vowels are secondary;
  • maybe the other languages are more archaic, and in (some subgroup including) Tamil, there has been a partial shift from mid to non-mid vowels;
  • maybe the disparity results from differences in post-PD vocabulary that has spread with contacts;
  • maybe the disparity results from the Tamil lexicon being more thoroughly documented, so that e.g. Indo-Aryan “technical” loanwords (likely to have a higher percentage of the cardinal vowels) are better represented.

More detailed comparison would, however, be required to figure out which of these, if any, is the case.
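The two pooled figures above (about 74% vs. 59%) can be recomputed from the vowel counts. Pooling here simply means summing the counts within each group before dividing; the helper name `pooled_share` is illustrative:

```python
# First-syllable vowel -> (all entries, entries with a Tamil lemma), from the list above.
vowels = {
    "a": (152, 107), "u": (83, 64), "i": (61, 47), "e": (46, 24),
    "o": (43, 28), "ā": (44, 33), "ō": (20, 11), "ū": (20, 14),
    "ē": (19, 13), "ī": (15, 8),
}


def pooled_share(group):
    """Tamil count, total count and Tamil share over all vowels in `group`."""
    n = sum(vowels[v][0] for v in group)
    t = sum(vowels[v][1] for v in group)
    return t, n, t / n


cardinal = ["a", "ā", "i", "u", "ū"]
mid_and_long_i = ["e", "ē", "o", "ō", "ī"]

for name, group in [("cardinal", cardinal), ("mid + ī", mid_and_long_i)]:
    t, n, share = pooled_share(group)
    print(f"{name}: {t}/{n} = {share:.1%}")
```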

Medial consonants are more varied still. I’ve included sub-counts for the nasal+stop clusters and geminates (no other clusters seem to have occurred in Proto-Dravidian; “extended” nasal+geminate cluster series are proposed for PP ~ NP correspondences between languages, but these would take more than just casual eyeballing of lemma forms to identify):

  • ṭ: 71 (Tamil: 53 ≈ 75%) (ṭṭ: 27)
  • r: 71 (Tamil: 48 ≈ 68%)
  • k: 47 (Tamil: 32 ≈ 68%) (kk: 18)
  • l: 38 (Tamil: 24 ≈ 63%) (ll: 6)
  • ṯ: 35 (Tamil: 24 ≈ 69%) (ṯṯ: 2)
  • ḷ: 31 (Tamil: 26 ≈ 84%) (ḷḷ: 7)
  • r̤: 31 (Tamil: 25 ≈ 81%)
  • t: 27 (Tamil: 19 ≈ 70%) (tt: 6)
  • c: 22 (Tamil: 12 ≈ 55%) (cc: 5)
  • ṇ: 23 (Tamil: 16 ≈ 70%) (ṇṭ: 13)
  • m: 20 (Tamil: 13 = 65%) (mp: 7)
  • ṉ: 19 (Tamil: 18 ≈ 95%) (ṉṯ: 0)
  • y: 17 (Tamil: 13 ≈ 76%)
  • p: 12 (Tamil: 7 ≈ 58%) (pp: 9)
  • v: 11 (Tamil: 8 ≈ 73%)
  • n: 7 (Tamil: 4 ≈ 57%) (nt: 5, including all of the Tamil cases)
  • ñ: 8 (Tamil: 4 = 50%) (only in ñc)
  • ∅: 7 (Tamil: 6 ≈ 86%)
  • ṅ: 6 (Tamil: 3 = 50%) (only in ṅk)

There are now also some new top scores in representation. The 6/7 count for zero “medials” (in fact mostly monosyllabic roots) is probably just due to pronoun roots & similar grammatical elements often being shorter than proper lexical roots, and being likely to survive more widely.
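Ranking the medial list by Tamil share makes these top scores easy to spot. Where a row of the list lacks its consonant symbol, the label used here is inferred, as an assumption, from the row’s cluster sub-count (ṭṭ, ṯṯ, ḷḷ, ṇṭ, ṉṯ, ṅk) or, for the second 31-entry row, from the set of retroflexes discussed below (r̤):

```python
# Medial consonant -> (all entries, entries with a Tamil lemma), from the list above.
# Some symbols are inferred from the cluster sub-counts (an assumption).
medials = {
    "ṭ": (71, 53), "r": (71, 48), "k": (47, 32), "l": (38, 24),
    "ṯ": (35, 24), "ḷ": (31, 26), "r̤": (31, 25), "t": (27, 19),
    "c": (22, 12), "ṇ": (23, 16), "m": (20, 13), "ṉ": (19, 18),
    "y": (17, 13), "p": (12, 7), "v": (11, 8), "n": (7, 4),
    "ñ": (8, 4), "∅": (7, 6), "ṅ": (6, 3),
}

# Sort by Tamil share, highest first.
ranked = sorted(medials.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True)

for cons, (n, t) in ranked[:5]:
    print(f"{cons}: {t}/{n} = {t / n:.0%}")
```

With counts this small (7 entries for ∅), a single etymon moves the share by more than ten percentage points, so the ranking below the top few is not very informative.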

The coronal nasals seem to indicate a real sound change. Alveolar ṉ is highly common by itself, while dental n is absent entirely aside from the cluster nt (in this sample at least). Compare that word-initially, however, only n occurs, not ṉ. But it already seems likely from this data that these were allophones of each other at some point. And, quite obviously, no palatal **ñ or velar **ṅ should be recognized as distinct either.

The retroflexes ṭ ṇ ḷ r̤ [ɻ] are fairly strongly represented, while palatals c ñ fairly weakly, but again this could have many explanations.

I’ve taken a brief look at the co-occurrence of these three root phoneme positions too. Nothing really extraordinary turns up, though. *v- plus a labial vowel is disallowed; a few phonetically awkward or unstable combinations like *ki- *ke- *-iṭ- *-eṭ- are somewhat rare; and Similar Place Avoidance turns up among the C…C combinations. The weirdest-looking total gap is **c…r̤, and on a closer look even this is accidental (the full DED does include a few cases, but none of them happens to be the first on its page).

Posted in Commentary, Reconstruction

Thesis release, DIY edition

One would think that finishing a thesis would be enough to stop needing to worry about it, but sometimes it isn’t.

Earlier this year I finished my Master’s thesis on the origin of the long vowels in the Finnic languages (after about three years, three advisors and three reductions in scope). The topic has long been under debate, but now seems to have settled on a new, fairly economical solution. This hinges on what I call Lehtinen’s Law: in early Finnic, *a *ä lengthen under fairly specific conditions and are then raised to *oo *ee. What I have ended up writing is a detailed overview of the earlier research and of what the big picture currently looks like, including several new details: e.g. the observation that several loanwords from Indo-European, such as *soola ‘salt’, provide corroborating evidence for a development *aa > *oo in Finnic. (This will be, I hope, a prelude to a string of papers where I aim to rework the phonological reconstruction of Proto-Uralic into a less Finnocentric mold.)

Unfortunately, something in the University of Helsinki thesis repository backend has been broken for several months now, with no ETA for a fix, and no new Master’s theses have been uploaded since March. Until recently, all I could tell interested colleagues was that the thesis “will be coming” online at some undefined point (placeholder here). Over at this blog, I don’t think I’ve more than passingly alluded to it being finished.

The time is ripe to do something about this myself, though. I’ve recently put together a slightly corrected version (mainly adding missing references, or the details of some that were still “forthcoming” when I finished it) and just emailed a copy to some people. On the weekend I’ve also uploaded the work on for public access. If you’d rather download it right away, I also have a direct download link.

I am thinking of putting together an English summary of this at some point, since the topic is likely of interest beyond readers of Finnish as well. For starters, however, you can already check out e.g. the various vowel correspondence tables: p. 58 (a rough outline of the default development of Proto-Uralic non-close vowels *e, *ä, *ë, *a, *o), p. 77 (the cognates elsewhere in Uralic of Finnic *oo) and p. 88 (same for *ee).

Comments and questions are welcome, on this post or through other channels (email, PMs, etc.)

Posted in News, Reconstruction

A research project wishlist

I’m only starting out on real scientific publishing (it looks like my first squib-size article, currently in peer review, will be out in early 2019), but during the years I’ve run this blog and worked on my thesis, I’ve already racked up a fair-sized publication plan and stack of article drafts. There will be roughly one for each of the various conference presentations I’ve given so far, maybe a dozen that would expand on various blog posts, and a handful of thesis work leftovers. Many others have not been announced in any fashion.

Looking at the far end of the list, though, I think I’ve also been tacking on ideas that are not really research plans so much as things I wish someone would do. Many of them call for substantial background work and are unlikely to fit on my plate in the foreseeable future of 5–10 years. The following are up for grabs, if anyone reading happens to be looking for research project ideas:

  • An updated handbook on the history of Finnish — the last updated version of Hakulinen’s Suomen kielen rakenne ja kehitys came out in 1979, and a lot has happened since then. In particular the overview of native vs. borrowed components in the Finnish lexicon seems long out of date.
    — I would likely start on some component of this myself if nothing has happened by let’s say 2030, but that’ll be a while still.
  • A study of the lexicon of Kukkuzi Ingrian/Votic. Researchers have waffled back and forth on whether this Finnic variety should be considered a variety of Votic with an Ingrian superstrate, or a variety of Ingrian with a Votic substrate, mostly on phonological and morphological criteria. With the 2012 release of the extensive Vadja keele sõnaraamat, it should be possible to investigate whether there is also an anomalous amount of vocabulary that’s either present in Kukkuzi and absent elsewhere in Votic, or absent from Kukkuzi but well-represented in the other Votic dialects.
  • Similar studies could probably also be done to check how coherent “Ingrian” really is (with or without Kukkuzi): the main varieties are clearly delineated, and each shows its own similarities, to varying degrees, with Votic, Ingrian Estonian, the two Ingrian Finnish varieties (Savakko and Äyrämöinen), Southeastern Finnish and Karelian. There could be other Kukkuzi-esque misanalyzed varieties mixed in here as well.
  • A comparative reconstruction of Proto-Hungarian, based not just on the philological Old Hungarian evidence but also on the evidence of the various Hungarian dialects. Handbooks sometimes state that all the modern dialects could be derived from “Middle Hungarian” circa 1500–1600, but this is obviously nonsense at least in the case of the Székelys. Many other dialects could also have diverged earlier, only to be later assimilated back towards the mainstream. Loanword evidence would also be important (for one thing, loanwords completely destroy the theory that Old Hungarian would not have had vowel length), and obviously Uralic ancestry would have to be kept an eye on too. Sometimes the term “Proto-Hungarian” is used instead for the prehistoric pre-migration ancestor of Hungarian, but I cannot recommend this practice: this time depth is firmly within the “single-branch” phase of Hungarian and cannot be probed by the comparative method.
  • A study of substrate in Ob-Ugric. Mansi and Khanty gain their similarity from at least three sources: the two are related (minimally within Uralic), they form a common language area (as shown by isoglosses that only cover parts of both languages) and share later contact influences (most importantly from Komi, Tatar and Russian), but on archeological and anthropological grounds, an additional fourth source could be a pre-Uralic substrate of western Siberia (Helimski’s “Yugra”). What comes up if we apply modern methods of substrate language research to the two?
  • A comparison of the Ugric and East Uralic hypotheses. There is by now a good amount of data that has been collected purportedly in support of a common Ugric (Hungarian–Mansi–Khanty) group within Uralic; but it has been pointed out that the original and clearest point of evidence, the rearrangement of the PU sibilant system (traditional formulation: *s *š *ś > *θ *θ *s, later *θ > Hung. ∅ ~ Mansi *t ~ Khanty *ɬ) applies also to Samoyedic, leading to a larger grouping recently named “East Uralic”. This is the case for at least a few other features too. Does all this end up showing that either or both of these groups should be considered areal?
    Some other possible sub-angles include: is some of the common Ugric vocabulary better considered loanwords e.g. from Hungarian into Ob-Ugric? Can previously unidentified OU-Samoyedic cognates be found? How many of the commonalities could be potentially interpreted as shared retentions rather than shared innovations? How does the alleged Ob-Ugric subgroup compare with either hypothesis?
    — I will be doing at some point at least the related comparison of the East Uralic hypothesis with its clear opponent, the long-standing Finno-Ugric hypothesis (which, as far as I can tell, has always remained merely a glorified assumption that has never been studied in detail, either pro or con).
  • A bibliography of Indo-Uralic studies, either a simple list of works or a more detailed breakdown by etymology. It would be interesting to see e.g. how much of the material compared over time is individually reconstructible within the two families … there is sometimes “cherrypicking” of words from just one subfamily, and in at least some cases they turn out to be clearly better analyzed as loanwords from IE into Uralic.
  • Studies on the history of extensively spread areal sound changes. Two that easily come to mind are w > v, found pretty much everywhere between the Atlantic Ocean and the Urals; and p > ɸ > h/f, found across Eurasia roughly in a belt from Hungary to the Aleuts, as well as across most of Northern Africa plus Arabia. It is not clear to me whether these last two are really separate areas of p > ɸ, or rather just one, or perhaps more than two.
  • A look at which level of language Zipf’s Law operates on: orthography, phonology, phonetics? (This could have been done already; I have not searched for this in detail.)
Posted in Commentary

New and updated links

Updates to blog sidebars are easy to overlook. So, this is to note some historical-linguistics-related journals or publication series available online that I have added links to recently:

  • A nyelvtörténeti kutatások újabb eredményei
    Article collection series from the University of Szeged. The archive also includes several smaller release series, as well as the more specifically Hungarological series A mai magyar nyelv leírásának újabb módszerei.
  • Keleti szemle / Revue orientale (previously tangentially mentioned on Tumblr)
    An early 20th-century Hungarian journal. From the viewpoint of this blog, one noteworthy contribution is Heikki Paasonen’s six-part article series “Beiträge zur finnischugrisch-samojedischen Lautgeschichte” in vols. 14–17 (1912–1917). This is perhaps the culmination point of pre-World War Neogrammarian efforts in Proto-Uralic phonological reconstruction, charting out consonant correspondences from PU to Samoyedic in nearly the same shape as known today. (I am considering posting a more detailed commentary later on.)
  • Magyar nyelvőr
    Long-running Hungarian linguistics journal. Recent issues are available online too, but I’ve linked instead the older archives, which include also some general comparative studies. E.g. volumes 11–12 (1882–1883) have, in several installments, Munkácsi’s lengthy re/overview of the Finno-Ugric theory as seen through Budenz’ comparative dictionary. (The archives could really use an index, though.)
  • Studia orientalia (previously tangentially mentioned in my post on Meshcheran)
    An ongoing e-journal on one hand, a sizable monograph etc. archive on the other, covering several fields: literature, linguistics, sociology, ethnography etc. The most common topics are Indology and Altaistics, with occasional coverage also from Africa and elsewhere in Asia. A few of the article collections even have Uralistic and Indo-Europeanist tidbits, e.g. Bertil Tikkanen’s and Asko Parpola’s Festschriften.

I have also fixed the broken Fenno-Ugrica Suecana link, and added a permalink to my post indexing the Studia Uralo-Altaica series, which seems to be a popular visitor destination.

Posted in Commentary, Links

Notes on the phonology of Kamassian

For a language family mostly made up of minority languages, Uralic is really quite well documented by any standards. Most of the smaller languages received decent descriptions already in the 19th century, and many also received theoretically updated treatments later on in the 20th and 21st centuries. The big exception had long been the Samoyedic languages, with the literature being mainly dictionaries and comparative studies, and only Tundra Nenets being described in good detail by multiple scholars. By now, however, even the Samoyedic situation has improved. Within the last few decades, Nganasan has received a reference grammar by both Wagner-Nagy (2002) and Katschmann (2008), Forest Nenets by Pusztay (1984), Forest Enets by both Künnap (1999a) and Siegl (2013), Kamassian by Künnap (1999b), Mator by Helimski (1997) (and that’s all before I’ve looked much into the literature in Russian). For Selkup I’m not sure of a source worth singling out, but there’s a fair amount of scattered literature. Tundra Enets is the only well-delineated variety I do not know to have been specifically covered by anyone, though it’s also very close to Forest Enets anyway (treating them as two entirely different languages has not struck me as warranted).

At this point it then seems that not only has descriptive work on Samoyedic caught up with comparative work, but on a few fronts the former has passed the latter altogether. Janhunen’s classic Samojedischer Wortschatz from 1977 closely follows his 1976 reconstruction of the vowel system of Proto-Northern Samoyedic, [1] and in particular the southern Samoyedic data does not seem to be quite systematically integrated into the reconstruction. This task has since been accomplished only for Mator, in Helimski’s monograph. For Selkup, there have been numerous specialized studies, [2] but AFAIK no substantial synthesis so far.

The situation is the most haphazard for Kamassian. There are overview notes in comparative Samoyedic works like Paasonen (1913–1917) and Mikola (2004), and also in Collinder’s even more generalist Comparative Grammar; as well as two monographs by Künnap on historical morphology. But that’s about it. To my knowledge, no-one has ever taken a good detailed look at the basic phonological and etymological issues of the language.

With today’s resources, this would not be an especially hard task. I already have my WIP database of proposed Proto-Samoyedic etymologies in decent shape. Extracting a list of etymologies extending to Kamass and/or Koibal takes 30 seconds; gathering the corresponding reflexes from SW takes a bit longer, maybe two days’ worth of work. Double-checking other sources would take longer, though. This is especially due to the inconsistent treatment of the most thorough records available, those of Kai Donner from the 1910s. Janhunen gives what I believe is Donner’s full transcription, but others seem to habitually drop “unnecessary” diacritics.

However, if we don’t have a solid grasp of even the basic phonological inventory of Kamassian, can we really be sure which diacritics are “unnecessary”? Overviews like Künnap’s grammar or the 1998 handbook on Uralic languages seem to present basically a bare-minimum system with just the “base letters” extracted. This is a good place to start from, but already a look at the other Samoyedic languages (e.g. Selkup with its giant vowel system, or Nenets with its extensive palatalization) suggests that more complexity could plausibly have existed.

I offer here, for starters, two suggestions for amending the synchronic phonology. A full survey of the Donner materials would be required really, but already the inherited Samoyedic component seems to allow tentative conclusions.

The uvular stop /q/ should probably be recognized for Kamassian (as also for Selkup). Donner transcribes [q] versus [k] (actually [qʰ], [kʰ] in most cases, but I will ignore aspiration) [3] somewhat, but not totally, consistently depending on the following vowel:

  • before a, å: 2×k, 20×q
  • before e, ɛ: 5×k
  • before ə: 3×k, 2×q
  • before i: 1×q
  • before o: 2×k, 9×q
  • before u: 9×k, 6×q
  • before ʉ: 5×k

So there is potentially decent evidence for a contrast /k/ : /q/ at least before u. Importantly, this seems to be etymological. ku- is mostly from *ku-, *ko- (3× *kå-), while qu- is mostly from *kå-, *kə- (1× *ko-).
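To see at a glance where both [k] and [q] occur in appreciable numbers, rather than in near-complementary distribution, one can compare the rarer variant against the commoner one in each context. The counts are from the list above; the `balance` measure and the 0.5 threshold are arbitrary choices for illustration:

```python
# Donner's [k] vs [q] counts by following vowel: context -> (k, q), from the list above.
counts = {
    "a, å": (2, 20), "e, ɛ": (5, 0), "ə": (3, 2), "i": (0, 1),
    "o": (2, 9), "u": (9, 6), "ʉ": (5, 0),
}

THRESHOLD = 0.5  # arbitrary cutoff for "both variants well represented"


def balance(k, q):
    """Ratio of the rarer variant to the commoner one (0 = fully complementary)."""
    return min(k, q) / max(k, q)


for vowel, (k, q) in counts.items():
    print(f"before {vowel}: k={k}, q={q}, balance={balance(k, q):.2f}")

balanced = [v for v, (k, q) in counts.items() if balance(k, q) >= THRESHOLD]
print("both variants common before:", balanced)
```

This only picks out the contexts worth a closer look; whether the variation there is etymological, as argued in the text, still has to be checked against the individual words.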

Before ə the situation also looks promising at first, but may not stand up to scrutiny. The distribution is again etymological: the three cases of kə- are from *kë-, *kï-, *ku-, while the two cases of qə- are from *kə-. However, two of the three former cases are actually transcribed with ə̣ (= turned ė; there is no ‹ə› as a base symbol in UPA), which seems to be better considered an allophone of /e/ or perhaps /i/. This vowel mostly continues earlier PSmy front vowels *i, *ɛ, *ə̈; it takes front vowel harmony (ə̣d-ľɛm ‘I am visible’, mə̣ɛm ‘I sell’, sə̣βəʔ-ľɛm ‘I take out’), varies with e in Donner’s transcription (bej-ľɛm ~ bə̣j-ľɛm ‘I go over’), often corresponds to Castrén’s i (C mija ~ D mə̣j`ɛ ‘earth’) and can be sometimes seemingly triggered by a present or lost palatal consonant (ə̣ŕåŋ ‘drill’ < *pərəjəŋ, nə̣mi < *jumpə ‘moss’). In other cases there’s instead back ə̑, which mainly continues *ə, and might be even analyzable as an allophone of /a/ (with the result that there would be no ˣ/ə/ in Kamassian after all). If so though, then the apparent | contrast could be mostly rendered as a contrast between /Ke/ and /Ka/, instead of /k/ versus /q/. (There still remains a near-minimal pair, though: kə̑m ~ kɛm ‘blood’ < *këm | qə̑ʔ ‘pus’ < *kət.)

Word-medial cases of Proto-Samoyedic *k are not common, but where not palatalized, they also seem to show a somewhat Turkic-style split between velar g (nāgur ‘3’ < *nakur) and uvular ʁ (ťaʁa ~ ďåʁå ‘river’ < *jəkå), which perhaps remains analyzable as allophonic.

I also believe vowel length was phonemic in Kamassian. Donner and Castrén are not completely consistent with each other on this, but there are several indications that length is nonetheless neither random nor context-dependent.

As far as basic statistics go, the different Proto-Samoyedic vowels are split in two clear groups: the open vowels *ä *å and especially *a, and mid *o often yield long vowels, while the close vowels *i *ü *ï *u, reduced *ə and mid *ë only rarely do. Ignoring quality changes and counting half-long vowels (à ù etc.) as short for now, the quantity reflexes are as follows:

  • *a > short ×24, long ×26 (52%)
  • *ä > short ×47, long ×23 (33%)
  • *å > short ×47, long ×16 (25%)
  • *o > short ×19, long ×7 (27%)
  • *ë > short ×25, long ×1 (4%)
  • *u > short ×26, long ×5 (16%)
  • *ï > short ×10, long ×2 (17%)
  • *ü > short ×18, long ×0 (0%)
  • *i > short ×51, long ×2 (4%)
  • *ə > short ×68, long ×6 (8%)
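The percentages, and the two-way split described above, can be recomputed from these counts. The grouping follows the prose above; pooling sums the counts within each group, and like the raw list it ignores the conditioning factors discussed further on. The helper name `long_share` is illustrative:

```python
# Kamassian quantity reflexes of Proto-Samoyedic first-syllable vowels:
# vowel -> (short, long), from the list above.
reflexes = {
    "*a": (24, 26), "*ä": (47, 23), "*å": (47, 16), "*o": (19, 7),
    "*ë": (25, 1), "*u": (26, 5), "*ï": (10, 2), "*ü": (18, 0),
    "*i": (51, 2), "*ə": (68, 6),
}


def long_share(vowels):
    """Share of long reflexes, pooled over the given PSmy vowels."""
    short = sum(reflexes[v][0] for v in vowels)
    long_ = sum(reflexes[v][1] for v in vowels)
    return long_ / (short + long_)


for v in reflexes:
    print(f"{v}: {long_share([v]):.0%} long")

often = ["*a", "*ä", "*å", "*o"]                  # open vowels and mid *o
rarely = ["*i", "*ü", "*ï", "*u", "*ə", "*ë"]     # close, reduced and mid *ë
print(f"open vowels + *o: {long_share(often):.0%} long")
print(f"close vowels, *ə, *ë: {long_share(rarely):.0%} long")
```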

Of course open vowels normally tend to be longer than close ones, so this kind of a chart is to be expected. However, the qualitative changes in the vowel system mean that at minimum ā, å̄ < *a are effectively in contrast with a, å < *ə. At least ō and ē seem to be also phonemic. Minimal or near-minimal pairs can be found (å replaced by a for clarity):

  • tar̀ ‘hair’ < *tə̈r | tār ‘gills’ < *čar
  • qăn̆-ńȧm ‘I freeze’ < *kəntɜ- | māⁿ-ńim ‘I measure’ < *mančə-
  • qăzɪl̀ ‘wart’ < *kəsər | qāzə̑ra ‘nutcracker’ < *kasɜra
  • kora ‘reindeer bull’ < *korå | kōla ‘fish’ < *kålä
  • le ‘bone’ < *lë | ďē ‘heel’ < *jä, ‘woman’ < *nä
  • pel-ľɛm ‘I put’ < *pën- | sēlə-ľɛm ‘I sharpen’ < *sälä-

The analysis can be improved by also noting some conditional developments leading to the “wrong” vowel length. First, before consonant clusters or stem-final obstruents, almost no long vowels occur. Second, Donner’s records also have words with stressed or long vowels in the 2nd syllable; these, too, never seem to have long vowels in the 1st syllable. A near-minimal pair of this kind is toli· ‘thief’ < (? *tōlī <) *tåläjə vs. tōlu ‘darkness’ < *tålwə. Third, long vowels from close vowels, *ə or *o often seem to be the result of vowel contractions, and after taking this into account, *o patterns together with the other mid vowels after all:

  • *i: pīdi ‘thumb’ < *pij-
  • *u: ďēdər-ľɛm ~ ťʉ̄dərə-ľɛm ‘I dream’ < *jujtə-, ńī ‘child’ < *ńuə(j), šʉ̄ ‘fire’ < *tuj
  • *ə: bʉ̄źɛ ‘husband’ < *wəjs (surely rather *wəjsä?), ťīma ‘tail’ < *təjwå; also, from Castrén: khâŋ ‘thunder’ < *kəjŋ, ‘moose’ < *kəå
  • *o: ~ ‘branch’ < *moə, qōriʔ ‘container’ < *koər, šō-ľȧm ‘I come’ < *toj-, šōmi ‘larch’ < *tojmå

An additional indirect line of evidence for phonemic long vowels is the transcription of consonant length. Donner transcribes medial and final single consonants as half-long in some cases ( etc.). At least in CVC words this seems to depend on vowel length: word-final consonants after short vowels are fairly consistently transcribed as half-long, word-final consonants after long vowels consistently as short (though only r occurs more than once: bōr ‘ridge’, ńēr ‘point’, tār ‘gills’, ťēr ‘center’).

I also notice a tendency for long vowels in words of the shape CVCV to occur most often before close or reduced vowels in the second syllable, and less often before mid and open vowels (a pattern closely resembling Selkup; cf. Helimski (2007), as cited in footnote 2). But this issue should wait for a detailed survey. Vowel lengthening has probably taken place more than once, likely with conditioning that varied by vowel quality.

One further issue to investigate is one I already touched on above: the front unrounded vowels. Donner’s transcription distinguishes no fewer than five heights i ė e ɛ ȧ and several reduced counterparts ɪ ə̣ ə ə for the first three, which obviously should be phonologized as something simpler (among back rounded vowels he only has u o å). But offhand I am not sure where the boundaries between the different heights should be drawn. Etymologically Donner’s e ɛ are mostly from PSmy *ä, while ə̣ are mostly from *i *ə̈ *ə. This may suggest that ė ə̣ should be counted as allophones of /i/ and not /e/. Some apparent front vowels could also be fronted allophones of i̮ e̮; or perhaps the situation is the opposite, and these back illabial vowels are, despite always continuing PSmy *ï *ë, synchronically just backed allophones of the front vowels (similar to the situation in Nenets). The native Samoyedic part of the vocabulary alone is probably insufficient to work out a solution, though, since this seems to be mostly an issue of free variation and not conditional allophony. The best line of evidence would probably instead be the degree of variation within Kamassian, e.g. as noted by Donner from his different informants, or between root words and their derivatives and compounds, or between Donner’s and Castrén’s records.

I would have observations on historical phonology as well, but those shall be left for another time.

[1] Janhunen, Juha. 1976. “Adalékok az északi-szamojéd hangtörténethez: Vokalizmus. Az első szótagi magánhangzók”, in Néprajz és Nyelvtudomány 19–20: 165–188.
[2] Some examples:

  • Helimski 1976, “О соответствиях уральских a- и e-основ в тазовском диалекте селькупского языка”, in Советское финноугроведение (SFU) 12: 113–132.
  • Katz 1979, “Beitrag zur Lösung des Problems der Entwicklung von ursam. *j im Selkupischen und der hiemit zusammenhängenden Fragen der historischen Morphologie dieser Sprache und des Uralischen”, in SFU 13: 168–176.
  • Mikola 1981, Adalékok a szelkup vokalizmus történetéhez. Nyelvészeti dolgozatok 193.
  • Terentjev 1982, “К вопросу о реконструкции прасамодийского языка”, in SFU 18: 189–193.
  • Helimski 2007, “Продление гласных перед шва в селькупском языке как фонетический закон”, in Linguistica Uralica 43/2: 124–133.
  • Gusev 2012, “О возможных источниках селькупского сочетания -lć-: ПС *jw, *jk, *jm”, in SUST 264 (Festschrift Janhunen): 77–81.

[3] In UPA: versus k.

Posted in Reconstruction

A Fourth Laryngeal in PIE

The Proto-Indo-European laryngeals seem to form, in most people’s thinking, a kind of phonological subsystem. Usually they end up as a class of back fricatives, or at least some kind of weaker back consonants. They certainly have similar diachronic behavior… but whether this also implies a unique synchronic similarity is not immediately obvious. After all, there is a rather wide range of consonants that can easily be lost from a language (in the “merges with zero” sense). Inversely, even if many members of some natural class are lost, not every one of them has to be. Take e.g. the transient voiced spirants in various Uralic languages: early pre-Permic *β *ð *ɣ are all lost by late Proto-Permic, while out of late Common Finnic *β *ð *ɣ in Eastern Finnish/Karelian, only the latter two are lost, and *β instead gives /v/.

Occasionally PIE internal reconstructors will go further still, and point out that the most widespread reconstruction, with three laryngeals, invites comparison with the three series of velar consonants, suggesting that *h₁ *h₂ *h₃ be rewritten as *x́ *x *xʷ. The analogy is clearly imperfect, though. E.g. the laryngeals do not show many signs of a centum / satem isogloss, at least not along the usual dividing line; [1] there are no parallels to the conditional neutralizations among the velar stops, such as *ḱr > *kr; and the labiovelar stops *kʷ *gʷʰ *gʷ do not show any *o-coloring effects (though some *a-coloring effects have been proposed for *k *gʰ *g). A more common objection, however, seems to be that there is a widely held alternative hypothesis: many mainstream IEists think that *h₃ is better mapped as a voiced fricative ([ɣ], [ʁ] or [ʕ]), and *h₁ as a glottal consonant ([h] or [ʔ]).

This semi-consensus view still takes *h₂ to be a voiceless back fricative, [x] or [χ], as the direct Anatolian evidence also strongly suggests. The occasionally suggested pharyngeal [ħ] can IMO be ruled out per arguments such as those in Michael Weiss’ recent paper. (I have already opted to use *x and not *h₂ in my index of the LIV roots, and will mostly do so in the rest of this post too.) However, this leaves an opening for an objection that does not seem to be commonly made, but that to me seems quite relevant. If *h₁ and *h₃ are really something like *h and *ɣ, would *h₂ = *x then really be an isolated voiceless velar fricative, without palatovelar and labiovelar counterparts? [2]

A brief typological survey shows that such gaps among back fricative systems are indeed not common. In particular, any language that has both /kʷ/ and /x/ is rather likely to also have /xʷ/. [3] A look at the PHOIBLE data turns up the following results:

  • all of /k kʷ x xʷ/: 35 languages
    (Bilin, Buwal, Central Atlas Tamazight, Central Siberian Yupik, Chipaya, Chipewyan, Comox, Cupeno, Dghwede, Gavar, (Paraguayan) Guarani, Gwandara “4 and 6”, (Northern) Haida, Iraqw, Jicarilla Apache, Kumiai, Lagwan, Lamang, Luiseno, Mezquital Otomi, Nootka, Quileute, Seri, Serrano, Shuswap, Tachelhit, Tera, (Southern) Tiwa, Tlingit, Tolowa, Tonkawa, Wamey, Wichi Lhamtes Nocten, Yuqui)
  • only /k kʷ x/: 14 languages
    (Awing, Ese Ejja, Kwasio, Nizaa, Nuclear Daba, Purepecha, Saliba, Sui, Taushiro, Tilquiapan Zapotec, Uru, Ute-Southern Paiute, Yala, Yurok)
  • near misses: Haka-Chin with /k kʷ x w̥/, Izi-Ezaa-Ikwo-Mgbo with /k kʷ χ/, Wuzlam with /k kʷ χ hʷ/.

So a language that has /kʷ/ and /x/ is about 2.5 times more likely to have /xʷ/ than not: a very substantial result, given that overall only some 3.2% of the languages in the PHOIBLE world sample have /xʷ/.
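The 2.5 figure is the odds 35 : 14. Turning the same two counts into a conditional share makes the comparison with the 3.2% baseline more direct. A minimal Python sketch; the near misses are left out, and the baseline is the figure quoted above, not recomputed from PHOIBLE:

```python
"""Odds and conditional share of /xʷ/ among inventories with /kʷ/ and /x/."""

HAS_XW = 35          # inventories with all of /k kʷ x xʷ/
LACKS_XW = 14        # inventories with /k kʷ x/ but no /xʷ/
BASELINE_PCT = 3.2   # share of all PHOIBLE inventories with /xʷ/, as quoted in the text


def odds(has: int, lacks: int) -> float:
    """How many times more often /xʷ/ is present than absent."""
    return has / lacks


def conditional_pct(has: int, lacks: int) -> float:
    """Percentage of /kʷ/ + /x/ inventories that also have /xʷ/."""
    return 100 * has / (has + lacks)


if __name__ == "__main__":
    share = conditional_pct(HAS_XW, LACKS_XW)
    print(f"odds: {odds(HAS_XW, LACKS_XW):.1f} to 1")
    print(f"share with /xʷ/: {share:.0f}% (baseline {BASELINE_PCT}%)")
    print(f"ratio to baseline: about {share / BASELINE_PCT:.0f}x")
```

With these counts, about 71% of the /kʷ/ + /x/ inventories have /xʷ/, a bit over twenty times the overall rate.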

There are, moreover, plenty of languages that have /k kʷ/ and some non-velar pair of ±labialized back fricatives. The most popular setup by far is /k kʷ h hʷ/ (Amharic, Arabela, Argobba, Cherepon, Fwe, Gikyode, Guinea Kpelle, Gwandara “2”, Hausa, Ikwere, Inor, Iyive, Kamayura, Kawaiisu, Kistane, Mbuko, Merey, Mesqan, Mofu-Gudur, Moloko, Nuclear Igbo, Piaroa, Sebat Bet Gurage, Siona, Suya “2”, Vame, Wandala, Wari, Wolaytta, Yeyi). Now, /k kʷ h/ is also very common; but given that x > h is a common sound change, it seems likely that in many languages of this group /h hʷ/ go back to earlier *x *xʷ. In three cases /h hʷ/ also combines with an unpaired buccal back fricative: /k kʷ x h hʷ/ (Mfumte, Nyam), /k kʷ χ h hʷ/ (Tewa). [4] Other similar inventories are:

  • /k kʷ ç çʷ/ (Quechan)
  • /k kʷ χ χʷ/ (Bana, Kabyle, Xamtanga)
  • /k kʷ ħ ħʷ/ (Bade)
  • /k kʷ χ χʷ ħ ħʷ/ (Moroccan Arabic)

Lastly there is also the notable Pacific Northwest cluster of languages (Bella Coola, Coeur d’Alene, Lushootseed, Spokane, Squamish, Straits Salish, Upper Chehalis) with either just /kʷ xʷ/ (no plain velars; all have non-labialized uvulars though) or /k kʷ xʷ/ (with /k/ looking like a recent reintroduction by loans). This is tangential to the question, though.

Remarkably, this typological trend continues even within Indo-European! Hittite is nowadays analyzed as indeed having phonemic /xʷ/ ḫu ~ uḫ beside plain /x/ (for a recent detailed review see Suter (2014) [5]). Per correspondences like Lycian /kʷ/ q, the same is also thought to have been the case already in Proto-Anatolian. This *xʷ corresponds to traditional PIE *h₂w, and is usually considered to have come about by simple cluster coalescence. It would, however, also be quite feasible to set up *xʷ already for PIE itself, so that there would be no asymmetry between stop and fricative labialization. (Suter already supports this idea; I only found his article after coming up with it myself.)

This will require a slight change in thinking: the concepts of “laryngeal” as “a consonant that is deleted” and “laryngeal” as “a back fricative” will need to be uncoupled. *xʷ will be a “laryngeal” in the second sense, but not in the first: after all, it leaves at least a *w behind in core IE. I think this sharpening of concepts would be beneficial, as Indo-European studies already suffer from treating the laryngeals as excessively vague phonetically.

I believe additional evidence for *xʷ can also be found in PIE root structure. Clusters of (plain) velar + *w are often set up for PIE, but they are much rarer than the labiovelars proper. LIV has the following counts: *kʷ 15 + 18 (root-initial + root-final), *gʷʰ 8 + 14, *gʷ 17 + 16; versus *kw 7 (root-initial only), *gʰw 0, *gw 2. For *xw, however, there are 18 cases initially + 7 finally, which would make this both the most common *Cw cluster and by far the most common *HR cluster in PIE. [6]
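Summing the LIV counts makes the comparison explicit. A small Python sketch using only the single-number counts quoted in this paragraph plus *sw- from footnote 6 (the ranged counts in the footnote are left out):

```python
"""Totals for labiovelars vs. *Cw clusters in LIV roots (counts as quoted)."""

# (root-initial, root-final) counts.
LABIOVELARS = {"kʷ": (15, 18), "gʷʰ": (8, 14), "gʷ": (17, 16)}
CW_CLUSTERS = {
    "kw": (7, 0),    # root-initial only
    "gʰw": (0, 0),
    "gw": (2, 0),
    "xw": (18, 7),   # traditional *h₂w
    "sw": (21, 0),   # from footnote 6; root-initial only
}


def total(counts: tuple) -> int:
    """Initial + final occurrences."""
    return sum(counts)


def most_common(table: dict) -> str:
    """Key with the highest total count."""
    return max(table, key=lambda k: total(table[k]))


if __name__ == "__main__":
    for name, counts in {**LABIOVELARS, **CW_CLUSTERS}.items():
        print(f"*{name}: {counts[0]} + {counts[1]} = {total(counts)}")
    print("most common *Cw cluster:", most_common(CW_CLUSTERS))
```

This gives *xw a total of 25 against 21 for *sw-: more common overall, although *sw- leads in root-initial position alone.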

Even more interesting are the verb root *xwyedʰ- ‘to strike dead or injured’, and the noun *xwl̥h₁néx ‘wool’: these appear to have a very rare *CRR- onset structure, unparalleled elsewhere in PIE to my knowledge. Reconstructing a monophoneme *xʷ and not a cluster **xw would however reduce these to the usual *CR-. Labiovelar stop + resonant clusters are rare as well, but at least attested, e.g. *kʷles- ‘to furrow’, *gʷyeh₃- ‘to live’, *gʷʰreh₁- ‘to smell smth’.

I would even suggest that some further internal reconstruction can be applied here. The typical onset structure in PIE is *(F)(T)(R)- (with F = fricatives, T = stops, R = resonants). In traditional reconstructions this is, however, violated by a number of cases of *w + resonant (attested in LIV: *wl- 1–3, *wr- 10–12, *wy- 3). Many of these could probably be replaced by *xʷR-. Even the development to attested /wr-/, /vr-/ in a few descendants such as Germanic and Indo-Aryan would not have to be common core IE: it could represent independent developments, versus direct loss (or maybe *xʷ > *x > *h > ∅) in branches like Italic. As for *wy-, some cases seem to be attested almost solely in zero grade. They could probably also be reconstructed with *i as an original non-zero-grade root vowel, and an analogical full grade in some sporadic Indo-Iranian reflexes, similar to the case of *bʰux- ‘to grow’.
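The *(F)(T)(R)- template can be stated as a simple pattern over segment classes. A minimal Python sketch under stated assumptions: *s and the laryngeals (including the proposed *xʷ) are placed in F, the PIE stops in T, and *w *y *r *l *m *n in R; onsets are given as pre-segmented lists, so that *xʷ counts as one segment and *xw as two:

```python
"""Check PIE onsets against the *(F)(T)(R)- template (class sets are assumptions)."""
import re

FRICATIVES = {"s", "h₁", "x", "h₃", "xʷ"}
STOPS = {
    "p", "b", "bʰ", "t", "d", "dʰ",
    "ḱ", "ǵ", "ǵʰ", "k", "g", "gʰ", "kʷ", "gʷ", "gʷʰ",
}
RESONANTS = {"w", "y", "r", "l", "m", "n"}

# At most one fricative, then at most one stop, then at most one resonant.
TEMPLATE = re.compile(r"F?T?R?")


def classify(segment: str) -> str:
    """Map a segment to its class letter F, T or R."""
    if segment in FRICATIVES:
        return "F"
    if segment in STOPS:
        return "T"
    if segment in RESONANTS:
        return "R"
    raise ValueError(f"unknown segment: {segment!r}")


def fits_template(onset: list) -> bool:
    """True if the onset's class string matches F?T?R? in full."""
    classes = "".join(classify(seg) for seg in onset)
    return TEMPLATE.fullmatch(classes) is not None


if __name__ == "__main__":
    examples = {
        "*wr-": ["w", "r"],
        "*xʷr-": ["xʷ", "r"],
        "*xwy- (cluster)": ["x", "w", "y"],
        "*xʷy- (monophoneme)": ["xʷ", "y"],
        "*kʷl-": ["kʷ", "l"],
    }
    for label, onset in examples.items():
        print(label, fits_template(onset))
```

Under these assumptions *wr- and the cluster reading *xwy- fail, while *xʷr-, *xʷy- and *kʷl- pass. A different class assignment (e.g. not treating *h₁ as a fricative) would only require editing the sets.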

The above is just structural reanalysis so far. It is less clear to me whether setting up a PIE *xʷ will also have implications for the routing of the reflexes in the daughter languages; whether some cases will have to be retained as a cluster *xw regardless; or even whether *xʷ could also be set up in a few additional positions.

Suter proposes one readjustment of this type: reconstructing ‘to wash’ as *lexʷ-, and not anything like *leh₃w- or *lewh₃- (and with intervocalic *-xʷ- > -[ɣʷ]- in Hittite, same as with plain *-x- > -[ɣ]-). Promisingly, this seems to eliminate some ad hoc “laryngeal metathesis” rules. However, it also suggests an odd property for *xʷ: a-coloring in Latin (lavō) but o-coloring in Greek (λοέω).

How does this fit together with the seven root-final examples counted above, which have previously been reconstructed as *Ceh₂w-?

  • *deh₂w- ‘to roast on a spit’: Sanskrit dunóti < *du-ne-H-, Greek δαίω, δέδηε < *daw-ye-, *de-dāw-, OHG †zuscen < *du/ū-sḱe-, Irish dóïd < *do/ōw-eye- etc.
  • *geh₂w- ‘to be glad’: Greek γαίω, γάνυμαι < *gaw-ye-, *ga-n-u-, Latin gaudeō < *gāwedʰ-, and perhaps also some reflexes that LIV splits as a separate root *geh₂dʰ-.
  • *ḱeh₂w- ‘to set on fire’: Greek *kaw-ye-, *kāw-s-, Lith. kūles ‘Brandpilze’ (?!), Albanian than ‘to dry’ < *ća-, Tocharian *kaum ‘sun’. Kind of a weak-looking semantic grab-bag root etymology.
  • *keh₂w- ‘to hit’: reconstructed in LIV with *-h₂w- per Tocharian *kɐw- : *kåw- < *kəw- : *kāw-, even though most reflexes (Latin, Germanic, Balto-Slavic, Greek) point instead to *kuH- : *kewH-. If ad hoc metatheses are going to be assumed, why not in Tocharian rather than in all the other languages?
  • *kleh₂w- ‘to cry’: Greek + Albanian *klaw-ye-.
  • *melh₂w- ‘to grind’ — probably not with *xʷ, but rather an extended stem *melx-w/u-, from the more common *melx- ‘to grind’.
  • *peh₂w- ‘to stop, finish’: only Greek παύω < *paw-.

It seems that the behavior here is rather different from the ‘wash’ case, with several examples confirming a-coloring in Greek. But they also all seem to involve more complex constructions; maybe the difference could be one between coda *xʷ (retained until a-coloring?) and medial *xʷ (leniting to *w earlier?). Many also seem to involve reflexes that point to *CuH-, instead of expected *ā(w) : *aw from *ex(w) : *əx(w). And does dunóti involve o < *aw < *exʷ, maybe coming about by some kind of a *dux- > *duxʷ- development?

Nowadays lengthened grades are usually thought to be secondary, so I even wonder whether the instances of ā that surface here are lengthened grades rather than reflexes of *aH < *ex. The (partial) late PIE ablaut scheme for roots in *xʷ would then be *āw : *aw : *u (lengthened grade : *e-grade : zero grade). Eichner’s Law (*ēx > *ē and not **ā), on the other hand, still seems to require that a-coloring is usually younger than the rise of the lengthened grade.

Latin lavō can of course also be explained through the Thurneysen–Havet Law: *o > a / _wV́. So if both this and λοέω are *o-grades after all, there is no difficulty in assuming that *xʷ is leftwards a-coloring, just like plain *x.

So far, in summary: introducing *xʷ gets rid of several typological-phonotactic anomalies in PIE. These include at least all *CRR- roots, a large group of *CeCR- roots, possibly numerous *RR- roots, the strange abundance of the cluster *h₂w, and the unusual /k kʷ x/ inventory.

The second of these issues, however, is not exhausted by this reanalysis. *CeCR- roots nonetheless remain a suspicious feature of laryngeals in particular: there are no roots in anything like **-sw-, **-dy-, only ones like *-h₁w-, *-xy-. One can wonder whether *xʷ is only the tip of an iceberg, and whether a few additional “laryngeals” of this kind (back fricatives that do not get deleted entirely) should be assumed.

But there will be many other options available too, especially with laryngeals other than *x that cannot be easily grounded in direct Anatolian evidence. For very quick offhand speculation for the sake of example… since laryngeals’ presence is in some ways easier to determine than their exact position, and since in particular *-Hy- clusters are often assumed to be subject to metathesis, we could rewrite these as the more typical *-wH-, *-yH-, and simultaneously then rewrite the roots currently reconstructed as *CewH-, *CeyH- as being instead “close-vowel roots” *CuH-, *CiH- (with ablaut only secondarily by analogy).

[0] Thanks to various members of the Zompist Bulletin Board for a number of discussions on this topic.
[1] It is true that *h₂e > *a and *h₃e > *o merge often, and conceivably this could even have gone through an early merger of *h₂ and *h₃. But this happens also in the non-satemic Germanic, while failing in the satemic Armenian. The corresponding “centum” merger of *e and *a as distinct from *o also seems to be unattested entirely.
[2] The same could be asked of *h₃ as *ɣ too, but there happens to be a very easy answer here — just identify “missing” *ɣ́ and *ɣʷ with the semivowels *y and *w, or at least assume that the fricatives merged with the semivowels at some early stage.
[3] The situation for palatalized velars seems similar, but the controversy over whether *ḱ was [kʲ], or whether *ḱ *k were perhaps instead [k q], makes this question harder to survey.
[4] How these cases have come about seems harder to figure out from just general principles. Some hypotheses I can think of would be asymmetric debuccalization, i.e. *x ≡ but *xʷ > hʷ; and later secondary lenition, such as *q > χ or *ɸ > x, some time after the introduction of contrastive labialization. Loanword phonemes could be involved, too: for a not quite exact parallel, Udmurt has /k kʷ/ natively (the latter is usually, but IMO unconvincingly, analyzed as a cluster) versus /x/ only in recent loans from Russian.
[5] He also refers to the same typological sound inventory argument as I do, but working with an earlier stage of PHOIBLE, he only gets together 25 examples of symmetric /k kʷ x xʷ/ versus 11 of asymmetric /k kʷ x/.
[6] The other *H + glide clusters come in as follows: *h₁w at 7 + 4, *h₁y at 1 + 1–5 (with many cases where it is unclear whether *y is part of the root that gets deleted, or a widespread suffix), *xy at 0 + 1–6, *h₃w at 2 + 2, *h₃y at 1 + 0. All *H + liquid or *H + nasal clusters occur initially only, with *xm- the most common at 7 examples. Other *Cw clusters are likewise root-initial only: *sw- 21 (more common than the alleged *xw in this position, but not overall), *tw- 8, *dʰw- 7, *dw- 5, *ḱw- 5, *ǵʰw- 3–4, *ǵw- 1–2.

Posted in Reconstruction

The fate of *w in Altaic

A fairly striking typological commonality between the “micro-Altaic” language groups: Turkic, Mongolic and Tungusic (Tk, Mg, Tg) is the lack of a labial glide such as /w/.

This is clearly out of line both with the world’s languages in general and with Eurasia in particular. /w/ is one of the most common phonemes in the world’s languages, one that can usually be found even in languages with seriously impoverished consonant inventories such as Hawaiian (at ZBB we [1] once compiled stats on this), and there is no shortage of *w in any of the other major language families hanging out nearby: IE, Uralic, Semitic, Dravidian, Sino-Tibetan, Austronesian, Eskaleut, you name it. Even in languages that lack /w/ proper (Finnic, Slavic, most non-English Germanic…), it has usually not gotten too far off-field and has merely become a more frontal labial continuant such as /β/, /v/, /ʋ/. Yet none of these can be found in Turkic / Mongolic / Tungusic either. This clearly means that any long-range relationship hypothesis such as Nostratic, Eurasiatic or Ural-Altaic will need to explain what happened to *w in Altaic.

There are two main hypotheses going around that I know of: *w > ∅ versus *w > *b. The former is the stance of some old-school Ural-Altaicists like Räsänen, among Nostraticists apparently Bomhard [2] and I gather also Illič-Svityč. The latter is the stance of, at minimum, Dolgopolsky. (He proposes also *w > ∅ before labial vowels in Turkic. [3])

I think the actual answer is neither of these, and that the demise of *w postdates common Altaic (if such a thing existed at all), since comparison with Uralic seems able to turn up a fair number of good examples of both developments, yet strongly split by branch. It does not really matter for this purpose whether the comparanda are real cognates or loans … but see below for a hypothesis.

In the following, I have stuck to the clearest data, where comparison with Uralic seems, usually on semantic grounds, preferable to, or at least as good as, the proposed Altaic connections. Checking up on the non-EDAL lexicon of the languages would probably also turn up something, but I will leave that for later.

1. Turkic: *w > *b

(1.1) *bāj ‘rich, noble’ ~ Samic-Finnic *wäjä- ‘to be able, have power’, Hung. vív ‘to fight’
(not worse than ~ Mg ‘strong’, Tg ‘many’, Jp ‘to surpass’)

(1.2) *bakɨr ‘copper’ ~ PU #wäśkä ‘(reddish) metal, ? copper’ > Khanty *wăɣ ‘iron’
(rather than ~ Mg ‘patina’, Jp ‘dust’)

(1.3) *balk- ‘to shine’ ~ PU *wëlkəta ‘light, white’
(rather than ~ Mg *mel-, Tg *mial- with no **-k-; Ko *mark- may or may not belong; maybe here also Tg *beli ‘pale’, rather than ~ Mg ‘dark’)

(1.4) *bań ‘fat’ ~ PU *wajə ‘id.’
(rather than ~ Mg ‘churn’, Tg ‘storage’)

(1.5) *bek ‘firm, stable’ ~ Samic-Finnic *waka ‘id.’
(rather than ~ Mg Tg ‘big’)

(1.6) *bejŋi ‘brain’ ~ PU *wajŋə ‘breath, spirit’ > Selkup *kȫŋə ‘brain’
(rather than ~ Mg ‘forehead’)

(1.7) *bij- ‘sharp edge’ ~ Samic-Finnic *wijə- ‘to be sharp’
(rather than ~ Mg ‘to crush’, Tg ‘to mince’)

(1.8) *(b)ōl-, Mg *bol- ‘to become’, Japonic *wər- ‘to be’ ~ Uralic *(w)alə- ‘to be’ > Ob-Ugric ‘to be, to become’

(1.9) *burun ‘nose’ ~ PU *wara ‘mountain’ > Hung. orr ‘nose, †peak’
(rather than ~ Jp Ko ‘beak’)

(1.10) *būt ‘leg’ ~ Samoyedic *utå ‘hand’
(Tg *begdi may or may not belong)

(1.11) *dabul ‘wind’ ~ PU #tɜwlə ‘id.’
(rather than ~ Mg ‘typhus’, Tg ‘to be infected’)

(1.12) *debe ‘camel’ ~ Samoyedic *tëə < ? *tëwə ‘(tame) reindeer’
(rather than ~ Mg *temeɣen; long compared also with isolated Karelian tevana ‘elk cow’ (often mis-cited as Finnish))

(1.13) *sib- ‘to spin thread, pull out fibre’, Tg *sib- ‘id.’ ~ PU *siwə ‘fibre’
(rather than ~ Mg ‘to tuck up’)

I suspect most of these are loans into Turkic from early Ugric, and, in the case of *bōl-, from there into Mongolic. At least #wūta is probably better taken as a loan in the opposite direction, since this is innovative vocabulary replacing PU *kätə (and Samoyedic does not tolerate **wu-). Perhaps likewise for #dewe.

As for IE parallels, I can moreover mention e.g. Tk *basu ‘hammer’ ~ II *wadźra- ‘hammer, mace’; Tk *ebin ‘grain’ ~ IE *yewo- ‘id.’; Tk *gēb- ‘to chew’, Mg *gebi- ‘id.’, Tg *keb- ‘to bite’ ~ IE *ǵyew- ‘to chew’. Comparison with Japonic would also immediately provide examples of *w > *b. There has been some debate on whether *b or *w should be reconstructed for Proto-Japonic, but as far as I gather, *b has been assumed for ease of Altaic comparison, while most of the actual data clearly sides with *w. [4]

*w > *b also has good areal parallels, being found in both the north(west)ern and south(west)ern neighbors of Turkic: on one hand widely in Samoyedic, viz. in Enets, Nganasan, Kamassian and Mator (partly even in Yurats and eastern dialects of Tundra Nenets), on the other, in East Sakan (Khotanese and Tumshuqese).

There is also one notable exception where Turkic seems to have *w > ∅ instead: *öl- ‘to die’ ~ PU *widə- > Hung. *ül- > öl- ‘to kill’. This isolated example could be, however, merely an accidental similarity, esp. since the semantics are off. (‘Die’ and ‘kill’ are close enough concepts, but usually do not interchange without causative / anticausative morphology.) Contrast also ‘nose’, where we seem to have *wu > *bu in Turkic but *wu > *u > o in Hungarian.

All in all, the details may use further fine-tuning, but I think there is good evidence to assume that earlier *w develops into *b in Turkic. Contrary to what I earlier commented on this topic though, it is also easy enough to find equally good-looking cases of Turkic *b ~ Uralic *p (e.g. *bas- ‘to press’ ~ *puńćə- ‘to press, squeeze’, *beliŋ ‘panic’ ~ *pelə- ‘to fear’, *bɨč- ‘to cut’ ~ *päčkä- ‘id.’, *bulun ‘cloud’ ~ *pilw/ŋə ‘id.’), so probably this was still a merger with a pre-existing *b.

2. Mongolic: *w > ∅

Supported by less data, but even fairly tight reins on semantics still allow finding some evidence.

(2.1) *oŋgi ‘hole’, Tg *uŋgV ‘id.’ ~ PU *woŋkə ‘id.’
(rather than ~ Tk ‘to dig’)

(2.2) *ök/g- ‘to give’ ~ PU *wexə- ‘to take somewhere’ > Samoyedic *ü- ‘to drag’
(rather than ~ Tk, Tg ‘to heap up’; maybe here better Tg *bū- ‘to give’?)

(2.3) *udu- ‘to lead’ ~ PU *we/ätä- ‘to pull, lead’, PIE *wedʰ- ‘id.’
(rather than ~ Tk ‘to send’)

(2.4) *usu ‘water’ ~ PU *wetə ‘id.’
(rather than ~ Tk *sɨb)

(2.5) *üdže- ‘to see’, Tg *edže- ‘to understand’ ~ PU *weńćä- ‘to look, watch’ [5]
(rather than ~ Tk ‘to think, understand’; ‘understand’ is surely secondary in both etymological groups, and ‘think’ ~ ‘see’ does not match)

(2.6) *ündü-sü ‘root’ ~ PU *wanča ‘id.’
(possibly suggests that PU *č < *ts or *tU; Tg *ŋǖŋte may or may not belong)

A possible IE parallel that looks like it could have been transmitted through Uralic: *ös ‘revenge, hate’ ~ II *dwiša- ‘hate’ (→ Permic, Finnic #wiša) (not worse than ~ Tk, Tg ‘bad, evil’). Unlike all of the above examples, though, this is not attested in Ugric or Samoyedic.

However, the different treatment here may simply be due to geography / relative chronology rather than to an actually different native development. Mongolic is a more eastern family, and may have gotten rid of *w already before contact with Uralic or some flavor of Para-Uralic (perhaps indeed via *w > *b, as in Turkic). So the correspondence here might indicate that in later loans, *w was rendered as zero.

I have not managed to find any reasonable-looking cases of Mongolic *b ~ Uralic *w (other than ‘to become’, see under Turkic).

The loanword-layer interpretation is also supported by the fact that for Tungusic I cannot, on a quick look around, find any clear etymologies of either type at all (i.e. where comparison with Uralic would be clearly preferable to a supposed Altaic origin). You can find some Tg cognates above under both my Turkic and Mongolic comparisons, but they might be loans. I could still add a few word-internal cases suggesting *w > *b, though: *dolba ‘night’ ~ Samoyedic *tålwə ‘dark(ness)’ (no worse than ~ Mg ‘to stay up overnight’); *nebi ‘new’ ~ PIE *new-.

[1] “We” being at least 90% the OP “Nortaneous” (lingblr yeli-renrong); myself probably not more than 5%, and a handful of remaining people suggesting single datapoints.
[2] He does not explicitly say so, and in his book leaves the Altaic column empty in the overview of Nostratic sound correspondences; but the few examples he has of a root with *w- being reflected in Altaic show zero onset.
[3] Well, “with rounding of the adjacent vowel”, but I would not buy any current claims about Proto-Nostratic vowel reconstruction with a nine-foot pole.
[4] As for Korean, the modern language has /w/, but I have the impression it mostly occurs due to vowel breaking or in loanwords from Chinese. I admit knowing very little about Middle or Old Korean though, and hence I am skipping over Korean in this post entirely.
[5] IMO better thus than UEW’s *wića-. Permic *dź ~ Hung. gy clearly proves *ńć, and front-harmonic cognates in these clearly prove *-ä and not *-a. Hung. front-harmonic í is also almost always from *e, not *i. Finnic can be routed as “*weŋ́śä-” > *wejśä- > *viisä-, and for Permic I suspect early *e > *i next to palatals in certain cases.

Posted in Reconstruction

Yurats Addenda

One step up from the likes of Meshcheran, probably the most obscure Uralic language to have still been rudimentarily documented is Yurats: a Northern Samoyedic language recorded in one wordlist by G. H. Müller in the mid-1700s. As far as I know, we have zero other information about the language, not even a clear idea of when it might have gone extinct. A century later Castrén did not record it, but to my knowledge he also did not really search for it either, unlike Mator, which we can be pretty sure was indeed extinct by 1845.

Some parts of the data were reprinted by Pallas in the late 1700s and Klaproth in the early 1800s (a reproduction of the latter can be found in Donner’s Samojedische Wörterverzeichnisse, pp. 36–50). Janhunen’s Samojedischer Wortschatz (1977) only takes these secondary editions into account when listing Yurats cognates. Just the year before in 1976, though, Helimski had put out an article that actually reviews Müller’s original data instead (but presumably back in the 1970s article collections published in Tomsk were not yet in the habit of diffusing to Helsinki within one year). He also includes a transcript of the vocabulary. This article has by now been conveniently reprinted in Helimski’s 2000 compilation book Компаратистика, уралистика (Moscow: Языки русской культуры).

This is somewhat corollary-snipey, but I might as well still put this out there: a comparison of Janhunen’s Yurats coverage with the original data. Several additional etymologies can be easily noted, at least.

  • áddinelma‘ < PSmy *ånčɜ (perhaps a loan from Enets due to lack of ŋ-?)
  • cháru ‘larch’ < PSmy *kårwï (not in SW)
  • ja ‘flour’ < PSmy *jaə (not in SW; loan from Indo-Iranian *yawa- ‘grain’)
  • jur ‘fat’ < PSmy *jür (loan from Turkic *ür₂)
  • kírwa ‘bread’ < PSmy *kïrɜwå (not in SW)
  • módi jarra ‘I cry’ < PSmy *jåru-
  • maraga ‘cloudberry’ < PSmy *məråŋkå (not in SW, but cf. PU *mura)
  • mug ‘arrow’ < PSmy *muŋkə (not in SW, but cf. PU or maybe better a west Siberian Wanderwort #muŋkɜ)
  • nócha ‘arctic fox’ < PSmy *nokå
  • ngóde ‘berry’ < PSmy *wota (with *wo- > *o-, as also in Ne En)
  • pi ‘aspen’ < PSmy *pi
  • pimà ‘boot’ < PSmy *pajmå (loan from Turkic *bal₂mak)
  • poiju ‘alder’ < PSmy *pəjɜ (not in SW; misglossed by either Müller, Helimski or some intermediate editor as ‘almus’ pro ‘alnus’, but it’s in the middle of the tree names section)
  • pämesúma ‘darkness’ < PSmy *pəjmä (not in SW, but cf. PU *pid₂mä)
  • túa ‘wing’ < PSmy *tuəj

There would be more cases that only go back to Proto-Northern Samoyedic or perhaps just Proto-Nenets (e.g. sárnu ‘egg’, wuing ‘sea’ ~ Tundra Nenets sar°ʔńu, wī̮ʔ < *sarəʔnü, *wïəŋ), but I cannot claim to have put together any reasonably good coverage of these.

A small etymological puzzle is múde. Janhunen lists this under two different roots: from Pallas under *mərkä ‘shoulder’ (with the comment “(? < En)”), and from Adelung under *utå ‘hand’. Müller only has the sense ‘arm’, which could be a semantic shift from either, but also suggests there is only one word here, not two homophones. Straightforwardly we would probably expect ˣmarze, ˣŋuda, so perhaps some contamination is involved. A reflex of ‘hand’ with ŋ- indeed appears in ngudéesse ‘ring’ (‘hand-iron’), but (j)ésse ‘iron’, with *ẃ > j, is clearly a loan from Tundra Nenets, and so perhaps the first part of the compound is as well. Actually, nothing rules out even a third interpretation: perhaps in Yurats *ŋ > m / _u regularly.

Another intriguing case is ngä́mme ‘breast’. This seems related to PSmy *ńimmä, but not as a direct descendant: it points to something like *əjmmä instead. *ə- rather than *ńi- could be perhaps by analogy from *əm- ‘to eat’ … but it could be also an archaism, since ‘breast’ is derived from *ńim- ‘to suck’, which in turn has also a variant *imə- in Proto-Uralic (> Fi. imeä, Hu. emik etc.). I believe that if a derivative *imə-mä > *immä had been formed already in PU, then this would regularly develop into *əmmä in PSmy, reaching quite close to the Yurats form. But I still have no good explanation for palatalization to ä.

The comparison also reveals a few words in SW sourced from Pallas that do not seem to appear in Müller. These are mainly anatomical terms: лы ‘bone’, пулы ‘knee’, хоба ‘skin, bark’, хыва ‘blood’. Pallas’ materials have elsewhere also an issue with words from a single source being duplicated under multiple languages, so maybe here as well? On the other hand, at least the last still looks phonologically clearly like Yurats specifically: *k- > x- and *-m- > -w- rule out Enets (which has kiʔ : kio- for ‘blood’), while *ë > ɨ seems to rule out Tundra Nenets (which has xe̮m ‘blood’; xe̮wa- ‘to bleed’).

Altogether, give or take some unclear cases like this, the number of Yurats words with a Proto-Samoyedic etymology seems to be some 140±5. This already suffices to work out the main points of historical phonology. Even among the examples above, a few recurring correspondences can be noted: *å > a, *ŋk > g and various trivial identities. The big picture seems to be of a language with a vowel system close to (Proto-)Nenets (*ə > a, some apocope, *a > ä and *ä > e kept apart, almost no vowel clusters), but with a few quirks in the consonant system that instead align with Enets (chiefly *mp *nt *ŋk > b d g). Essentially everything also seems derivable from Proto-Nenets-Enets without reference to the other Samoyedic languages.

There are at least a few individual quirks, however. One is the development *w > b, which in Yurats only seems to happen before front vowels: bedu ‘intestine’ < *wätə; behánna ‘sturgeon’ < *wäkånå; bi ‘water’ < *wet, bidímat ‘to drink water’ < *wetɜ-; ‘10’ < *wü(ə)t. Before back vowels, w remains: uáddu ‘root’ < *wånčå; wark ‘bear’ < *wərkə; wéneku ‘dog’ < *wën. So allophonic palatalization of labials must have existed in Yurats at least for some time. Having /bʲ/ but no /b/ would be weird, though, and I suppose the split may have been one where *ẃ simply dropped its velar component to yield *β > /b/.

Another distinctive conditional shift is that normally *a > ä, but *ja instead > ja, as in ja ‘flour’ < *jaə; jákki ‘smoke’ < *jačkə; jálle ‘light, sun’ < *jalä. Since the fronting of *a is a common Nenets-Enets (“Northwestern Samoyedic”?) feature, I think this is probably a back-development similar to *Ca > *Ćā in Nenets. This is also suggested by two examples of *jü > ju (jur ‘fat’, jur ‘100’) versus retention elsewhere ( ‘10’, tükǘjalle ‘today’).

As you may have noticed, Müller also marks stress most of the time. This seems to fall primarily on the penult (cháru, nócha etc.; behánna etc.), but there are also smaller groups of words with final stress, invariably marked with a grave accent and not an acute one (e.g. pürrè ‘pike’), or with initial stress on a trisyllable (wéneku). Tetrasyllables are rare, compounds aside, but most commonly (6 out of 10 cases) seem to have antepenult stress (e.g. tehánuda ‘wolf’). I have no idea whether any of this has comparative significance.

* Addendum 2018-10-11: I have been informed that Castrén may have met the Yurats after all, as he mentions meeting, near the mouth of the Yenisei, a Nenets group whose speech had similarities to Enets. While he does not have any records marked as being specifically from this dialect, apparently his Nenets materials do contain a few dialectalisms that look Yurats-ish. My thanks to Olesya Khanina and Juha Janhunen for the correction.

Lastly, under the cut: the full wordlist itself (in Helimski’s transcription).


Posted in Etymology
