An Attestation of Meshcheran

Slowly poking around digitized back issues of Studia Orientalia, I recently ran into Kecskeméti (1968), an article indexing Pallas’ Zoographie (1811). This is a notable early source of animal names from several languages of Russia, collected since the late 1700s. Some of these languages would not be otherwise substantially attested until the 1900s, and for a few it is just about the last source available before extinction. (Pallas’ consistency in transcription and coverage are both poor, but we’ll take what we can get.)

On a closer look, while checking some Samoyedic data, I however had to do a double-take upon reaching the heading Mᴇsᴛsᴄʜᴇʀᴀᴇᴄɪs. This is obviously Meshcheran, one of the extinct more western Uralic languages. (Interestingly also with /-sč-/ and not the evidently Russified /-šč-/?) Except, all sources I have seen so far have claimed that Meshcheran went extinct already somewhere around the 1500s…

OK, Pallas only records four “Mestscheraic” words, and a distinct Meshcheran ethnicity is reported to have lingered long after Russification — in at least one case even into the current century! [1] So fairly likely we are dealing here not with a living language, only with substrate loan vocabulary, a natural enough fate for animal names. Yet this is still interesting due to being an attestation securely flagged as Meshcheran. There are two competing theories on the affinity of this language within Uralic — one sees it as a branch or sister of Mordvinic, the other, Permic. To my knowledge both of these build mainly on evidence such as toponymy found in the traditionally Meshcheran region, which is susceptible to errors from pre-Russification population movements.

The list comprises bird names entirely, all given with obsolete binomials:

  • Büdaenae ‘Tetrao coturnix’ (= common quail, Coturnix coturnix)
  • Kagau ‘Accipiter milvus’ (= red kite, Milvus milvus)
  • Kuki ‘Cuculus borealis’ (= a cuckoo sp., probably the common cuckoo Cuculus canorus)
  • Schibirtschik ‘Motacilla albeola’ (= common wagtail, Motacilla alba)

The third is obviously undiagnostic of anything, but the others may be worth something.

I cannot make much of the first on a quick lookaround: it would be a hard match for the common Mordvinic term for the hazelhen (Erzya /povo/, Moksha /pova/ < PU *püŋə) and even poorer for the common Permic term (Udmurt /śala/, Komi /śɤla/); at most it has some very vague similarity to Komi /bajdɤg/ ‘partridge, ptarmigan’. [2] The second is however a good match with Mordvinic /kaval/ ‘kite’ (? < *kaɣal), though the implied vocalization of final *-l meanwhile looks amusingly Permic. The last has vague and probably insufficient similarity with Moksha /šäjgiča/ ‘wagtail’ (← /šäj/ ‘valley’ + /kiča/ ‘gull’) on one hand, Russian шибать ‘to hit’ (< Proto-Slavic ‘to whip’) on the other.

I do not feel like rooting around for possibly related names of similar birds; but per ‘kite’ I would at this point lean cautiously towards a Mordvinic-ish affiliation for Meshcheran.

[0] In a blog post this short you’ll probably manage without me doing hyperlinks for the footnotes.
[1] Thus per V. Patrušev apud Rahkonen (2009) in a single village “in a Mordvin area”.
[2] This is probably a loan from early Hungarian or some common source though — cf. Hu. fajd ‘grouse’, from earlier #paďt- per Mansi *paľta ‘black grouse’. If this really were a common Uralic root, I would expect instead **poľt- in Permic (and the cluster *-ďt- would be also unprecedented). OTOH Komi seems to show *p-d > /b-d/, which may allow dating the word to the common Permic era regardless.


Stop voicing across Uralic: some musings

Finnish often gets used as an example of a language that does not contrast voiced and voiceless consonants. While this is not really correct for Standard Finnish (which at least prescribes all of the voiced stops /b d g/), it’s true for many dialects, especially in pre-modern times. [1] The same also holds for most reconstructions of Proto-Uralic and Proto-Finnic. A few times I’ve seen this even given as a typical feature of the Uralic languages. This much is not the case, though. The presence of voiced stops in the recorded Uralic languages varies, but generally tends towards inclusion.

  • No voiced stops:
    • Most spoken Finnish; Northern Karelian
    • Most of Ob-Ugric
    • Forest Nenets, Northern Selkup
  • Allophonic voiced stops:
    • Estonian (short stops optionally voiced medially)
    • Ingrian (voiced before sonorants)
    • older Mari (voiced after nasals)
    • some varieties of Ob-Ugric, at least Southern Khanty per some descriptions (voiced medially)
  • Phonemic voiced stops:
    • most Samic languages
    • most of Finnic: Standard Finnish, Livonian, Votic, Southern Karelian incl. Livvi, Ludian–Veps
    • all of Mordvinic
    • newer Mari
    • all of Permic
    • Hungarian
    • most of Samoyedic: Tundra Nenets, Enets, Nganasan, Southern Selkup, Kamassian, Mator

(The distribution of voiced sibilants such as /z/ is very similar, though they are additionally lacking in standard Finnish and in northern Samoyedic. They are however less important for the following points, so I will focus on the voiced stops.)

This might still be a higher proportion of languages without voiced stops than within most language families of the Old World, though. Within Indo-European I can only think of Tocharian; Icelandic and varieties of High German; Scottish Gaelic; and, per some views, much of Anatolian. Maybe one of the Eastern Iranian languages that are heavy on spirantization? Even outside IE, the only other national language examples I know of are Chinese (not even in its entirety; at least Wu and Min still preserve the Middle Chinese voiced stop series) and Mongolian. Continental Southeast Asia has plenty of languages that are short on voiced pulmonic stops proper, but these often “compensate” by having instead implosives or prenasalized voiced stops; e.g. Vietnamese with /ɓ ɗ/, Hmong with a full series from /ᵐb/ to /ᴺɢ/.

Reconstructions could be added to the picture as further data points of their own, e.g. Proto-Samic and Proto-Samoyedic are both reconstructed without any voiced stops. However, when we move from synchrony into history, it is probably more important to consider the origin of voiced stops. This shows variation as well, but some particular pathways crop up repeatedly:

  • *-P- > -B- (general voicing of original singleton stops):
    • Southern Sami
    • Finnic: Livonian, Ludian–Veps
    • Mordvinic
    • Tundra Nenets, Kamassian, Mator
  • *-P- > -P- ~ -B- (voicing of singleton stops through consonant gradation):
    • Kola Sami
    • probably Proto-Finnic
    • ? Southern Selkup
  • *-Đ- > *-B- (hardening of earlier voiced spirants):
    • Standard Finnish (*ð > /d/ only)
    • Votic (*ɣ > /g/ only)
    • newer Mari (†ð, †ɣ > /d/, /g/)
  • *-NP- > -B- (simplification of nasal+stop clusters):
    • most of Samic (Southern through Skolt); usually as geminate -BB-
    • Permic
    • Hungarian
    • Enets

You might notice that all of these apply word-medially only. I have also left some more complicated cases off the list for now.

One wildcard approach is Nganasan, where the two most widely established phonemic voiced stops /b/, /ď/ come typically from *w, *j and are unrelated to the original stop consonants. [g] only occurs natively as the equivalent of /k/ under consonant gradation; [d] is even more limited, found as the weak grade of /t/ in the cluster [nd], while between vowels the result is [ð]. (Due to loanwords both could probably now be considered phonemic in modern Nganasan, though strangely enough these kinds of inventories seem to then call the dental phoneme /ð/ per its intervocalic allophone and not, as could be expected, /d/.) Also some /b/, /ď/ come by gradation from PU *p, *ś. Their strong grades though are not the corresponding voiceless stops, but instead, a few sound changes later, /x/ and /s/. [2]

A similar setup *w *j > /b dž/ is found also in Kamassian and Mator, in these accompanied though by regular medial voicing. *j > /ď/ alone is more common yet: this is standard in Southern Karelian and Ludian, and found also in some varieties of Veps, Mari, Udmurt and Enets, at least. I even recall reading about a dialect of Hungarian that does this, but I don’t have any good overviews of Hungarian dialectology on hand to double-check with.

This is all also from the POV of synchronic voiced stops. Medial voicing, gradation-related or not, has likely happened at some point in by far most Uralic languages, but this often continued on with further lenition. E.g. in Permic, intervocalic *-p- *-t- *-k- are all continued as zero, most likely with intermediate > *[b] *[d] *[g] > *[β] *[ð] *[ɣ]. In at least one case, two separate rounds of medial voicing have been involved: thus in Southern Karelian, which has both consonant gradation and general medial voicing, so that original singleton stops yield the alternations b ~ v, d ~ ∅, g ~ ∅. This continues earlier stop/spirant gradation: *p ~ *v, *t ~ *ð, *k ~ *ɣ, [3] which in turn is probably from even earlier voiceless/voiced gradation: *p ~ *b, *t ~ *d, *k ~ *g.
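The layering in Southern Karelian can be spelled out as a toy derivation. This is only a sketch: the function name and the three lookup tables are my own illustrative devices, and the relative timing of the loss of the weak-grade spirants is an assumption, since only the end result (d ~ ∅, g ~ ∅) is given above.

```python
# Toy model of two stacked rounds of medial voicing in Southern Karelian.
# Assumption: the ordering of the final spirant loss is illustrative only.
VOICE = {"p": "b", "t": "d", "k": "g"}
SPIRANT = {"b": "v", "d": "ð", "g": "ɣ"}
LOST = {"ð", "ɣ"}  # weak-grade spirants that end up as zero


def southern_karelian_grades(stop):
    """Return (strong, weak) reflexes of an original singleton stop."""
    strong, weak = stop, VOICE[stop]   # oldest stage: *p ~ *b
    weak = SPIRANT[weak]               # stop/spirant gradation: *p ~ *v
    strong = VOICE[strong]             # later general medial voicing: b ~ v
    if weak in LOST:
        weak = "∅"                     # *ð, *ɣ > zero
    return strong, weak


for stop in "ptk":
    print(stop, southern_karelian_grades(stop))
```

The three results are the alternations b ~ v, d ~ ∅, g ~ ∅: the first round of voicing only survives in the weak grade, the second only in the strong grade.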

Something similar may be actually the case in Permic. There’s reason to suspect that the full *-NP- to -B- shift was later than the lenition of medial single stops. Instead of filling in new voiced stops after the lenition of medial single stops to spirants, these clusters may have instead, in the first phase, filled in new voiceless stops already before the simplification of the original geminates. This is suggested by how a few late loanwords from Iranian still show *-NP- > -B- (/pad/ ‘crossroads’ ← Ir. *panta- ‘path’) but also seem to retain voiced stops as is (Udm. /vudor/ ~ Komi /vurd/ ‘otter’ ← Ir. *udra-); even Indo-Iranian voiceless stops can be continued as voiced (Udm. /kureg/ ~ Komi /kurɤg/ ‘hen’ ← Ir. or II *karka-; per *a > /u/ this must be an older loan than the previous two). So perhaps words of this group were all originally borrowed with simple voiceless stops (*pantɜ or *päntɜ > *patɜ, *vutrɜ, *karäkɜ > *kurekɜ), and they then went through a second round of medial lenition in late Proto-Permic, before the fall of final vowels (> *padɜ, *vudrɜ, *kuregɜ > *pad, *vudr, *kureg)? On the other hand, loaning from some Iranian variety with medial voicing is also conceivable, in the last case even an alternate analysis with *-eg as a suffix, and *rk > *r as in native vocabulary. (The epenthesis to *karäkɜ that would need to be assumed otherwise looks very sketchy, actually.)
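To check that the proposed ordering really produces the attested shapes, the three steps can be run as ordered rewrite rules over the forms quoted above. This is a rough sketch under my own assumptions: the function names are made up, the voicing environment (after a vowel, before a vowel or *r) is the narrowest one that covers these three words, and the vowel changes leading to *kurekɜ are taken as given.

```python
import re

# Illustrative segment classes; not a full Proto-Permic inventory.
VOWELS = "aeiouäɜ"
VOICED = {"p": "b", "t": "d", "k": "g"}


def simplify_nasal_clusters(word):
    """*-NP- > *-P-: drop a nasal before a voiceless stop."""
    return re.sub(r"[mnŋ]([ptk])", r"\1", word)


def second_lenition(word):
    """Voice a stop after a vowel and before a vowel or *r (assumed environment)."""
    pattern = rf"(?<=[{VOWELS}])([ptk])(?=[{VOWELS}r])"
    return re.sub(pattern, lambda m: VOICED[m.group(1)], word)


def drop_final_vowel(word):
    """Fall of final vowels."""
    return re.sub(rf"[{VOWELS}]$", "", word)


def late_proto_permic(word):
    """Apply the three changes in the order suggested in the text."""
    for change in (simplify_nasal_clusters, second_lenition, drop_final_vowel):
        word = change(word)
    return word


for loan in ("pantɜ", "vutrɜ", "kurekɜ"):
    print(loan, ">", late_proto_permic(loan))
```

The ordering matters for the first word only: if voicing came before the cluster simplification, *pantɜ would need a separate rule for stops after nasals.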

I have even wondered if this could have been the same voicing process that affected Proto-Permic single voiceless stops after an unstressed syllable in mainline Komi, but not in Udmurt or Komi-Permyak (e.g. in the adjectival ending Udm. /-et/ ~ K /-ɤd/ ~ KPerm. /-ɤt/). But the fate of the original geminates suggests this is unlikely: since they yield modern Permic simple voiceless stops, same as everywhere from Veps on east, their shortening would have to be later than the voicing of any transient secondarily introduced medial voiceless stops. And it seems rather unparsimonious to assume geminates were still maintained as late as Proto-Komi.

Hungarian also has both *-P- > *-Đ- (*-p- *-t- > -v- -z-; *-k- > †-ɣ- > -v- ~ ∅) and *-NP- > -B-, but here we likely only need a single common round of medial voicing, followed by a chainshift of sorts of *-B- *-NB- to *-Đ- *-B-. Unlike Permic, new /-NP-/ or /-NB-/ clusters are established early-ish; though in loanwords from Iranian the only example seems to be kincs ‘treasure’ < pre-Hu *kenčɜ ← *gandz-. [4] Several others have correspondences elsewhere in Uralic, but I suspect these cases to be mostly loans / Wanderwörter rather than proper native inheritance. (They probably deserve to be more carefully looked at at some point, though.)

This big picture, I think, also raises some questions about the supposed retention of voiceless stops in a few languages.

I am not talking about any kind of a spin on the alternate reconstruction by Steinitz — who outright posited an original stop versus spirant contrast *-t- : *-ð-, instead of a gemination contrast *-tt- : *-t- (and, among the dentals, shunting *d₁ = traditional *ð then off as an absurd “retroflex spirant *δ̣”). This remains conclusively debunked by loanwords from Indo-European, whose voiceless stops turn up with traditional *-t- etc. (Indo-Iranian *ćata ‘100’ → *śëta > Hungarian száz, Erzya сядо /śado/, etc.), instead of Steinitz’ *-t- = traditional *-tt-. A weaker version of this could be perhaps still entertained: medial *-tt- : *-d- etc., but I don’t really see any particular benefit to it. In my opinion the situation found in Samic, Finnish–Karelian, Nganasan and perhaps Selkup can be still considered archaic, with all stop consonants voiceless by default, voiced (> lenited to non-stops in Finnish, Karelian and the immediately adjacent Sami varieties) at most under consonant gradation.

But the other four cases of Uralic languages without any voiced stops seem more dubious. To reiterate: (most of?) Mansi, (most of?) Khanty, Forest Nenets, Northern Selkup. These are all bundled together in western Siberia; the two latter have close relatives that do show medial voicing (i.e. Tundra Nenets and Southern Selkup); and even the former two are usually considered somewhat closely affiliated with Hungarian. Unlike Finnic and Samic, they also all show general shortening of geminates. In most Uralic languages this has been associated with earlier medial voicing, i.e. *-tt- : *-t- > *-tt- : *-d- > *-t- : *-d-, with the length contrast transphonologized as a voicing contrast, as is more common worldwide.
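The reason why degemination is normally expected to follow medial voicing can be shown with a pair of ordered rewrite rules. This is a minimal sketch on invented toy forms (`kata`, `katta` are hypothetical, not reconstructions), and the function names are mine.

```python
import re

VOWELS = "aeiouäöüə"
VOICED = {"p": "b", "t": "d", "k": "g"}


def medial_voicing(word):
    """Voice a single stop between vowels; geminates are left alone."""
    pattern = rf"(?<=[{VOWELS}])([ptk])(?=[{VOWELS}])"
    return re.sub(pattern, lambda m: VOICED[m.group(1)], word)


def degemination(word):
    """Shorten geminate stops: tt > t and so on."""
    return re.sub(r"([ptk])\1", r"\1", word)


def voicing_then_shortening(word):
    """*-tt- : *-t- > *-tt- : *-d- > *-t- : *-d- (contrast kept)."""
    return degemination(medial_voicing(word))


def shortening_then_voicing(word):
    """Reverse order: the length contrast is lost before voicing applies."""
    return medial_voicing(degemination(word))


for toy in ("kata", "katta"):  # hypothetical minimal pair
    print(toy, voicing_then_shortening(toy), shortening_then_voicing(toy))
```

In the first ordering the pair stays distinct, with length transphonologized as voicing; in the second the two words merge. A language with shortened geminates, no merger and no voiced stops therefore suggests that some extra step, such as later devoicing, has intervened.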

The languages have also gone through some non-general medial lenition: *-k- > *-ɣ- in Ob-Ugric (including even clusters such as *sk), and in Samoyedic *-k- is lost at least in *ə-stems (though not in all cases in *A-stems; established examples of retention include *pirkä < *pid₁kä ‘high’, *kåjkə < *kod₂ka ‘spirit’). In Far Eastern Khanty also *-p- > /-w-/. There is also some limited direct evidence of stop devoicing: like Nganasan, Kamassian and Mator, Selkup also fortites *w and *j — but all the way to voiceless *k, *ḱ.

So I suspect that voicelessness of all stop consonants, as could be proposed for Proto-Uralic, is not actually directly continued in these languages. This looks more like an areal feature, either an innovation wave that crossed a few language boundaries on its way, or substrate influence. Direct influence from Forest Nenets or some extinct related variety seems possible for Northern Selkup, while in the case of Ob-Ugric, this is maybe more likely to have been taken up from the original pre-Uralic substrate languages of the region.

This would also mean that degemination and medial voicing could be reconstructed as common Ugric features, if desired; with voicing developing further into spirantization in Hungarian, but eventually mostly reverted in Ob-Ugric. If so, this continues undermining further the notion of Ob-Ugric as a genetic subgroup within Uralic. Previous surveys by Honti and Viitso have not found any common innovations in the languages’ consonant systems other than the nearly trivial degemination, and several trivial shared retentions such as the maintenance of *w- as still /w-/. The evidence of Hungarian-Mansi isoglosses (e.g. *wi > *wü- > *ü-) and even Hungarian-Khanty ones (e.g. *d₂ > *j, further shared by Samoyedic) should also be weighed here: perhaps it is rather some of these that are old common inheritance after all, as has been suggested by various people at various times.

[1] Note that /f/ versus /v/, a contrast fairly widely established in western dialects, does not count as a voicing distinction: the latter is the approximant/semivowel [ʋ]. This is even treated as further equal to /u/ in some generative models of Finnish phonology. I write this as /v/ in broad transcription both for simplicity & following the traditional Uralistic transcription (which itself follows Finnish standard orthography), much like I also generally use /a ä/ instead of the IPA-compliant [ɑ æ].
[2] This also has, I think, implications for the reconstruction of the history of consonant gradation, since *z > /ď/ does not seem plausible. Either we have to date the emergence of consonant gradation between voiceless and voiced grades already into pre-Proto-Samoyedic (= effectively Proto-Uralic), with further ramifications; or, if we want to consider this pattern an innovation specific for Nganasan that never occurred in its close relatives (note in particular that while medial stops are generally lenited in Enets and Tundra Nenets, the same does not apply to /s/), then it is instead the loss of palatalization in *ś that must be also dated as post-Proto-Samoyedic. We would not need to assume an outright palatalized stop or affricate, though: a conceivable route to the modern situation would be *[ś] > [ś] ~ [ź] > [s] ~ [j]  > [s] ~ [ď]. Note also that while palatalized *kʲ > *ć in early common Samoyedic merges with /s/ in northern Smy, in southern Smy these have distinct reflexes /š/ and /s/, suggesting *ś > *s rather soon after PSmy, at the latest.
[3] Traditionally the labial spirant stage is given as [β], but to my knowledge, there is no evidence whatsoever anywhere in Finnic for a distinction between this and regular /v/ < *w; only for retained /b/ in Livonian and Ludian–Veps. Setälä conceived of the latter as a re-fortition from [β], but to me a marginal archaism that never went through a spirant stage seems more likely. It’s conceivable that the shift from *w to labiodental /v/ was not yet completed by the time of [b] > [β], and so this may have been immediately a merger, with [β] > [v ~ ʋ] only following later. The fact that several Finnish dialects are reported to have [w] for /v/ next to rounded vowels (e.g. in SE Tavastia [wuos] for vuosi ‘year’, [sywä] for syvä ‘deep’) may even support reconstructing [w] still for Proto-Finnic in some positions at least.
[4] Judging by the voiceless k and cs, this looks like one of those early loans where Proto-Iranian *c *dz (later > /s z/) were substituted by *č in Uralic, instead of anything directly related to the unexpected appearance of /dž/ in Persian گنج ganj.


Three observations on Bactrian

As a part of my ongoing quest to get a better handle on the Indo-Iranian languages (mostly, yes, but not only due to their important early contact influence on the Uralic languages), some time ago I caught wind of Saloumeh Gholami’s PhD thesis Selected Features of Bactrian Grammar (2010) and have given it a thorough-ish read. Bactrian has been and probably continues to be one of the more poorly documented Iranian languages, and Gholami provides what seems like a good summary of the newer ongoing research.

Already at this point there are a few interesting observations to be made. And I hope you will not be too disappointed to find out that my thoughts so far mostly involve the historical phonology of Bactrian. The syntax and morphology no doubt have interesting phenomena going on too, but I probably won’t be able to say anything intelligible about those before knowing much better how they work also in the other Iranian languages from the same period and/or area (Sogdian, Xwareshmian, Middle Persian, Pashto etc.)

Gholami’s overview of the phonology of Bactrian is introductory in nature but still very historically grounded: she gives a pre-Bactrian etymology for almost every example word mentioned. These are not sourced, so it is hard to tell how far back they are supposed to go (all the way to Proto-Iranian?), but I get the impression that they’re based on earlier groundwork on Bactrian by Nicholas Sims-Williams, whom she mostly refers to for basics.

The thesis also does not contain any kind of a word index, so I’ve had to comb the initial chapters by hand for examples, getting a bit over 400 of them together. Further vocabulary would appear in the grammatical chapters with their extensive interlinear glosses, but generally without proto-forms. If we regardless suppose her given pre-Bactrian reconstructions to be reliable, they seem to allow for the following observations.

One: there seems to be a rule of non-open vowel shortening.

Middle Iranian *ē (in Bactrian from Proto-Iranian *ai, *aya, *iya, *ā-i) is in Bactrian spelled varyingly as ‹η› (likely /eː/) or ‹ι› (likely either /i/ or /iː/). Gholami suggests that *ē develops to ‹ι› before a nasal, on the basis of the following data: *waina- > ‹οιν-› ‘to see’, *kainā- > ‹κινο› ‘revenge’, *abi-dayanā > ‹αβδδινο› ‘custom’, *abi-dayana-ka > ‹αβδδιγγο› ‘way, manner’, *xrayanā > ‹αρχινο› ‘purchase’. Raising of long vowels before nasals is common across Iranian, sure enough. However, Bactrian shows no signs of the parallel developments *ōN > **ūN (*gauni-čiya- > ‹γωνζο› ‘basket’, *čiyāt-gauna > ‹σαγωνδο› ‘as, like’) or *āN > **ōN (*bāmušn > ‹βαμοϸνο› ‘queen’, *gawāna > ‹γαοανο› ‘fault’, *nāma > ‹ναμο› ‘name’, *fra-māna > ‹φρομανο› ‘command’, *fšupāna > ‹χοβανο› ‘shepherd’…)

An assumption of pre-nasal raising also does not exhaust the cases with *ē > ‹ι›: this also occurs in *ziyakā > ‹ζιγο› ‘damage’, *waignā > ‹οιγνο› ‘famine’ (unless phonetically with [-ŋn-]?), *-iyaθwa > ‹-ιλφο› ‘a suffix’ (thanks Gholami, very illustrative glossing).

I would instead suggest the following rules:

  1. *ē gives ‹ι› before an original unstressed *ā. This handles ‘damage’ and ‘famine’, but also ‘revenge’, ‘custom’ and ‘purchase’. This is likely primarily also shortening *ē > *e, with raising *e > /i/ only following secondarily.
    • This does not seem to apply to /ē/ from i-umlaut of *ā: *dāraya- > ‹ληρ-› ‘to have’, *wādaya- > ‹οηλ-› ‘to lead’, *wādžaya > ‹οηζο› ‘ability, power’, *wi-čāraya- > ‹οισηρ-› ‘to purchase’. These could suggest either that implicit intermediate unstressed *ē (*dārē- > *dērē-, *wādē- > *wēdē- etc.) did not trigger shortening; or, alternately, maybe i-umlaut of *ā initially led to a distinct low front vowel *ǣ, which was only raised to ‹η› after the shortening/raising of *ē from *ai, *aya, *iya. The latter might be preferable in light of one case with *au > *ō > ‹ο› (rather than ‹ω›) before *aya > *ē: *tauxmaya > ‹τοχαμηιο› ‘relationship’ (here *ē is not lost; thanks to further suffixation?). As a vowel, ‹ο› probably mostly stands for /u/, as is suggested by its use also for /w/ (‹οηζο› = /wēdz/, etc.) and the general typology of vowel systems across Iranian: Old and Middle Iranian languages mostly do not have short /o/. [1]
  2. *ē gives ‹ι› also before word-final consonant clusters. (NB: ubiquitous final ‹-ο› is thought to be only a Greek-derived orthographic device.) This handles ‘way’, as well as the ‹-ιλφο› suffix, and maybe also *fšuyantīčī > ‹φινζο› ‘lady’ (though here we instead have *-uya-, which I suppose could have contracted to *ī rather than *ē already to begin with).
    • This is again applicable also to the development of *ō: *aitat-gaunaka > ‹δαγογγο› ‘such, in this way’, *bawanta > ‹βονδο› ‘completely’.

These rules only seem to leave the verb root ‘to see’ unaccounted for. However, a more general version of rule 1 might cover some inflected forms (*wēn-ēd > ‹οινηδο› ‘see.2PL’), and actually also an allomorph with retained *ē exists (*wēn-an > ‹οηνανο› ‘see.subjunctive-1PS’). Gholami thinks these are chronologically separated versions before and after the sound change from ‹η› to ‹ι› (early /wēn-/ > late /win-/?), but if there is a chronological difference, maybe this rather involves levelling-away of the /wēn-/ allomorph.

Rule 1 then suggests that before the onset of root stress and the reduction of all suffix and prefix syllables, Bactrian went through a stage of mobile stress attracted rightwards by long vowels, as I believe occurs in several other Indo-Iranian languages (though don’t ask me about the exact details on this).

Two, a few notes on vowels in prefixes. These are mostly reduced heavily, and are spelled varyingly with ‹α› or ‹ο›, which Gholami interprets as [ə]. E.g. *fra-gāwa > ‹φρογαοο› /frəɣāw/ ‘profit’, *ni-kanta- > ‹νακανδο› /nəkand-/ ‘to dig’, *uz-bara- > ‹αζβαρο› /əzvar-/ ‘to bring forth’. There is also epenthetic /ə/ before some consonant clusters: *spāsV > ‹σπασο ~ ασπασο› /spās ~ əspās/ ‘service’. Despite some cases of variation like this, schwa seems to be still an underlying phoneme, however: consider *xšayanta- > ‹αχανδ-› /əxānd-/ ‘to control’, with first *xš- > *əxš-, followed by *š > ∅ (if not rather > *hx > /xː/, spelled simply as ‹χ›?); and *upa-stāna > ‹αβαστανο› /əvastān/ or /əvəstān/ ‘support’. There doesn’t seem to be much evidence against considering [ə] an unstressed allophone of /a/, though. (Gholami takes no stance on questions about the phoneme inventory of Bactrian and operates only with orthographic vs. surface phonetic levels of analysis.)

There are also some cases where *ni- is still spelled as ‹νι-›. Gholami suggests that these would be retentions. I think they might be however secondary umlaut developments: in the data given, they occur mostly preceding a palatal root vowel ‹ι› or ‹η›, as in *ni-štaya- > ‹νιττι-› /nihti-/ ‘to send (a message)’; or preceding a palatal sibilant (possibly itself originally conditioned by *i through RUKI), as in *ni-šadman > ‹νιϸαλμο› /nišalm/ ‘seat’. There are also examples of ‹ι› continuing earlier prefixal *a in a similar context: *waz-antiyaka > *wəzindēg (with umlaut in the root: *a-i > *i) > ‹οιζινδδιγο› /wizindiɣ/ ‘current’. Gholami attributes this last example to a supposed development of *a to ‹ι› before /s z/, which would also be seen in *dasta > ‹λιστο› /list/ ‘hand’. There are however plenty of counterexamples, say *aspa > ‹ασπο› ‘horse’, *ā-xasa- > ‹αχασ-› ‘to quarrel’, *basta- > ‹βαστο› ‘to bind’, *dasa > ‹λασο› ’10’; *azam > ‹αζο› ‘I’, *azdā > ‹αζδο› ‘knowledge’, *gazna > ‹γαζνο› ‘treasury’, *waza- > ‹οαζ-› ‘to use’. I don’t know what is up with ‘hand’; theoretically, some kind of suffixation to *dasta-ya- would work. [2]

Lastly, one case with the development of *fra- ‘pre-’ suggests that vowel reduction actually has been fairly early, resulting in this prefix first in *fr̥-, which then in unstressed position mostly unpacks again to *frə-. Consider *fra-stāya- > ‹φοϸτιι-› ‘to send’: this exemplifies the sound change *rs > /š/ (compare e.g. *kr̥sta- > *kirsta- > ‹κιϸτο› /kišt/ ‘to detain’), and therefore requires *fr̥stēy- > *fštīy- > /fəštīy-/.

Three, the development of *š shows double treatment. Gholami notes that in some cases, *š is retained as ‹ϸ› /š/; in others, it develops to ‹υ› /h/, which can be further lost (or perhaps only unwritten in various consonant clusters, I wonder?). This does not appear to be a simple case of dialect mixture or whatever, since both outcomes can sometimes occur in the same word: *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’.

Examining the data, to me the distribution does not appear to be entirely unpredictable, though. *š > *h seems to be the main development for *š originating by RUKI:

  • *is, *us > *iš, *uš > *ih, *uh
    • *awa-gta > ‹ωγοτο› /ōɣu(h)t/ ‘to conceal’
    • *d-manyu > ‹λρουμινο› /lruhmin/ ‘enemy’ [3]
    • *fra-ta-ka > ‹φρητογο› /frē(h)təɣ/ ‘messenger’
    • *kasta- > ‹κισατο› /kisə(h)t/ ‘youngest’
    • *ni-gaa- > ‹ναγαυ-› /nəɣāh-/ ‘to hear’
    • *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’
    • *ni-štaya- > ‹νιττι-› /nihti-/ ‘to send (a message)’
    • *snā > ‹ασνωυο› /əsnōh/ ‘daughter-in-law’
    • *wi-šmāra- > ‹οαυμαρ› /wəhmār/ ‘to account’
    • *wrta-ka > ‹ροτιγο› /ru(h)tiɣ/ ‘rope’
  • *rs > *rš > *r(h)
    • *ā-pr̥št- > ‹βαρτ-› /var(h)t-/ ‘to be necessary’
    • *gr̥šta- > ‹γιρτο› /ɣir(h)t/ ‘to complain’ (past stem)
    • *hr̥šta- > ‹υιρτο› /hir(h)t/ ‘to leave’ (past stem)
    • *wi-xwata- > ‹οοχορτο› /wəxur(h)t/ ‘to quarrel’
  • *ḱs, *k⁽ʷ⁾s > *ćš, *kš > *š, *xš > *h, *x(h)
    • (PII *ćš >) *pašman > ‹παμανο› /pa(h)man/ ‘wool’
    • *āθriya > ‹χαρο› /x(h)ār/ ‘ruler’
    • *ayant- > ‹χανδ-› /x(h)ānd-/ ‘to control’
    • *apā- > ‹χαβρωσο› /x(h)avrō(t)s/ ‘night-and-day’
    • *nauθra > ‹(α)χνωρο› /(ə)xnōr/ ‘satisfaction’
    • (PII *ćš >) *xšwašti > ‹χοατο› /xʷa(h)t/ ’60’ [4]
    • *waa > ‹οαχο› /wax(h)/ ‘interest’

In one case I’m not sure if RUKI or *ćt > *št is involved: *paršti-čī- > ‹παρσο› /parts/ ‘backwards’.

Meanwhile, retention of *š seems to be entirely regular in the position *a_V, *ā_V. In these positions *š would be maybe the most likely to continue PII *sč < *sk(e), though *ćš is also an option, and some could be innovative Iranian vocabulary from somewhere else entirely:

  • *dāšinV > ‹λαϸνο› /lāšn/ ‘gift’
  • *fra-xāšaya- > ‹φριχηϸ-› /frixēš-/ ‘to seduce’
  • *paga-šaka- > ‹παχϸιιο› /paxšiy/ ‘in-law’
  • *uz-gaša- > ‹αζγαϸ-› /əzɣaš-/ ‘to dissent’
  • *xāša-ka > ‹χαϸιγο› /xāšiɣ/ ‘clothing’

A few clear cases of retained /š/ from RUKI also appear:

  • *kr̥šāka > ‹κιϸαγο› /kəšāɣ/ ‘plough-ox’ (<< PIE *kʷels- ‘to plough’)
  • *ni-šādman > ‹νιϸαλμο› /nišalm/ ‘seat’ (<< PIE *sed- ‘to sit’)

In most cases of retention I am not sure about the pre-Iranian origin of *š (but RUKI is conceivable in many of them):

  • *a-xwašn- > ‹αχοαϸνο› /axwašn/ ‘unpleasantness’ (any relation to Ir. *xwad- ‘to make pleasant’ < PIE *sweh₂d-?)
  • *bāmušn- > ‹βαμοϸνο› /vāmušn/ ‘queen’ (any relation to Persian بانو /bānu/ ‘lady’?)
  • *daxštana > ‹λαχϸατανιγο› /laxšətaniɣ/ ‘crematory’ (from pseudo-PIE *dʰegʷʰ-sth₂no-?)
  • *hāwišta-ka > ‹υαϸκο› /hāšk/ ‘pupil’
  • *pitr̥-šti- > ‹πιδοριϸτο› /piðurišt/ ‘ancestral estate’ (from pseudo-PIE *ph₂tēr-steh₂-?)
  • *ni-šašta- > ‹ναυαϸτο› /nəhašt/ ‘to settle’ (maybe *š-s > *š-š, if from PIE *steh₂-?)
  • *škara- > ‹αϸκαρ-› /əškar-/ ‘to follow’
  • *wi-xwarša- > ‹οοχωϸ› /wəxōš/ ‘quarrel’
  • *xšāya- > ‹ϸιι-› ? /šīy-/ ‘to be able’ (< PII *kšaH-? but cf. /x(h)/ in the derivative ‘ruler’)
  • *xšidža-ka- > ‹ϸιζγο› /šidzɣ/ ‘good’

I could suggest at least that before a vowel, *rš > /š/ (‘plough-ox’, ‘quarrel’), while before a consonant, *rš > /r(h)/ (‘to be necessary’, ‘to complain’, ‘to leave’, ‘to quarrel’ and ‘backwards’).
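This conditioning is simple enough to state as a lookup on the segment following *rš (or *r̥š). Below is a small sketch that classifies the pre-Bactrian forms quoted above; the function name and the vowel list are my own assumptions, and word-final *rš is deliberately left undecided since none of the examples covers it.

```python
import re
import unicodedata

# Assumed vowel inventory for the pre-Bactrian notation used here.
VOWELS = unicodedata.normalize("NFC", "aāiīuūeēoō")


def rs_outcome(proto_form):
    """Predict the reflex of *rš / *r̥š from the following segment."""
    form = unicodedata.normalize("NFC", proto_form)
    form = form.lstrip("*").replace("-", "")
    # r, optional combining ring below (syllabic r̥), then š
    match = re.search("r\u0325?\u0161(.?)", form)
    if match is None:
        return None            # no *rš in this form
    following = match.group(1)
    if following == "":
        return None            # word-final *rš: not covered by the rule
    return "/š/" if following in VOWELS else "/r(h)/"


examples = [
    "*kr\u0325šāka",   # 'plough-ox'
    "*wi-xwarša-",     # 'quarrel'
    "*gr\u0325šta-",   # 'to complain'
    "*hr\u0325šta-",   # 'to leave'
    "*paršti-čī-",     # 'backwards'
]
for form in examples:
    print(form, rs_outcome(form))
```

The first two forms come out with retained /š/, the rest with /r(h)/, as in the spellings given in the lists.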

The cases with *št from PII *ćt seem to be rather evenly split. *š > *h appears in:

  • *aštā > ‹αταο› /a(h)tā/ ‘8’ (<< PIE *oḱtōw)
  • *ham-gašta- > ‹αγγιτι› /angi(h)ti/ ‘to receive’ (past stem) (< East Iranian *gādz- ‘to receive’, of unknown earlier origin per Cheung)
  • *ni-pixšta- > ‹νιβιχτο› /nəvix(h)t/ ‘to write’ (past stem) (<< PIE *peyḱ- ‘to paint, decorate’)

while retention appears in:

  • *pašti- > ‹παϸτο› /pašt/ ‘agreement’  (<< PIE *peh₂ḱ-; cf. pact)
  • *rašta- > ‹ραϸτο› /rašt/ ‘true, loyal’ (<< PIE *h₃reǵ-; cf. right)

Same goes for cases with *fš, though examples are rather rare:

  • *š > *h in *pati-fšarV > ‹πιδοφαρο› /piðəf(h)ar/ ‘honour’; *fšuyantīčī > ‹φινζο› /f(h)indz/ ‘lady’
  • retention in *kafši > ‹καφϸο› /kafš/ ‘shoe’
  • and even: *fš > /x/ in *fšupāna > ‹χοβανο› /xuvān/ or /xəvān/ (or /xʷvān/?) ‘shepherd’.

Is it perhaps relevant that ‘shepherd’ comes from PII *pću-, while the others are more likely to be from *ps with Iranian “second RUKI” to *fš? Maybe additionally *fš- > /f-/ root-initially versus retained medially.

It’s also worth pondering that *š > *h fits somewhat poorly into the phonological big picture of Bactrian. Usually *š >> /h/ correspondences go through *x (thus in e.g. Finnic or Spanish; Pashto remains at the /x/ stage), but Bactrian retains Proto-Iranian *x just fine. Two other possibilities come to mind, but they both would require Bactrian to have split off from the other Iranian languages relatively early:

  1. Perhaps *kʰ > /x/ and *kC > /xC/ are fairly late in at least some parts of Iranian: they are, after all, not reflected in some of the languages, such as Balochi and Wakhi. *š > *h could then have passed through a transient *x state already earlier.
  2. Perhaps the path was here rather *š > *s > *h, the second change being common (but not Proto-) Iranian. But this leaves many cases unexplained: original *-st- for example does not develop into **-ht-, but *-št- still does in many cases (‹νιττι-›, ‹φρητογο›, ‹ωγοτο›, etc.)

Likely having more clarity on this issue would require examining also the cognates elsewhere in Iranian, and not necessarily taking Gholami’s pre-Bactrian reconstructions as a given. But this remains difficult as long as there is no general Iranian Etymological Dictionary to consult.

[1] Gholami suggests /o/ for cases with *a-u > ‹ο›, such as *madu > ‹μολο› ‘wine’. Other eastern Iranian languages with this assimilation, though, end up with *u, e.g. Ossetian муд. I-umlaut of *a-i also gives ‹ι›, not ‹ε›, e.g. *kanyā > ‹κινο› ‘canal’.
[2] Sometimes it is proposed that ‘hand’ in Iranian would be native only in Persian, and borrowed from there to most of the other varieties, since this has PIE *ǵʰ- and is expected to give /d-/ only in Persian, but /z-/ elsewhere (and Avestan indeed has that). In this case the widespread Middle Iranian fronting of short *a to *æ, which appears to be absent from Bactrian, might result in *destV > *ðistV > /list/ in Bactrian. However I think that dissimilation before syllable-final *s is perhaps more likely: PIr *dzast- > *dast- (this proposal I’ve seen from Martin Kümmel). There is however the fact that ‘hand’ contains original PIE *s, while my counterexamples like ‘horse’, ‘10’ and ‘I’ mostly have secondary *s *z from PIr. *c *dz < PII *ć *dź⁽ʰ⁾ < PIE *ḱ *ǵ⁽ʰ⁾. This could be perhaps leveraged, if wanted, but I don’t see what phonetic sense this would make, and so I don’t feel like doing a full check-up on the matter.
[3] The (rather funky!) consonant cluster /lr-/ presumably by folk etymology from *drauga > ‹λρωγο› /lrōɣ/ ‘false(hood), wrong’.
[4] In principle pre-epenthesis *swašti > *šwašti could also work, with *š > *h then feeding into common Iranian *hw > *xw?

Posted in Reconstruction

Were there Proto-Samic *š-stems? Some issues of Samic-Finnic chronology

Despite ongoing disputes about the subgrouping of the Uralic family, it is clearly the case that the Finnic and Samic languages have been at least neighbors for several millennia now, exchanging linguistic features and material back and forth. With care, this allows teasing out substantial facts about the relative chronology of the history of the two families. (Germanic can be also added to the bundle, though the evidence from here is much more unidirectional.)

The sibilant system shows several good examples. While Finnic /s/ and Samic /s/ correspond to each other consistently all the way from Proto-Uralic to the present day, the “shibilants” have a more complex history. In old inherited vocabulary the main correspondences for these are Finnic *s ~ Samic *č (from original *ś ~ *ć, be it Proto-Finno-Samic or all the way from Proto-Uralic) and Finnic *h ~ Samic *s (from original *š). The latter correspondence can also appear in old (perhaps mostly either parallel or Finnic-mediated) loans from Germanic, whose *s was substituted as *š at least on the Finnic side; no way to tell if also in Samic.

(This probably indicates that pre-Finnic *s was, following the merger of *s and *ś, realized as laminal [s̻], while *š was (sub?)apical [ʂ]. Germanic *s was likely apical [s̺], and therefore matched better with pre-Finnic *š. I am not sure how far back the modern Finnish realization of /s/ as apical [s̺] dates, but at least the Northern Karelian shift of *s to an apical postalveolar [s̱] š most likely starts from this same value.)

The correspondence Finnic *s ~ Samic *š appears in a small number of native-looking cases, where it seems to represent original preconsonantal *ś (PF *laskë- ~ PS *lōštē- < *laśkə- ~ *laśk-ta- ‘to let out, pour, etc.’; PF *kisko- ~ PS *këškē- < *kiśka(w)- ‘to tear, pull’; PF *vaski ~ PS *veaškē < *wäśkä ~ *waśka ‘copper’). It is however more common in loanwords between the two. E.g. Finnic *s before *i and *ü seems to be fairly regularly substituted as *š in Samic; the YSS data has 5 examples of this out of its 11 examples altogether of PS *š-. [1] All late loanwords from Samic into Finnish also show *š → /s/, for the obvious reason that Finnish has had no other sibilants for most of its independent existence. (Even the modern loanword phoneme š or sh is still limited to educated speakers. Probably a rather large proportion of Finns counts as “educated” by typical contact linguistic standards by now, though…)

Lastly, also the fourth theoretically possible correspondence between plain sibilants is attested: Finnic *h < *š ~ Samic *š. (I will not be treating the various affricates in this post.) This might be the group that has the most value for establishing chronology, since it is bounded both from above and below: prevocalic *š only occurs in loanwords in Proto-Samic, but any such loanwords from Finnic must then pre-date the pan-Finnic change *š > *h.
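The correspondences discussed so far can be collected into a small lookup table. The following Python sketch only restates the layers described above; the helper name `classify` and the wording of the labels are illustrative assumptions, not established terminology.

```python
# Finnic ~ Samic plain-sibilant correspondences as described in the text.
# Keys are (Finnic reflex, Samic reflex); the labels paraphrase the prose.
CORRESPONDENCES = {
    ("s", "s"): "inherited *s",
    ("s", "č"): "inherited *ś ~ *ć",
    ("h", "s"): "inherited *š, or old Germanic loans with *s substituted as *š",
    ("s", "š"): "preconsonantal *ś in native words, or loans between Finnic and Samic",
    ("h", "š"): "loan from Finnic into Samic predating Finnic *š > *h",
}

def classify(finnic, samic):
    """Return the layer suggested by a Finnic ~ Samic sibilant pair.

    Hypothetical helper: anything outside the table is reported as uncovered.
    """
    return CORRESPONDENCES.get((finnic, samic), "not covered by these correspondences")

print(classify("h", "š"))
```

Only the last pair is bounded on both sides chronologically, which is why it is the most useful one for dating.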

Some of the data in this group suggests that it stretches beyond the breakup of Proto-Samic. One example is the word for ‘coal, ember’; in Finnic *šiili > *hiili (Fi. hiili etc.), which then appears as pseudo-PS *šilë in Southern, Ume and Pite Sami (SS sjïjle etc.); as pseudo-PS *hilë in Lule and Northern Sami (NS hilla); and pseudo-PS *ilë in Eastern Sami (Inari illâ etc.). I’ve sometimes seen also the explanation that these kinds of cases would not be parallel loanwords, but rather several layers of re-loaning, with each new loanword then flushing out the previous one. This however seems unlikely to me, especially when dealing with a non-cultural term like ‘coal’ that has no reason to be repeatedly loaned from Finnic, and when the distribution of the different variants is perfectly complementary. [2]

Meanwhile, *š > *h is usually taken to be late Proto-Finnic, i.e. at least Proto-Core Finnic (probably later than at least the splitting of South Estonian though). Does this mean that Proto-Samic is therefore younger than even Core Finnic? And how does this measure up with how e.g. Jaakko Häkkinen (Jatkuvuusperustelut ja saamelaisen kielen leviäminen, osa 2: see table on p. 19) comes out with the opposite result: Proto-Samic would have broken up earlier than Proto-Finnic?

One option would be to sigh and concede that apparently words like ‘coal’ are multiple layers after all. But I would hold out for a different explanation: we can probably shift the dating of *š > *h ahead quite a bit beyond its various termini post quem. E.g. the introduction of *h → *h in Germanic loanwords into Finnic does not have to be enabled by the development of a native /h/ in Finnic; it can represent also the taking-up of a new loanword phoneme, which besides probably already existed as an allophone in the clusters *kt [ht] and/or *sl *sr *sn [hl hr hn]. In fact, since Proto-Finnic also had all four of *st *kl *kr *kn, then the introduction of [h] in both *kt and the *sR group would have already been sufficient to phonemicize it: it could be no longer identified uniquely as either /s/ or as /k/. — Again, I plan on writing a full article on this topic in the future.
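The phonemicization argument can be checked mechanically. This is a minimal sketch, assuming the cluster inventories exactly as given in the paragraph above; the function name `collisions` is hypothetical. For each candidate analysis of [h] (as an allophone of /s/ or of /k/), it lists the [h]-clusters that would then coincide with a cluster that already exists with plain [s] or [k].

```python
# Clusters with plain [s] or [k] that Proto-Finnic also had, per the text.
EXISTING_CLUSTERS = {"st", "kl", "kr", "kn"}
# Clusters where [h] is assumed to have surfaced: *kt [ht], *sl *sr *sn [hl hr hn].
H_CLUSTERS = {"ht", "hl", "hr", "hn"}

def collisions(phoneme):
    """Surface [h]-clusters that would merge with an existing cluster
    if every [h] were analyzed as an allophone of `phoneme`."""
    return {c for c in H_CLUSTERS if phoneme + c[1:] in EXISTING_CLUSTERS}

for candidate in ("s", "k"):
    print(candidate, sorted(collisions(candidate)))
```

Neither analysis comes out collision-free: treating [h] as /s/ clashes in [ht] versus *st, and treating it as /k/ clashes in [hl hr hn] versus *kl *kr *kn. That is the sense in which [h] could no longer be identified uniquely with either phoneme.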

This finally brings me to the topic I mention in the title. The Samic languages have borrowed *š from early Finnic also in several consonant-stem nominals. However, while these have consistently /-š/ in Western Sami, they seem to have dual representation in Eastern Sami: sometimes they surface with /-s/, sometimes with /-š/. At first sight this sounds like it might be related to the fact that some of these cases are loaned from PF *-is and not *-eš — but no, that contrast appears to be completely orthogonal.

Let’s roll out the data:

(1) Eastern Sami /-š/ ← Finnic *-eš

  • F *imeš (> Fi. ihme ‘wonder’) → S *imëš > e.g. North imaš, Inari iimâš
  • F *kadëš (> Fi. kade ‘jealous’) → S *kāðëš > e.g. North gáđaš, Inari kaađâš
  • F *laudëš (> Fi. laude ‘seat in sauna’) → S *lāvtëš > Skolt laaudâš
  • F *murëš (> Fi. murhe ‘sorrow’) → S *morëš > e.g. North moraš, Inari muurâš
  • F *säigeš (> Fi. säie ‘thread, fiber’) → S *šeajkëš > Kildin šieigaš
  • F *säigeš also? → S *sājkëš > e.g. Skolt saaiǥâš
    (This looks like a contamination of the previous word × the verb *sājkē- ‘to wear out’ reflected in most of Samic; which is probably not loan, but older inheritance from original *säjkä, as no vowel-stem forms survive in Finnic. North sáiggas ‘worn’ is then simply a native derivative from the verb, as also per the semantics.)
  • F *tarbëš (> Fi. tarve ‘need’) → S *tārpëš > e.g. North dárbbaš, Inari taarbâš
    (In this one case, with an *s-stem quite widely alongside: *tārpēs > Southern daerbies, also Lule; *tārpës > e.g. Skolt taarbâs, also Pite, Lule. Lule Sami seems to have all three variants: dárpaj, dárpes, dárpas, and even a vowel-stem dárpa. There is Finnish dialectal tarvis as well, so the diversity clearly goes back to parallel loaning in some fashion.)

(2) Eastern Sami /-š/ ← Finnic *-is

  • F *kallis (> Fi. kallis ‘expensive’) → S. *kāllëš > e.g. North dial. gállaš, Skolt kaallâš
    (in Inari *ēs-stem kaalles, apparently with a nativized adjective ending)
  • F *ruumis (> Fi. ruumis ‘corpse’) → S. *rumëš > e.g. North rumaš, Inari ruumâš
    (parallel *romës in Skolt roomâs, [3] comparable with the Fi. dialectal variant rumis from Southern Ostrobothnia; and with a vowel stem in Southern Sami: *romē > räbmie.)
  • F *rugis (> Fi. ruis ‘rye’) → S. *rukëš > e.g. North rugaš, Inari ruuvaš
  • F *valmis (> Fi. valmis ‘ready’) → S. *vālmëš > e.g. North válmmaš, Inari vaalmâš

(3) Eastern Sami /-s/ ← Finnic *-eš

  • F *kantëlëš (> Fi. kantele ‘a traditional string instrument’) → S *kāntëlës > Inari kaddâlâs
  • F *kiireš (> Fi. kiire ‘hurry’) → S *kirës > e.g. Inari kiirâs
  • F *kärmeš (> Fi. käärme ‘snake’) → S *kearmëš ~ *kearmës > e.g. North gearpmaš, Ter “kermʾs
  • F *pereš (> Fi. perhe ‘family’) → S *pearëš ~ *pierës > e.g. North bearaš, Kildin пӣрас
    (vowel-stem *pearë in Skolt piâr)
  • F *terveš (> Fi. terve ‘healthy’) → S *tearvëš ~ *tiervës > e.g. North dearvvaš, Inari tiervâs
  • F *voidëš (> Fi. voide ‘lotion, ointment’) → S *vōjtës > Inari vuoidâs
    (From Sammallahti’s reverse dictionary of Inari Sami. Álgu does not have this lexeme, so I have no idea if there are equivalents elsewhere in Samic. This could be also an independent derivative within Samic from the base verb: PS *vōjtë- ‘to grease, anoint’, interestingly an *ë-stem one instead of *ē-stem, as could be expected.)

(4) Eastern Sami /-s/ ← Finnic *-is

  • F *nakris (> Fi. nauris ‘swede (type of turnip)’) → S *nāvrëš ~ *nāvrës > e.g. North návrraš, Inari naavrâs
  • F *saalis (> Fi. saalis ‘catch’) → S *sālëš ~ *sālës > e.g. North sálaš, Inari saalâs
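To make the claimed orthogonality concrete, the four groups can be tallied as a 2×2 table. This is just a counting sketch over the entries as listed above (the doubtful second *säigeš entry is counted as listed); the names `GROUPS` and `crosstab` are illustrative.

```python
# (Eastern Sami ending, Finnic ending) -> Finnic source forms from groups (1)-(4).
GROUPS = {
    ("-š", "-eš"): ["imeš", "kadëš", "laudëš", "murëš", "säigeš", "säigeš (?)", "tarbëš"],
    ("-š", "-is"): ["kallis", "ruumis", "rugis", "valmis"],
    ("-s", "-eš"): ["kantëlëš", "kiireš", "kärmeš", "pereš", "terveš", "voidëš"],
    ("-s", "-is"): ["nakris", "saalis"],
}

def crosstab(groups):
    """Count the entries in each (Eastern Sami ending, Finnic ending) cell."""
    return {key: len(words) for key, words in groups.items()}

table = crosstab(GROUPS)
for eastern in ("-š", "-s"):
    print(eastern, [table[(eastern, finnic)] for finnic in ("-eš", "-is")])
```

Both Finnic endings turn up with both Eastern Sami outcomes, so the Finnic ending alone does not predict which sibilant surfaces.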

A few initial comments:

  1. I’ve only included cases with -s when Western Sami, or failing that Finnic, actually points to *-š. Of course F *-š ~ S *-s can be also found in older shared vocabulary, as in ‘boat’: *venəš > F. *veneš > Fi. vene; > S. *vënës > e.g. North vanas, Inari voonâs; > Mordvinic *venəš > e.g. Erzya венч /venč/. ‘Hurry’ could be theoretically also of this type; per the vowel correspondence *a ~ *ā, kaddâlâs clearly cannot.
  2. This entire word group seems to be centered on Northern and Inari Sami. Reflexes are practically absent from Southern Sami (only gïermesj ‘snake’), very rare also in Ume and Pite Sami. This would fit well together with late separate loaning from early Finnish specifically + occasional diffusion into other Sami varieties.
  3. Some of these words are originally from Germanic, and could be in theory partly borrowed directly from there into Samic, but I haven’t found any examples where Proto(-Western)-Samic *-ëš appears in a loanword without Finnic equivalents. Also, many enough cases are native Finnic, either wholly (e.g. kantele perhe säie, nauris saalis) or at least the *S-derivative is (terve); or come from Baltic (käärme). The only case where parallel loaning is clearly involved is the ‘need’ group: probably *tārpëš via Finnic, versus *tārpës directly from Scandinavian *þarbiz.

Here is a quick distribution chart, as you may wish to consult for point 2: [4]

*imëš:      - - - L N I - - -
*kātëš:     - - - - N I S - -
*lāvtëš:    - - - - - - S - -
*murëš:     - - - L N I S - -
*seajkëš:   - - - - - - - K -
*sājkëš:    - - - - - - S K -
*tārpëš:    - - - L N I - - -
*kāllëš:    - - - - N I S K T
*rumëš:     - - - L N I S - -
*rukëš:     - - - L N I - - -
*vālmëš:    - - - L N I S K T

*kāntëlës:  - - - - - I - - -
*kirës:     - - - - - I S K T
*kearmëš/s: S - P L N - - - T
*pErëš/s:   - - - L N - - K T
*tErvëš/s:  - - - - N I S K T
*nāvrëš/s:  - U P L N I S K -
*sālëš/s:   - - - L N I - - -

          S U P L  N  I  S  K T
totals:   1 1 2 10 13 13 10 8 6
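The totals row can be recomputed directly from the chart. A minimal sketch, assuming the chart rows exactly as printed above and identifying the nine columns by position (which also sidesteps the doubled “S”); the long column names are my own expansion of the abbreviations.

```python
COLUMNS = ["Southern", "Ume", "Pite", "Lule", "Northern", "Inari", "Skolt", "Kildin", "Ter"]

CHART = """\
*imëš:      - - - L N I - - -
*kātëš:     - - - - N I S - -
*lāvtëš:    - - - - - - S - -
*murëš:     - - - L N I S - -
*seajkëš:   - - - - - - - K -
*sājkëš:    - - - - - - S K -
*tārpëš:    - - - L N I - - -
*kāllëš:    - - - - N I S K T
*rumëš:     - - - L N I S - -
*rukëš:     - - - L N I - - -
*vālmëš:    - - - L N I S K T
*kāntëlës:  - - - - - I - - -
*kirës:     - - - - - I S K T
*kearmëš/s: S - P L N - - - T
*pErëš/s:   - - - L N - - K T
*tErvëš/s:  - - - - N I S K T
*nāvrëš/s:  - U P L N I S K -
*sālëš/s:   - - - L N I - - -
"""

def column_totals(chart):
    """Count, per column position, the rows that have a reflex (anything but '-')."""
    totals = [0] * len(COLUMNS)
    for line in chart.strip().splitlines():
        _, cells = line.split(":")
        for i, cell in enumerate(cells.split()):
            if cell != "-":
                totals[i] += 1
    return dict(zip(COLUMNS, totals))

for name, count in column_totals(CHART).items():
    print(f"{name:9} {count}")
```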

OK then, caveats done with, what is actually going on in here?

Mikko Korhonen in Johdatus lapin kielen historiaan mentions passingly (p. 200) only that the *-ëš-group “appears in correspondence to the loan original’s h” (“š esiintyy itämerensuomalaisissa lainoissa originaalin h:ta vastaamassa”). The same is stated in stronger terms by Mikko Heikkilä in Bidrag till Fennoskandiens språkliga förhistoria i tid och rum (p. 107), where he claims that late Proto-Finnic *h would have been adopted in Samic as *h syllable-initially and *š syllable-finally. This seems phonetically implausible to me however, given that (1) Scandinavian /h/ is regularly borrowed into Samic as /h/ ~ ∅, never as **š, (2) Finnic coda *h from *k is never borrowed as Samic **š, and (3) there definitely is also a layer of loanwords where Finnic onset *š gives Samic *š.

Heikkilä seems to suggest that late substitution as *š could have involved loaning from an intermediate stage of the *š >> *h shift, that he gives as [ç]. A palatal fricative could indeed be plausibly borrowed as Proto-Samic *š, especially if this was a palatal sibilant [ɕ] (as suggested by its origin from *ś, and its later development to /j/ in Western Sami, when before a consonant). This intermediate reconstruction is however based on a common misunderstanding. Sound changes of the type *š > *h do not involve a trek through every single intervening POA you can find on an IPA chart! [5] These are rooted in the tendency of retroflex consonants in particular to acquire a velar coarticulation, which can then take over as the primary POA; and also for spirants such as [x] to lenite to [h]. Palatal [ç] would be overpassed entirely in this process.

So I see no other explanation than that the cases of *-ëš ~ *-eh must have been borrowed before the Finnic sound change *š > *h (before the loss of the sibilant feature, to be exact). And the distribution suggests that Proto-Samic would have been by this point already quite thoroughly broken up: after all, these words seem to have been borrowed independently mainly into the predecessors of L N I S. Perhaps Proto-North-Lule and Proto-Inari-Skolt at the deepest, in case such entities could be assumed (usually classifications of the Sami languages go with Pite-Lule and Skolt-Kola groupings instead, but I am not entirely sold on this).

In other words, I answer my headline question regardless in the negative: no, there did not exist any *š-stems yet in Proto-Samic, not even in any possible early subgroups like Proto-Western, Proto-Eastern or Proto-Non-Southern; they have only come about later through contacts with early Finnic.

I have not invented any real explanation for the dual treatment of *-š in Eastern Sami. For this, I can only offer a few hypotheses (that all point in different directions):

  • maybe early on there was a sound change *-š > *-s in Eastern Sami, and cases with retained *-š are newer loans, perhaps partly from Northern Sami (since they seem to be fairly rare in Kola Sami)? This probably could not be equated with the general Samic shift of original *š to *s, since there are many enough good examples of the retention of prevocalic PS *š- in Eastern Sami, and none of unexpected *s- (that I know of). [6]
  • maybe Finnic *š was for a while again borrowed in Eastern Sami as *s, due to being increasingly non-palatal [ʂ], while cases with *-š are older loans from an [ʃ] stage?
  • maybe a lost Finnic variety has been involved where word-final *-š > *-s? A late analogical development of *-h-stems to -s-stems is known from Southern Ostrobothnia… which is however nowhere near the attested Eastern Sami languages.

Going by the vowel substitutions also diverging in *pearëš ~ *pierës, *tearvëš ~ *tiervës and *rumëš ~ *romës, the last two explanations sound somewhat better than the first.

This problem very likely needs to be further tied in with *-eš ~ *-is variation appearing even within Finnic, again largely with a West-East divide, such as Western Fi. tarve ~ Eastern Fi. tarvis; Fi. käärme ~ Karelian keärmis; Fi. säie ~ Karelian säijis; Fi. laine < *laineh ~ Olonets-Ludian-Veps lainis ‘wave’. But it is not clear to me if this is good enough to run with my third hypothesis, since there seems to be very little correlation in the occurrence of alternation in Eastern Sami versus in Finnic: there is no e.g. **seajkës in Samic, and more importantly, no **kantelis, **peris or **nakrëš, **ruumëš, **saalëš in Finnic.

[1] YSS has been now added to my fairly slowly growing Bibliography. If anyone’s curious, the five words with *š ← *s(i, ü) are: PS *šëlëtē ← PF *siledä ‘smooth’; Western *šëljō ~ Northern+Eastern *šiljō ← Fi. silja ‘courtyard’ (clearly rather one of the post-PS loanwords); PS *šëlmē ‘eye of an ax’ ← PF *silmä ‘eye’; PS *šëltē ← *silta < PF *cilta ‘bridge’; PS *šëntë- ← PF *süntü- ‘to become, be born’. I also suspect that PS *šōjē ‘rowan’ may derive from PF *sooja ‘protection’, as in Finnish (as also e.g. Germanic) mythology / folk belief the rowan tree has been considered to grant protection to the homestead. It’s not quite clear why would we have *š- and not *s- here, though. An independent loan from the same Indo-Iranian source (*sćāyā- ‘protection’) would also work.
[2] A slightly better explanation along almost the same lines might be “etymological alienization”, where the existence of Finnish hiili would have prompted a reshaping of e.g. expected Northern Sami ˣšilla into hilla, possibly fairly late then. This does not seem to be feasible in the case of Eastern Sami, though: in particular Inari and Skolt Sami have only come into intensive contact with Finnish fairly late, but the lack of ˣh- indicates relatively early loaning. (IIUC /h/ → ∅ has remained the default case in contacts between Karelian and Kola Sami, however.)
[3] Álgu gives for this a comparison with North ruomas ‘wolf’, which looks like a rather recent (taboo? epithet?) borrowing from Skolt.
[4] S U P L N I S K T for Southern, Ume, Pite, Lule, Northern, Inari, Skolt, Kildin and Ter Sami respectively. Yes, that’s “S” appearing twice, but you can figure this out.
[5] One impressive example of this approach is the development path “ʃ > ʂ > ç > x > χ > ħ > h” given in Kallio’s “Kantasuomen konsonanttihistoriaa”.
[6] Amusingly but probably unrelatedly: in “An essay on Saami ethnolinguistic prehistory” Ante Aikio mentions five examples of the “opposite” correspondence, with *s- in Western Sami ~ *š- in Eastern Sami.

Posted in Reconstruction

Proto-Uralic *ë in Mari

Mari is one of the key languages for the reconstruction of Proto-Uralic *ë, in having a mostly unique reflex *ü > Hill Mari /ü/ ~ Meadow Mari /ü/. The only other known regular source of this vowel correspondence is would-be *ü̆ (from earlier *ü, *i, *e) in roots of the shape *CV, such as Hill Mari /šü/ ~ Meadow Mari /šüj/ ‘neck’, from PU *śepä.

The development *ë > *ü was first explicitly proposed by Wolfgang Steinitz in his Geschichte des finnisch-ugrischen Vokalismus (1944) (in his notation: *i̮ > *ü). This fact has been later on essentially forgotten, though. E.g. fifty years later (1994), Gábor Bereczki in Grundzüge der tscheremissischen Sprachgeschichte recognizes only two examples theoretically falling under this: /šüm/ ‘bark, crust’ (< *śëmə ‘scales’) and /nölə-pikš/ ‘blunt-tipped arrow’ (< *ńëlə ‘arrow’), which he furthermore explains, following Erkki Itkonen’s views from 1954, instead as “sporadic” fronting from *u and *o. [1]

The grounds would have been ripe for a reassessment of the historical vocalism of Mari already since the rehabilitation of *ë by Janhunen and Sammallahti in the 80s. It has been taking a bit longer, though. The next source after Steinitz that is on board with his theory seems to be a footnote by Ante Aikio in his 2006 article “Etymological nativization of loanwords”, [2] hence adding up to a blackout period of more than 60 years. I believe this has been an independent rediscovery rather than a revival as well. Aikio notes also that the conditions for the change are unclear, and it is indeed the case that PU *ë as reconstructible per the evidence of the other languages is often enough reflected also as Mari *å (Hill /a/ ~ Meadow /o/) or *o (Hi. Me. both /o/). So how should we deal with these cases? [3] By now we have at least one initial suggestion, by Mikhail Zhivlov from 2014, that *ü would be the default reflex, *å ~ *o the reflex before the velars *k and *ŋ (but not *x).

The split *å ~ *o remains unclear for now as well, but this is also the typical development of *a, hence we seem to be dealing with an early lowering *ë > *a, as also in many other Uralic branches. This is general in West Uralic; in Permic it seems like the most common development (followed then by *a > *o > *u), versus retention *ë > *ë mostly before sonorants; and Hungarian has “a-umlaut”: *ë-a > *a. I suspect on the other hand that *ë-ə > *aa in Khanty is the result of relatively late lowering from earlier *ëë, which could be connected to the same change in Northern and partly Eastern Mansi. (The development *a > *aa is attested too, but rare, and more common developments like *a > *a, *a > *oo, *a > *uu seem to require room for maneuvering in pre-Khanty.)

After having looked over the data [4] once more though, I have settled on a different view: the primary conditioning seems to be instead syllable closure. (This is one of what I think of as the “stock” conditioning features for divergent vowel developments, along with metaphony and labial/palatal coloring due to neighboring consonants. [5])

1. *ë > *ü in open syllables (before a single consonant):

  • *ëla > *üla > *ül ‘under’
  • *ëŋas(V) > *üŋəSə ‘rested’ [6]
  • *jëxə- > *jüä- ‘to drink’
  • *lëCə-ta- > *lüðä- ‘to fear’ [7]
  • *lëčə- > *lüčä- ‘to get wet’
  • *mëxə > *mü-ländə ‘land’, *mü-ðe- ‘to bury’
  • *ńëlə > *nülə ‘arrow’
  • *ńërə > *nürə ‘flexible’
  • *sënə > *sünə > *Sün ‘sinew’
  • *sëtə > *Süðər ‘spindle’
  • *śëmə > *śümə > *Süm ‘bark, crust’
  • *śëta > *Süðə ‘100’
  • *wajə ~ *wëjə > *ü(j) ‘butter’

2. *ë > *a > *å ~ o in closed syllables:

  • *ëkta- > *akta- > *opte- ‘to put, place’
  • *lëkśə- > *lakśə- > *lokSə-ńća- ‘to chop’
  • *lëntV > *lantV > *lånda-ka ‘valley’
  • *mëksa > *maksa > *mokS ‘liver’
  • *ńëčkə > *načkə > *nočkə ‘wet’
  • *ńëkćəmə > *nakćəmə > *nåSmə ‘palate’
  • *pëŋka > *paŋka > *poŋgə ‘mushroom’
  • *tëktə > *taktə > *toktə ‘loon’
  • *wëlkətə > *walkətə > *wålɣəðə ‘light’

Set #2 certainly shows a lot of velars following either immediately (*kt *kś *ks *kć *ŋk) or as the second member of the cluster (*čk *lk), but this probably doesn’t need any explanation other than the general abundance of *k in the PU consonant cluster inventory. There is also one case with *nt. Set #1 meanwhile has only one example with *-ŋ-, but similarly, in CVCV roots *-k- is by contrast rather rare.
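The proposed conditioning is simple enough to state as a function. Below is a minimal Python sketch, assuming that “closed” simply means that *ë is followed by two or more consonants before the next vowel; the names `is_closed` and `pre_mari_reflex` are hypothetical, and the sketch covers only the plain cases in sets #1 and #2 (the ‘butter’ word with its *a ~ *ë variation is left out).

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so that letters like ë, ń, ś are single code points."""
    return unicodedata.normalize("NFC", text)

VOWELS = set(nfc("aäeëəioöuüV"))  # V = unspecified vowel in the reconstructions
TARGET = nfc("ë")

def is_closed(root):
    """True if *ë in the root is followed by two or more consonants."""
    root = nfc(root).lstrip("*")
    rest = root[root.index(TARGET) + 1:]
    cluster = 0
    for ch in rest:
        if ch in VOWELS:
            break
        if ch != "-":  # ignore morpheme boundaries
            cluster += 1
    return cluster >= 2

def pre_mari_reflex(root):
    """Predicted pre-Mari reflex of *ë: *a in closed syllables, *ü in open ones."""
    return "a" if is_closed(root) else "ü"

SET_1 = ["ëla", "ëŋas", "jëxə-", "lëCə-ta-", "lëčə-", "mëxə",
         "ńëlə", "ńërə", "sënə", "sëtə", "śëmə", "śëta"]
SET_2 = ["ëkta-", "lëkśə-", "lëntV", "mëksa", "ńëčkə",
         "ńëkćəmə", "pëŋka", "tëktə", "wëlkətə"]

for root in SET_1 + SET_2:
    print(f"*{root:10} > *{pre_mari_reflex(root)}")
```

Every root in set #1 comes out with *ü and every root in set #2 with *a, which is all the sketch is meant to show; it says nothing about the later split into *å ~ *o.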

This pattern is complicated though by suffixation and consonant cluster simplification processes in Mari. In these cases we find both *üC and *aC, and I would hypothesize that this means that the split of *ë dates in-between some of these.

3. *ë > *ü also in secondarily open syllables:

  • *ëptə > ? *ëpə > *üp ‘hair’
  • *mërja > ? *mëra > *mür ‘strawberry’
  • *sëntə- > ? *sëtə- > *Süðä- ‘to clear woodland’

4. *ë > *a also in secondarily closed syllables:

  • *ďëmə → *ďëmə-pawə > ? *ďëmpa(w) > *lampa > *lombə ‘birch cherry’ (a compound with the word for ‘tree’ as the second member)

5. *ë > *a in “tertiarily open” syllables?:

  • *ëppə > ? *appə > *owə ‘father-in-law’
  • *këččə > ? *kaččə > *kåčə ‘bitter’
  • *lëmpə > ? *lampə > *lop ‘depression in ground’
  • *wëlka- > ? *walka- > *wåle- ‘to go down, descend’

6. *ë > *ü in “tertiarily closed” syllables?:

  • *ńërə(-ka) > ? *nürə-kA > *nürɣə ‘cartilage’

But it’s also possible that there are a few other, smaller conditioning factors here as well. It seems somewhat dubious to me in particular to end up dating *mp > *p (in *lop) as younger than *nt > *t (in *Süðä-). In principle most cases here could be also further confirmed or falsified by other results on the chronology of consonant cluster simplification in Mari.

This hypothesis also points towards a different line of explanation for some other instances of Mari *ü. There is at least one case where we find *ü in a closed syllable clearly retained from PU, in what seems like an original back-vocalic environment. This is *jükSə ‘swan’, for which I have earlier sided with reconstructing *ë … but since none of the other languages show evidence particularly in favor of *ë, maybe a development *o > *u > *ü or the like will be a better explanation for this one case after all.

[1] I don’t think I can dismiss strongly enough, in polite company at least, the notion of reconstructing “sporadic” sound changes. As some readers know, my (hopefully soon-to-be-wrapped-up) Master’s thesis treats the research history and reconstruction of the Proto-Finno-Ugric long vowels. One meta-result of this work has been that, by now, I see Itkonen’s insistence on sporadic sound changes as having prevented substantial progress in the reconstruction of comparative Uralic vocalism for just about half of the entire 20th century (to some extent even up to today). This device is not much more than a license to stop thinking: to avoid placing a given language group’s phonological structure in a general comparative context, and therefore, to be unable to discover more parsimonious explanations such as properly conditional splits. Closer to the topic though: I cannot blame Bereczki very much for not seeing /ö/ and /ü/ as etymologically equivalent, since the lowering of *ü to /ö/ (perhaps better: retention?), as later unraveled in detail by Aikio, has at least somewhat complex conditioning.
[2] In Diane Nelson & Ida Toivonen (eds.): Saami Linguistics, pp. 17–52.
[3] Steinitz did not have any trouble with these exceptions, since he postulated extensive original “ablaut” variation such as *a ~ *i̮ as a data-cleaning deus ex machina of his own.
[4] Three of the cases in section 1 are absent from all three recent overviews of the development of *ë either in general or in Mari in particular, i.e. Aikio 2014, 2016 and Zhivlov 2014 (see Bibliography). (1) The reconstruction *ëŋas(V) ‘rested’, reflected also in Samic *vōŋēs, is from Aikio’s PhD thesis (2009: 289). (2) *sëtə (> Mo Ma P) can be found already in UEW (as *setɜ, and with Erzya /sad/ ‘stem, trunk’ rejected, though it fits perfectly under *ë; this may be a better etymology for the Mari word than the comparison with Finnic *kecrä, Mordvinic #kšťəŕə ‘spindle’ ← pre-II *ketstra-). (3) The ‘butter’ word has been consistently reconstructed only as *wajə (*waje, *wōje etc.) so far. Aikio 2016 notes that Samic and Mordvinic point to *ë, but so do also Mari as well as Udmurt /vɤj/. Finnic *voi (regular from both) and Komi /vɨj/ (irregular from both, though possibly less so from *ë) don’t allow disambiguating; therefore it is only the Ugric reflexes that point to *wajə, and perhaps it is them that have innovated here, not the western languages. An additional similar case is (4) *lëčə-, appearing already in UEW as *lače-, and covered by Zhivlov, but not Aikio.
[5] I do not rule out other consonant-environment-related changes, of course. For just one of my favorite examples of something less obvious, there is how the labialization of earlier /wa/ to /wɔ/ in Early Modern English (later > /wɒ/, /wɑ/, /wɔːɹ/ etc.) (dwarf, quarter, swan, swap, walk, war, was, what, etc.) is blocked before velars (quack, twang, wag, wank, wax, whack etc. instead have the usual development /a/ > /æ/). But I would be hesitant to apply this type of explanation too liberally. At its worst this can turn into over-fitted sound laws where each specific environment applies to no more than one or two words.
[6] I’m leaving aside here the only marginally dialectally retained contrast between PU *s, *ś and *š, which is irrelevant for the present issue.
[7] A trisyllabic reconstruction with a lost middle syllable (all of *lëjə-, *lëwə-, *lëxə- and even *lëkə- would work) seems to be required to account for the correspondence between Mari /-ð-/ (normally < *-t-, *-tt-, *-d₂-) and Samoyedic *-r- (normally < *-r-, *-d₁-). The lenition of *-t- to *-ð- > *-r- in the latter, regular after noninitial syllables, seems to have taken place also in “contracted” roots of this type. Compare *jëxə- → *jëxə-ta- > *ë-r- ‘to drink’.

Posted in Reconstruction

Studia Uralo-Altaica Online

This Tuesday night, while looking for something else entirely, I’ve accidentally stumbled on another linguistic publication series making the leap online (a few years ago already in fact): University of Szeged’s book series Studia Uralo-Altaica, including also its Supplementum sub-series. I already had a number of these scans through indirect channels, but there are also many items of interest I either did not have yet in digital shape (e.g. László Keresztes’ two-volume Geschichte des mordwinischen Konsonantismus), or which I’ve not gotten acquainted with before at all (e.g. most of the Turkological literature).

The UoSz archive unfortunately does not give a single listing on the series’ contents, and the collection volumes have been split into separate articles (including forewords & such) — so for general convenience, here are all the volumes together in a single list:

Or let’s make it two lists, and keep the Supplementa in their own one:

Posted in Links


Sometimes I feel I’d like to see an anti-etymological dictionary.

Given two or more different etymological dictionaries, especially for an entire group of languages, typically one of them (usually from the older end) is going to end up being less critical, while another one (usually from the newer end) is going to end up being more critical. If we want to know what is known so far about a word’s etymology (cognates, reconstruction, etc.), we’d look in the more modern dictionary, of course. But if we want to know what is not known about a word’s etymology (i.e. what research questions are still open?), neither of these sources is really going to work. What’s needed for this is, at a pinch, the difference between them.

Sometimes older separate etymological groups get combined into a single one, and sometimes older single etymological groups will turn out to comprise unrelated words and will be disassembled into various different ones (maybe under different native roots, but maybe also as loans or derivatives). This is all no major problem so far, especially if newer research will bother to mention that earlier, zoop was considered cognate with foop, but per current understanding it is actually cognate with doop.

But etymologies can also simply vanish from the literature record without comment, or with minimal comments along the lines of “strike this” (this latter type I’ve seen in errata or in “update notes” to new editions). This I find unsatisfying. Even when an explicit reason has been given (“the correspondence z ~ f is irregular”), if this merely renders the compared words without etymology, then we are again back to square one on what the words’ origin actually is. Or, for that matter, on why the earlier observed similarity exists at all?
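As a toy illustration of “the difference between them”, two dictionaries can be modeled as sets of comparisons and subtracted. This is only a sketch under that simplifying assumption: the words are made up (zoop, foop and doop are the placeholders used above, the rest are equally hypothetical), and real dictionaries would of course need far messier alignment.

```python
def comparison(*words):
    """A proposed cognate set, order-independent."""
    return frozenset(words)

# Hypothetical older (less critical) and newer (more critical) dictionaries.
older = {
    comparison("zoop", "foop"),
    comparison("blick", "plick"),
    comparison("mip", "mib"),
}
newer = {
    comparison("zoop", "doop"),
    comparison("mip", "mib"),
}

def anti_etymology(older, newer):
    """Return (dropped comparisons, words left without any etymology)."""
    dropped = older - newer
    still_covered = set().union(*newer)
    orphans = set().union(*dropped) - still_covered
    return dropped, orphans

dropped, orphans = anti_etymology(older, newer)
print(sorted(sorted(c) for c in dropped))
print(sorted(orphans))
```

Here zoop has found a new home, while foop, blick and plick end up on the discard pile with no explanation at all, which is exactly the material an anti-etymological dictionary would list.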

It is possible for similarity to exist for reasons other than by proper common inheritance or pure random chance: loans between related languages, loans in parallel from a third source, common inherited morphology applied to different roots, contamination between semantically nearby words, universal onomatopoetic patterns… Traditional etymological dictionaries I’ve only seen commonly apply the last of these with any consistency. The first is usually invoked only in cases of obvious, long since established layers of loanwords (in Uralic context e.g. Finnic → Samic, Komi → Ob-Ugric). The second thru the fourth are rarely explored at all.

So I would hope for truly thorough etymological dictionaries to also include a discard pile of words and comparisons from earlier literature that remain without an adequate explanation, something which would definitely make future etymologists’ work slightly easier.

I am currently doing some “antietymological” groundwork myself: charting how much content there is in Collinder’s Fenno-Ugric Vocabulary that is not reproduced also by later sources (mainly the UEW on one hand, Janhunen’s Samojedischer Wortschatz on the other). It is not a lot, and most of the omissions are clearly dregs, but some small part of the material remains interesting. It is even possible to find examples that have later reappeared: one is the comparison of Mari *lüðä- ‘to fear’ with Samoyedic *lër(ə)- ‘to be afraid’, rediscovered by Ante Aikio in his paper on new Mari etymologies from a few years back.

A much bigger amount of work, however, would entail somehow bridging the still largely aligned FUV and UEW etymological corpora with the more heavily pruned ones in Janhunen 1981 and Sammallahti 1988. For most of the comparisons rejected by the latter two authors as insufficiently regular, this has been done quietly, without any arguments given at all. This may very well have allowed for increased rigor in historical phonology, but at the cost of what seems like a hefty step back in how much we can claim to know about Uralic etymology.

Even further observations could perhaps be made by taking a look at even earlier etymological compendia: Budenz’ Magyar–ugor összehasonlító szótár (1873–1881), Donner’s Vergleichendes Wörterbuch der finnisch-ugrischen Sprachen (1874–1888), as well as the extensive material quoted in the major historical phonology overviews that followed in their wake, such as Paasonen’s “Beiträge zur finnischugrisch-samojedischen Lautgeschichte” (1913). I again know of some recently rediscovered etymologies that were first suggested already around this time or even earlier. Especially the first two include etymological comparisons still more boldly than FUV and UEW though (which were at least constrained by mainly compiling etymologies from already published literature), so the ratio of real forgotten goodies to junk would surely be still lower.

There’s also another sense in which “anti-etymologies” could be compiled from this period, however. This far back it is not difficult at all to find comparisons that have been rendered firmly obsolete by now, not just left in a limbo of “irregularity”. These might be illustrative in showing how etymological progress has been achieved over the last 100+ years. Have they been superseded by new native comparisons enabled by new data? by loanword etymologies? by new morphological analyses? something else? … and the results of such a survey could perhaps be then used as a roadmap for future research as well, to work out what’s likely and what’s not likely to continue to provide new results.

Posted in Etymology

Phonology squib: ‘Clay’ in Proto-Uralic

I have a principle that applies quite often when working with quantity-over-quality mass comparative dictionaries (papers, databases, etc.): what is asserted without evidence can be dismissed without evidence.

The UEW is, unfortunately, a repeat offender on assertions without evidence. This comes up maybe the most with its own reconstructions, which do not seem to follow any definite scheme: there definitely isn’t one expounded on anywhere in the book, and to my knowledge none of the editors have published detailed papers on the topic, either. [1] This results in many junk reconstructions that seem to have only been hastily eyeballed together, sometimes with crass errors.

To avoid excess alarmism though: by “its own reconstructions”, I mean only a subset of the Proto-Uralic (Proto-Finno-Ugric, -Permic, etc.) reconstructions presented, those that seem to have been put together for the first time by the UEW team. Many of the reconstructions are however not all-new, and have been inherited from earlier research. Maybe the most direct source is Collinder’s Comparative Grammar [2], but various bits also trace back to earlier studies on historical phonology, such as Itkonen’s comparative vocalism surveys, or Paasonen and Setälä’s early 1900s Neogrammarian works that mainly involved consonantism, or even the 1800s comparative dictionaries of Budenz and Donner. Alas, none of this is explicitly referenced, and so the reader is left in the dark. Determining what, if anything at all, some particular reconstruction is based on would take a wild goose chase through the un-annotated list of literature found at the end of each entry.

(For non-specialists in Uralic reconstruction, as a quick rule of thumb I would say: any reconstruction with cognates in Finnish + at least two other Uralic subgroups can be treated as relatively safe; so can all remaining reconstructions that are continued in 6+ subgroups, which are usually given in bold; anything continued more narrowly is in principle suspect; anything prefixed with a question mark should be treated as unreliable entirely.)
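For the programmatically inclined, the rule of thumb above can be written out as a small classifier. This is a sketch only: the `Entry` fields, the subgroup labels and the example entries are my own hypothetical stand-ins, not anything taken from the UEW itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    reconstruction: str
    subgroups: frozenset         # subgroups with a reflex (labels are assumptions)
    has_finnish: bool            # is there a Finnish cognate?
    question_mark: bool = False  # is the entry prefixed with "?"

def reliability(entry):
    """Apply the rule of thumb, checking the question mark first."""
    if entry.question_mark:
        return "unreliable"
    others = entry.subgroups - {"Finnic"}
    if entry.has_finnish and len(others) >= 2:
        return "relatively safe"
    if len(entry.subgroups) >= 6:
        return "relatively safe"
    return "suspect"

# Hypothetical entries, for illustration only.
wide = Entry("*foo", frozenset({"Finnic", "Samic", "Mari"}), has_finnish=True)
narrow = Entry("*bar", frozenset({"Samic", "Permic", "Samoyedic"}), has_finnish=False)
print(reliability(wide), "/", reliability(narrow))
```

The question mark is checked first on the assumption that it overrides a wide distribution; the rule of thumb itself does not spell out the order.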

Even if many of the UEW’s reconstructions are junk, this does not however imply that the etymological comparisons they are attached to would also be. Sometimes it will be fairly easy to work out a better reconstruction. Today I have taken a look at a word for ‘clay’ that the UEW reconstructs as *śojwa, and noticed that this seems to not match any of the descendants given…

Not absolutely everything is wrong, of course. The consonant skeleton *ś-jw- works well enough: we have entirely regularly Samic /č-/ ~ Permic /ś-/ ~ Samoyedic /s-/, and S /-jv-/ ~ P /-j-/ ~ Smy ∅ is reasonable. But the vowel reconstruction *o-a seems to be not really defensible.

  • In Samic, we have reflexes only in Kola Sami: Kildin /čuwwj/ (though apparently чуййв in the written language), Ter /čujjvɛ/. These nominally suggest Proto-Samic *čujvē — but, from earlier *śojwa, we would instead expect to see PS *čoajwē > Kola **čuəjjve. Compare PU *ojwa ‘head’ > PS *oajvē > Kildin /vuəjjv/ вуэййв, Ter /vɨəjjvɛ/.
  • In Permic, we have Proto-Permic *o > Komi /o/ ~ Udmurt /u/. This is not a regular reflex of PU *o: it instead usually continues PU *a or *e. There are various other claimed cases of PU *o > Permic *o (at least *kojə-ma > *kom ‘male’, the source of the ethnonym Komi, seems unassailable, even if still possibly irregular), but normally we would expect *o-a to give *u.
  • The Samoyedic examples are a bit hard to assess offhand: we have reflexes only from Selkup and Kamassian, and so Janhunen’s Samojedischer Wortschatz leaves this word unconsidered. /üü/ in the former can go back to various pseudo-diphthongs; including *åj (*såjtə- > /süütɨ-/ ‘to sew’), *oj (*tojmå > /tüüm(ɨ)/ ‘larch’), *uj (*jujtə- > /küütäptɨ-/ ‘to dream’), *əj (*pəj > /püü/ ‘stone’), even *äj (*päjwä > /püü/ ‘warm(th)’). Kamassian /e/ does not seem to match any of these on a quick checkup, but there are probably various conditional developments involved that blur the picture. PU *o-a regularly gives PSmy *å-(å), so maybe the first is what we should bank on… However, in an *A-stem, *jw would be expected to remain in PSmy; and result then in *ľć in Selkup. [3]

The Kola Sami ~ Permic vowel correspondence can be however quite well derived from *a-a; developing to *ō-ē in Proto-Samic. This normally later gives /uu/ in Kildin, /ɨɨ-ɛ/ in Ter, but presumably (see below) earlier *uu was shortened here to /u/ before it could unround in the latter. *a-a also gives Samoyedic *å(-V), i.e. works at least as well as reconstructing *o-a.

I would also favor reconstructing medially *-wj- instead of *-jw-. UEW, I imagine, bases the latter on Ter Sami; however this is actually non-diagnostic, since in the language, there is regular metathesis of PS *-vj- to *-jv-. The Kildin form should be therefore instead taken as evidence for *-wj-. (In literary Kildin Sami, it seems that Ter-esque -ййв- is preferred in place of *-vj-, e.g. тоаййв ‘often’, while T. I. Itkonen’s Koltan- ja kuolanlapin sanakirja gives /tɑwwj/. Does this maybe stand for dialect variation within the language?) This in mind, the ad hoc-sounding shortening (*a > *ō >) *uu > *u also makes decent phonetic sense: we’d be dealing with [uːw] > [uw], a contrast that seems difficult if not impossible to maintain.

I believe no exact precedents are known for the development of *-wj- in Permic, but in general *-w- is lost always, while *-j- remains at least in various clusters; so *-wj- > *-j- seems about as good as could be expected. As for Samoyedic, *-w- is lost syllable-finally: this means we’d expect *śawja > *såj(V), which is at least a decent contender for the Selkup-Kamassian preform. (Preferably not *såjå; contrast *kåjå > Kamassian /kuja/ ‘sun’. *-a > *ə is however quite common in Samoyedic, maybe in particular after (original?) consonant clusters.)

Altogether, I end up with the conclusion that all words given by UEW under *śojwa are better considered to continue Proto-Uralic *śawja.

These adjustments also open some new vistas. They allow the possibility to consider that my new and updated reconstruction might be a part of the same original root as its established synonym: *śawə (UEW: *śawe). This is continued directly only in Finnic (*savi > Fi. savi etc.), but also in various derivatives: *śawə-nV in Mordvinic (*śovəń > Erzya & Moksha сёвонь) [4], Mari (шун) and Komi (сюн); *śaw(ə)-d₂V in Mansi (*suwľ(V) > Northern сӯли) and Khanty (*sawəj > *sawïï) [5]. It seems therefore likely that also the *śawja group is similarly originally a derivative *śaw(ə)-ja. The exact morphology going on remains however mysterious. *-nV is only known as a vague diminutive suffix; *-ja usually forms action nouns; *-d₂V is, to my knowledge, not reconstructible for Proto-Uralic at all (there may be one other parallel within Ob-Ugric though: *ńooɣəď ‘meat’, maybe *ńaKV-d₂a).

It would be also possible to shuffle the *-ja and *-d₂V groups around a bit: *-j in Khanty and Samoyedic can continue either just as well. At least the Mansi form with *ľ and the Samic & Permic forms with *j however must be distinct from each other.

[1] Editor-in-chief Rédei has arguably taken some steps towards this in his 1968 article “A permi nyelvek első szótagi magánhangzóinak a történetéhez” (NyK 70: 35–45). His “pre-Permic” vowel system does end up being identical to the Proto-Uralic vowel system that is currently accepted the most widely, but this may be just a happy accident: he makes no effort at all on the issues of if and how the other Uralic languages could be derived from the same system; and his treatment of which particular original vowel should be assumed in which particular words is very patchy as well, covering only some incidental examples.
[2] His Fenno-Ugric Vocabulary gave only comparative data; their associated reconstructions were only given in an appendix to CompGramm., wherein he had presented his thinking on Uralic comparative phonology and morphology as well.
[3] This oddball soundlaw probably proceeds something like *jw > *jj > *jɟ > *ʎɟ > *ʎtɕ = *ľć.
[4] *o is, I believe, due to the following development: first *a-ə regularly > *å-ə > *o-a, followed by a conditional split: *o > *u before a velar sonorant (regularly established in the case of *-oŋ- and IMO also occurring in the case of *-olk-); lastly *u > *o.
[5] With Kazym /sŏwĭ/, Krasnoyarsk (Southern) /săwə/ regularly retaining PU *-w-.

Posted in Reconstruction

Bonus Material 2017

A little recap of history: Freelance Reconstruction, the blog you’re currently reading, [1] was originally started as a Tumblr microblog. It turned out though that my blogging style needs a sturdier framework, and for several years now, I’ve been happy to be based on WordPress instead.

This much some old readers may recall. However I never have gotten much into doing quick-paced community engagement blogging on here, in part indeed due to the heavier-duty software. And since I still hang out on Tumblr for unrelated reasons, I’ve also found it useful to have an outlet to comment on things related to linguistics that come up in there.

Thus, enter a new, more casual linguistics sideblog: This has been running for a bit over a year by now, but I don’t think I’ve mentioned anything about it earlier on here. Perhaps I should also request that anon asks be redirected there instead of the old defunct version of this blog?

Here is also a list of some posts on there that might be of interest to the readers on here as well.

1. Original blog posts and commentary on topics:

— on the structure and history of Finnish:

— on Uralic linguistics in general:

— on phonological fun facts and typology:

— other stuff:

2. Links to other blogs, articles etc. without much additional insights of my own:

[1] I’ve seen this blog occasionally linked under the name “Protouralic”, but to be exact, that is only my blog’s URL, not the title. The discrepancy is mainly since I can foresee maintaining this blog long enough that I will no longer be doing freelance reconstruction… It remains to be seen what the blog will be renamed at that point, though.

Posted in Links, Meta

An old etymology: aisti ~ ész

I find it interesting how modern advances in Uralic historical phonology can occasionally turn out to vindicate old sketchy etymological proposals, dating from the earliest phases of scientific comparison of the word stocks of the Uralic languages.

One of these cases appears to be a connection between Finnish aisti ‘sense(s)’ and Hungarian ész ‘reason’. This is a comparison that appears in 1800s work by the likes of Budenz. Already come the 1900s it had mostly been dropped, however (with decent reasons, as we shall see). But regardless, nowadays it seems that the two can be regularly connected after all.

Let’s start from the Finnish side. Any possibility of a comparison with Hungarian can only involve the first syllable, ais-. Once we factor in the other cognates within Finnic though, already internal reconstruction turns out to point towards the rest of the word being suffixal material.

“What other cognates”, you ask? Yes, there are no known cognates in the other Finnic languages, which usually doesn’t bode well for native origin. The Finnish dialects, however, make up a good reserve of lexical diversity (recall that “Finnish” in its widest sense is only a geographical collection of Finnic varieties, not an actual subgroup thereof). We can find in these some interesting parallel formations that allow some deeper exploration of this connection. Standard Fi. only provides aistin ‘sense organ’ and aistia ‘to sense’, both of which could be accounted as derived from aisti itself. More interesting are, however, aisto ‘intent’ (Southwestern dialects) and astaita ‘to observe’ (Tavastian and Far Northern dialects).

The aisti ~ aisto doublet points relatively clearly to an earlier lost verb stem *aistaa. For an exact parallel, cf. paistaa ‘to bake’ → paisti ‘roast’, paisto ‘baking’. Also aistia can be then better analyzed as an iterative aist-i- derived from this stem, not as a zero-derivation of aisti.

The heavy stem structure, in turn, suggests a segmentation of this verb as *ais-ta-. Almost all Finnish verbs of the shape CVXCtA- (where X is any segment: vowel length, semivowel, or consonant) are derivatives, most often of this kind. [1] Likely semantics at this point will be something like *ais(i) ‘senses, observation’ (in any case a nominal) → *ais-ta- ‘to sense, observe’.

So far this reconstructed *aisi does not need to be any older than medieval Finnish. e-stem inflection **aisi : **aise- would usually be a good sign for dating a word back to at least Proto-Finnic, but we only have evidence for √ais- as purely a root element, not as an independent stem. It is true that consonant stems such as √ais- usually take e as their stem vowel, when required, but this kind of derivation can at least occasionally be based on other stem types as well. E.g. the i-stems kaali(-) ‘cabbage’, viini(-) ‘wine’ have regardless the analogical consonant-stem partitive singulars kaalta, viintä in colloquial Finnish; and verbs in -tA- derived from trisyllabic A-stem adjectives are quite regularly based on a consonant stem, e.g. kavala ‘treacherous’, kumara ‘slouched, bent’, matala ‘low’, viherä ‘green’ [2] → kavaltaa ‘to embezzle’, kumartaa ‘to bow’, madaltaa ‘to make shallower’, vihertää ‘to be verdant’.

The key evidence for projecting the root fairly far back comes instead from astaita. Why do we have as- and not ais- in here? I believe the answer is that this is a very old parallel derivative, already from a Proto-Finnic *aisi. The overheavy stem structure *CVXCtA- is innovative in Finnish. In certain very old cases, we instead see consonant cluster simplification to a regular heavy stem CVCtA-. At least the following three cases are still apparent:

  • *kanci > kansi ‘lid’, stem *kant(ə)- > kant(e)-
    → *kant-ta- > *katta- > kattaa ‘to cover’;
  • *nowsə- > nouse- (infinitive nousta) ‘to rise’
    → *nows-ta- > *nos-ta- > nosta- (inf. nostaa) ‘to lift, raise’;
  • *vejcci > veitsi ‘knife’, stem *vejcc(ə)- > veitse-, veis- (partitive veistä)
    → *vejc-tä- > *vec-tä- > dialectal vestä- (inf. vestää) ‘to whittle’.

*ntt > *tt, seen in the first example, has been known for long, and has further support from inflectional morphology (e.g. in the ordinals: *kolmanci : *kolmant-ta > *kolmac : *kolmatta > kolmas : kolmatta ‘third’). Yet another instance of this same sound change, easiest formulated simply as *C₁C₂C₃ > *C₂C₃, is probably the loss of a stop before /st/. This is not evidenced in derivational morphology, but is quite regular in consonant-stem partitives or infinitives (the types lapsi : *laps-ta > lasta ‘child’; juokse- : *jooks-tak > juosta ‘to run’). The cases with loss of a semivowel do not build up a consistent picture at all, but I think the cases with similar loss of *n, *p, *k etc. allow putting them on firmer ground. Standard Finnish has analogical veistää for ‘to whittle’, but most other Finnic languages still retain the soundlawful ⁽*⁾vestä-.
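The rule *C₁C₂C₃ > *C₂C₃ is mechanical enough to try out on the forms quoted above. The following is a rough sketch, not a proper phonological model: the consonant inventory is an assumption of mine (with the semivowels j and w counted as consonants), forms are plain strings with hyphens marking morpheme boundaries, and the function name is made up.

```python
import re

# Assumed consonant inventory; the semivowels j and w count as consonants.
CONSONANTS = "cdhjklmnprstvw"

# Match a consonant that is directly followed by two more consonants.
_FIRST_OF_THREE = re.compile(rf"[{CONSONANTS}](?=[{CONSONANTS}]{{2}})")

def simplify_cluster(form):
    """Apply *C1C2C3 > *C2C3 to a form, ignoring morpheme boundaries."""
    joined = form.replace("-", "")
    return _FIRST_OF_THREE.sub("", joined)

for preform in ["kant-ta", "nows-ta", "vejc-tä", "laps-ta", "jooks-tak"]:
    print(f"*{preform} > *{simplify_cluster(preform)}")
```

Note that the output for *jooks-tak is only the intermediate *joostak; the later changes leading to juosta are not modeled here.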

Therefore, I would reconstruct here PF *ajs-ta- → *as-ta-, an earlier doublet of the later *ais-ta-. Further derivational extension towards astaita can be well later, however.

If an earlier Finnic stem *aisë- < *ajsə- can be therefore assumed, it turns out that this will be an exact equivalent to Hungarian ész. The PU form can be reconstructed as *äśä: in Finnic we have first ä-backing to yield *aśə; followed by palatal breaking to yield *ajśə, and finally depalatalization to *ajsə. The modern Hungarian nominative could just as well continue earlier **eśV, but the short-vocalic (and vowel-stem) plural/accusative/possessive stem esze- clearly requires Proto-Hungarian *ä < PU *ä, just as in e.g. tél : tele- ‘winter’ < PU *tälwä > Fi. talvi.
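The Finnic side of this derivation is a chain of ordered changes, which can be laid out as a toy pipeline. This is a sketch under assumptions: only the three steps named above are included, and the conditioning written into each regular expression is my own simplification that merely fits the quoted forms, not a full statement of the sound laws.

```python
import re

def a_backing(word):
    # *ä-ä > *a-ə in a two-syllable stem (simplified conditioning).
    return re.sub(r"^([^ä]*)ä([^ä]+)ä$", r"\1a\2ə", word)

def palatal_breaking(word):
    # *a > *aj before a palatal(ized) sibilant or affricate.
    return re.sub(r"a([ść])", r"aj\1", word)

def depalatalization(word):
    # *ś > *s.
    return word.replace("ś", "s")

STEPS = [
    ("ä-backing", a_backing),
    ("palatal breaking", palatal_breaking),
    ("depalatalization", depalatalization),
]

def derive(word):
    """Return every stage of the derivation, starting from the input form."""
    stages = [word]
    for _name, rule in STEPS:
        word = rule(word)
        stages.append(word)
    return stages

print(" > ".join("*" + stage for stage in derive("äśä")))
```

The same ä-backing step also turns *tälwä into *talwə, in line with Fi. talvi quoted above.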

(My proposal that *aĆV > *ajCV in Finnic still remains without a published defense. Anyone who is skeptical of this is welcome to reconstruct instead *äjśä in the meanwhile and assume cluster simplification in Hungarian.)

While this seems to work in principle, a look into any relatively modern etymological dictionary of Hungarian will present a different, simpler etymology of ész: borrowing from Proto-Turkic *äs ‘memory, mind’. Does this show that the Finnish-Hungarian parallel is only an elaborate coincidence?

I could argue that the loan etymology being “more simple” is mostly cosmetic. An etymology is in fact not more unlikely just because it involves a larger number of sound laws, as long as those sound laws are established well enough in the first place. The entire point of reconstructing sound laws is to group the phonetic development of multiple words under a single assumed event. New examples of a known soundlaw do not constitute new assumptions by themselves. As for morphological complications, my internal reconstruction of pre-Finnic *ajsə < ? *äśä is based solely on the Finnish data, and therefore has no bearing on how we analyze the Hungarian.

However, there is also a better option! The connection of Hungarian with Finnish does not mean we have to discard the Turkic comparison entirely: we can simply invert the direction of loaning, and analyze this as a Hungarian loanword in Turkic.

Many of the numerous word comparisons between Hungarian and Turkic originate quite clearly from the Turkic side. Identifying features are common enough, even among the oldest layer of loanwords:

  • unetymological sound structure in Hungarian, e.g. bölcső ‘cradle’ < *belćöw < *belćəɣ ← Turkic *belčik; initial /b-/ is clearly non-native, and there are also no clear precedents for clusters of liquid + affricate in Proto-Uralic.
  • with a loan origin further away than in Turkic, e.g. gyöngy ‘pearl’ < *ďinďü ← Turkic *jinjü ← Chinese (Mandarin zhēnzhū)
  • replacing an established Proto-Uralic term, e.g. hattyú ‘swan’ << *qottVŋ ← Turkic *qotaŋ; contrast PU ? *jëxćə.

But in the absence of any evidence of this sort, it does not seem clear that we would have to continue to simply assume the direction Turkic → Hungarian. In the current case we indeed have equivalent evidence in favor of the other direction (an unproblematic cognate in Finnic, which moreover requires PU *ś > Ugric *s, which change would then be reflected also in Turkic). There is reason to expect more symmetry going on anyway: some of these loanwords go quite far back, to the early 1st millennium CE, when “Turkic” would have still been barely more than a single language (if likely with incipient dialect divisions), while “Hungarian” (maybe less anachronistically: “Magyaric”) would already have been an established branch of Uralic. The fact that Turkic today is a major language family stretching from Anatolia to the Lena, while Hungarian is a single language isolated within its family, is a much later development, from around the 2nd millennium CE.

I expect that a closer look at Hungarian-Turkic lexical parallels will reveal also other cases that can be analyzed as Hungarian loans in Turkic at least equally well as in the opposite direction.

A layer of early Hungarian loans in Turkic could moreover account also for a number of the known “Ural-Altaic” lexical parallels. I’ve posted before about *qujaš ‘sun’. Two quick further examples:

  • Turkic *al- ‘lower, below’: often compared with PU *ëla ‘under, below’. This seems to show the common-in-Uralic sound change *ë > *a, as well as apocope; both of these can be seen also in Hu. al-.
  • Turkic *tāla- ‘to rob, plunder’: well comparable with PU *sala- ‘to steal, hide’. The phonetic development lines up best with Mansi or Samoyedic, where *s > *t. However, this could perhaps be derived also from a stage of Hungarian where *s > *ɬ had taken place, but further development > *h > ∅ had not. This would then be comparable to how /ɬ/ in Khanty tends to be borrowed as /t/ into Russian or other nearby languages that lack the sound. EDAL compares the Turkic also with Korean and Japanese verbs for ‘to lure’, but this is a worse semantic match than comparison with Uralic (or, for that matter, with PIE *tsel- ‘to sneak’, whence Germanic *stela- ‘to steal’).

Elsewhere in Uralic, there are no clear inherited cognates that I would know of for my assumed *äśä. There is a Samic reflex though: PS *āj(c)cë- > NS áicat ‘to observe’, but the vowel correspondence *ā-ë ~ *a-e, and the unpalatalized sibilant, clearly point to a loanword from Finnic. (This also seems to have good chances of being one of the pseudo-PS reconstructions that never occurred in Proto-Samic proper.)

For a small tangent — the affricate -c- (= IPA [ts]) is interesting here. It would be possible to explore the possibility that this is somehow metathetical, and based on the suffixed verb *aista- (then further somehow contaminated with the *ë-stem noun to yield an *ë-stem verb). I suspect a different explanation, however.

Namely, the Samic languages are known to fortite inherited prevocalic *ś- to *č-. This unusual sound change could probably be reversed: taken to indicate that *ś- was originally an affricate *ć-, retained in Samic all along vs. normally deaffricated in all other Uralic languages. [3] The same is also suggested by the long-known Indo-Iranian loanwords like *śëta (*ćëta) ‘100’, *śarwə (*ćarwə) ‘horn’ (> PS *čuotē, *čoarvē > NS čuohti, čoarvi). Per the current understanding, these still had an affricate *ć in Proto-Indo-Iranian, retained as c in Nuristani. The fricatives ś in Indic and s in attested Iranian are therefore parallel innovations. Even Proto-Iranian may still require an affricate *c, to account for the development to /θ/ in Old Persian (though perhaps laminal [s̻] would work just as well). There also does not seem to be any reason to assume that any of the old II loans in Uralic would have come from the Indic branch specifically.

In Finnic, there is however no need to assume especially early deaffrication in all positions. We know by now that PF had an affricate *c, partly preserved in South Estonian, but later mostly deaffricated elsewhere. It would seem to be possible to assume that at least word-internal *-ś- in fact yields Proto-Finnic *-c-, not *-s-, and that this is only deaffricated later on, together with *c from other sources (such as the type *wetə > *veti > *veci > vesi ‘water’). Since SE consistently only indicates s- for PU *ś- (sada ‘100’, sarv ‘horn’, silm ‘eye’, sälg ‘back’, süä ‘heart’, etc. [4]), I would however still assume early word-initial deaffrication: PU *ć- > *ś- > PF *s-. This would run in parallel to the often assumed development of PU *č- > *š- > PF *h-.

Therefore, to immediately correct what I wrote above, it may be preferable to assume an original preserved affricate: PU *äćä > *aćə > *ajćə > PF *aici, borrowed at this point into Proto-Samic.

[0] This post is an extended version of an etymology I have presented before at one of the University of Helsinki etymology workshops, in case anyone feels like the basic gist is sounding overly familiar.
[1] E.g. kieltää ‘to deny’ ← kieli : kiel(e)- ‘tongue’; saartaa ‘to surround’ ← saari : saar(e)- ‘island’; köyttää ‘to tie’ ← köysi : köyt(e)- ‘rope’; varttaa ‘to graft’ ← varsi : vart(e)- ‘stem’; haistaa ‘to smell (tr.)’ ← haista : hais(e)- ‘to smell (intr.)’. Perhaps also paistaa ← *pais(e)- ‘to be baked’, given paisua ‘to swell, expand’, perhaps originally used of dough. (This shorter stem in turn has been explained as deriving from PIE √*bʰeh₁- ‘to heat’.) A few “overheavy” verb stems have instead been formed by a suffix -stA- plus a contracted first syllable, though, such as maustaa ‘to season’ < *maɣusta- ← maku ‘taste’.
[2] Mostly replaced in standard Finnish by the metathetic reshaping vihreä.
[3] Without going into too much detail, this is a proposal that has already been made by various people, such as Abondolo, Janhunen and Katz. It does have the implication that something needs to be done with traditional PU *ć, though. The only reliable instances seem to be the clusters *-ńć- and *-ćć-. All other words are perhaps better considered later loanwords, diffused between the Uralic varieties. This is also suggested by how the candidates are disproportionately represented in Permic and Ugric anyway, and how they also often show vacillation between traditional *ć and *ś (say, ‘to break’: Permic *ćeg- < *ć-, but Ugric *säŋk- < *ś-).
[4] Võro-eesti synaraamat does have a nursery term tsimmä ‘eye’, but the affricate in here is probably better considered affective variation than an ancient retention. Secondary *ci- < *ti- can however remain; the usual example is tsiga ‘pig’ (~ Fi. sika).

Posted in Etymology
