Native initial clusters in Udmurt

Typological definitions of Uralic [1] just about always note the lack of native word-initial consonant clusters. While the literary standards have their share of IE-derived clusters by now, in rural dialects and the Siberian languages clusterlessness is common enough to this day. However, exceptions can be found in the other direction too, although they seem to be an understudied topic.

The most obvious offender is Mordvinic, which sports all kinds of words like /kši/ ‘bread’, /kšńi/ ‘iron’. Perhaps in most cases these are IE loanwords in ultimate origin, but involving native syncope, in these examples from preforms along the lines of *kərsä, *kərtnä. Russian influence can still be suspected though, since apparently this syncope is mostly post-Proto-Mordvinic. Two illustrative examples: Moksha /kštralks/ ‘bobbin’ ← /kšťir/ ‘spindle’ + /alks/ ‘bottom’, cognate to Erzya bisyllabic /ščeŕalks/; Erzya /troks/ ‘across’, cognate to Moksha /tərks/, /turks/ (PMo *turəks?). But I do not think the details of the development of these have been worked out in full, and several cases built on native Uralic material can be found also, such as Er. /pŕa/ ~ Mk. /pŕä/ ‘end, head’ (< PU *perä), /pškaďems/ ‘to blow’ (~ Fi. puhkua, Komi /pušky-/ < *puš-kV-). I could also submit some new wilder etymological hypotheses: e.g. could /pra-/ ‘to fall’ be from *pda- < *pədá- < PU *puďa- ‘id.’ ??

(Edit 2020-09-14: cf. now a hypothesis on reconstructing a first-syllable *ə still in PMo.)


The precise history of the Mordvinic initial clusters would really be a fairly large research project. Before diving into it, a decent typological parallel and a much more tractable case study of natively arising clusters in Uralic seems to be provided by Udmurt. In the literary standard, word-initial consonant clusters outside of Russian loanwords are rare but still extant. They seem to have a slightly extended presence in the dialects also. I’ve almost never seen this fact explicitly pointed out, however; it came to my full attention accidentally only this March, while reading Michael Geisler’s Vokal-Null-Alternation, Synkope und Akzent in den permischen Sprachen (2005, Veröffentlichungen der Societas Uralo-Altaica 68), which primarily treats V2 syncope.

Interestingly there is a fairly simple phonological rule behind the rise of initial clusters in Udmurt: /ɨ/ is lost in the position CɨCV₂, where V₂ is a full = non-/ɨ/ vowel (though I have no examples with /u/), if the result is a “legitimate” consonant cluster. Nearly all examples I’ve found adhere to this (see below for one clear + one possible exception), and I’ve also found no widely distributed counterexamples in underived roots. In derivatives from or inflected forms of CɨC or CɨCɨ roots, syncope could be expected to be mostly reverted / prevented by analogy of course.

There is more uncertainty in the details of what counts as a “legitimate” consonant cluster, as well as in how widely this rule is reflected in the Udmurt varieties (it is almost surely post-Proto-Udmurt). The data below is mainly from the intersection of Wichmann’s Wotjakischer Wortschatz and Csúcs’ Die Rekonstruktion der permischen Grundsprache, the latter taken into account to ensure I am indeed dealing with inherited Permic material and not recent loanwords / coinages entirely. Dialect abbreviations are G(lazov) (northern), S(arapul) and M(almyž) (central), J(elabuga) (southern), Uržym (MU) and U(fa) (southeastern).

The best-established cluster type is stop + /r/:

  • /dɨr/ ‘probably’ → MU /dɨrak/ ~ /drak/ ‘id.’
  • /kɨrɨ-/ ‘to dig’ → G /krem/ ‘dike’
  • /kɨre(d)ź/ (most dialects) ~ literary & U /kreź/ ‘traditional box zither instrument’
    (similar to the Russian gusli, Finnish kantele, etc.)
  • /pɨr/ ‘always’ → /prak/ (several dialects) ‘id.; straight’
  • /pɨrɨ/ ‘piece’ ~ U /pri/ ‘id.’
  • /tɨr/ ‘full’ → /tɨros/ ~ /tros/ (several dialects) ‘id.; many’

In the Wichmann+Csúcs data, there is also one example each of /pl-/ and /sl-/:

  • /pɨlaśkɨ-/ ‘to bathe’ ~ G MU /plaśkɨ-/
  • /sɨlal/ ‘salt’ ~ G /slal/

So no big surprises so far, just falling-sonority clusters of a globally common type.
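
As a rough illustration, the rule as stated so far can be sketched in a few lines of Python. This is only a toy model under loud assumptions: words are treated as strings of one-character segments, the set of “legitimate” onsets is limited to stop + /r/ plus the two attested cases /pl-/ and /sl-/, and all names (`syncopate`, `is_legitimate_onset`, etc.) are hypothetical, made up for this sketch.

```python
# Toy model of the /ɨ/-syncope rule: CɨCV₂ > CCV₂ when V₂ is a full vowel
# and the resulting onset counts as "legitimate".
# All names and the onset inventory are assumptions made for illustration.

FULL_VOWELS = set("aeiouɤ")       # every vowel except /ɨ/
VOWELS = FULL_VOWELS | {"ɨ"}
STOPS = set("pbtdkg")
EXTRA_ONSETS = {"pl", "sl"}       # the two non-rhotic clusters cited above


def is_legitimate_onset(c1: str, c2: str) -> bool:
    """Stop + /r/, or one of the two extra attested clusters."""
    return (c1 in STOPS and c2 == "r") or (c1 + c2) in EXTRA_ONSETS


def syncopate(word: str) -> str:
    """Drop first-syllable /ɨ/ if the conditions of the rule are met."""
    if len(word) < 4:
        return word
    c1, v1, c2, v2 = word[0], word[1], word[2], word[3]
    if (c1 not in VOWELS and v1 == "ɨ" and c2 not in VOWELS
            and v2 in FULL_VOWELS and is_legitimate_onset(c1, c2)):
        return c1 + word[2:]
    return word


for form in ["dɨrak", "tɨros", "sɨlal", "kɨrɨ", "tɨr"]:
    print(form, "->", syncopate(form))
```

The last two forms are left alone: /kɨrɨ/ has /ɨ/ also in the second syllable, and /tɨr/ is a monosyllable. The sketch deliberately ignores analogy, dialect distribution and stress, all of which matter for the real data.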

A very different case can be found in ‘rye’: /dźeg/ in literary Udmurt and almost all dialects, but with a bisyllabic byform /dźɨźeg/ ~ /dźiźeg/ in Uržym, which is clearly more original in light of the Komi cognate /rudźɤg/. [2] Also interesting is the Ufa form /źeg/, since in this variety word-initial *dź- normally gives a nonsibilant affricate [ɟʝ-] (= ďj in Wichmann’s transcription). I would hypothesize that this is not a case of *dź+ź losing the second member, but rather *dź+dź losing the first member, already before the lenition *-dź- > /-ź-/ that is found in most dialects of Udmurt; then this new *dź deaffricates in Ufa even initially, which nicely parallels it having also *dž- > /ž-/. [3] Of course most of this might be also simply some sort of haplology, rather than ever going through an actual cluster *dź(d)ź- at all.

Another haplologyish case is ‘eight’. Per /kɨk/ ‘two’, and Komi /kɤkjamɨs/, the pre-syncope Proto-Udmurt form of this must be *kɨkjamɨs (as reconstructed also by Wichmann). Only syncopated forms have been recorded though: /ťamɨs/ in most dialects, a byform with /ťj-/ in Glazov as the only hint that something is up. [4]

Another group still is built up by clusters of the type sibilant + stop/nasal. These demonstrate some “regression to the mean” — they tend to “de-cluster” again across the Udmurt dialects, but this time by epenthesis of an initial vowel: /i/, partly also /ɨ/. This of course leaves the cluster as such intact, but does break it into two different syllables. The Glazov variety appears to fairly consistently retain the elsewhere syncopated original vowel however, though possibly colored to /i/ by palatals. As for where an epenthesized form occurs or not, I see no pattern. Double representation is common, and probably both variants exist widely side by side, and the literary standard and in some cases Wichmann have randomly ended up sampling only one or the other.

  • G /sɨkal/ ~ literary, G J MU U /skal/ ~ S M /iskal/ ~ J MU U /ɨskal/ ‘cow’
  • G /sɨpaj/ ~ U /spaj/ ~ MU /ispaj/ ‘beautiful, good’
  • G /šɨnɨr/, /-ń-/ ~ literary, S M J /inšɨr/, MU /ińšɨr/, U /imšɨr/ ‘threshing ground’
    ~ Komi /rɨnɨš/ < *rɨŋɨš < *riŋəšə > Finnic *riihi
  • G /śińer/ ~ M /er/, MU /śńer/, U /šńer/ ~ literary, S J /iśńer/, M /iśńɤr/ ~ U /ɨšńer/ ‘broom’
    ~ Komi /jiś/; compound with /ńɤr/ ‘twig, rod’
  • G /śike/ ~ MU /śke/, /ske/ ~ literary, M J /iśke/ ‘so, thus’
    ~ Komi /eśkɤ/ ‘conditional mood particle’

Syncope-then-epenthesis would not be the only possible history for these, but this has support from the fact that both /ɨ/-syncope and pre-sibilant epenthesis can be independently attested, the latter in Russian loanwords such as U /smolla-/ ~ M /ismola-/ ‘to tar’  ← смола ‘tar’, G /šľapa/ ~ MU /iślapa/ ← шляпа ‘hat’, G /štop/ ~ J /ɨštop/ ‘jug’ ← штоф ‘a measure’, J /iźver/ ‘predator’ ← зверь ‘beast’. Note also the lack of epenthesis to **šiľapa, **šɨtop in G.
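
The epenthesis half of this scenario can be sketched the same way. Again a minimal toy, not a claim about any actual dialect grammar: the segment classes and the function name `prothesize` are assumptions, and the choice between /i/ and /ɨ/ is simply passed in as a parameter, since I see no pattern in it.

```python
import unicodedata

# Toy model of pre-sibilant epenthesis: a vowel is added in front of a
# word-initial sibilant + stop/nasal cluster. Classes and names are
# assumptions made for illustration only.


def _nfc(text: str) -> str:
    """Normalize so that letters like š, ś, ń are single code points."""
    return unicodedata.normalize("NFC", text)


SIBILANTS = set(_nfc("sšśzžź"))
STOPS_AND_NASALS = set(_nfc("pbtdkgmnń"))


def prothesize(word: str, vowel: str = "i") -> str:
    """Prefix `vowel` if the word begins with sibilant + stop/nasal."""
    word = _nfc(word)
    if (len(word) >= 2 and word[0] in SIBILANTS
            and word[1] in STOPS_AND_NASALS):
        return _nfc(vowel) + word
    return word


for form in ["skal", "spaj", "smola", "tros"]:
    print(form, "->", prothesize(form))
print("štop", "->", prothesize("štop", vowel="ɨ"))
```

Stop + liquid onsets such as /tros/ pass through unchanged, matching the observation that only the sibilant-initial clusters get “repaired” in this way.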

I assume ‘threshing ground’ has been further metathesized from expected *išnɨr by folk-etymological influence of /in/ (~ J MU /iń/) ‘place’. Why the Ufa form has /m/ I have no idea; that the word had proto-Permic and thus most likely also Proto-Udmurt *-ŋ- does not clarify anything. In ‘broom’ Komi seems to suggest original *(j)iś-, but as this has no further etymology, maybe this is rather a loan from Udmurt with the 2nd part dropped. Also, perhaps the first part of the compound is /śi/ ‘hair, bristle, fibre’ (also occurring with /ɨ/ in MU /ďɨrśɨ/ ‘head hair’)? A broom is indeed a ‘rod with bristles’.

In ‘so, thus’ the initial vowel is clearly original however, as this comes from Volga Bulghar *ećke > *ićke (> Chuvash /əśke/), and hence requires etymological nativization in G.

The case of ‘cow’ then seems to have relevance beyond Permic even. This has known cognates a bit more widely, but these are also syncopated and partly even epenthesized! /skal/ in Mordvinic, /škal/ ~ /əškal/, /ŭ-/, /u-/ in Mari. UEW treats these as coming from a common protoform *uskalɜ with somewhat arbitrary loss of the initial vowel. However, if I am correct about the Glazov forms in /SIC-/ being mainly archaisms, then this is probably not correct: at least the Mari forms should be considered one or more loans from Udmurt specifically. The Mordvinic form could still have come about by parallel syncope. As commented by Bereczki (1992), retained /a/ in the original 2nd syllable most likely regardless indicates an areal loanword of some unknown origin. But now we seem to know that the shape of this source has probably been more like #sukal or #sikal than #skal or #uskal.

A sixth possible member in this group could be /ɨštɨr/ ~ M /ištɨr/ ‘footrag’ from a *štɨr < *šɨtɨr, as I suspect on grounds of the unmotivated /i/ in the Malmyž variant. This too has a Mari equivalent /štər/ ~ /əštər/ that would again have to be a loan from Udmurt. Furthermore these have been compared by Wichmann [5] even with Finnish (+ Karelian, Ludian, Veps) hattara ‘footrag’ < ? *šattara. The vowel correspondence a ~ /ɨ/ is rare and irregular though, so probably this is in any case not all the way from Proto-Uralic. The Finnic lexeme has also an alternate etymology as a semantic specialization of the homonymous hattara ‘fluff’.


‘Threshing ground’ and my proposal for ‘footrag’ diverge from the other examples by showing syncope from *CɨCɨC. Even these could be seen as kind of regular, once we consider the mechanism more carefully. The basic conditioning mechanism is surely not vowel quality per se, but rather stress. A typical feature across the more central Uralic languages (and also Turkic!) is a pattern where stress is still technically initial by default, but is widely retracted onto “stronger” vowels (long, full, open) later on in the word. In other words: syncope targets specifically pretonic /ɨ/. This would suggest that the immediate predecessor of hypothetical *šnɨr and *štɨr was more specifically iambic *šɨˈŋɨr, [6] *šɨˈtɨr, in contrast to trochaic stress on the more typical *CɨCɨ roots. If so though, it is too early for me to take a guess on what would have been the reason for such unexpected stress placement.


If there has been fairly regular loss of pretonic /ɨ/ in Udmurt, a natural follow-up question is: what about examples where this doesn’t lead to initial consonant clusters?

Two subtypes can be considered. The first would be aphaeresis: we would expect words of the shape *ɨˈCV(C) (where V ≠ ɨ) to again lose the first syllable and to give plain monosyllabic /CV(C)/. There perhaps are some of these out there too, since per Csúcs’ Proto-Permic vocabulary, it seems that no examples of this root shape have survived intact in Udmurt. The only examples of basic word roots with surviving word-initial /ɨ/ are all either monosyllables (/ɨľ/ ‘moist’, /ɨń/ ‘flame’, /ɨm/ ‘mouth’…), have /ɨ/ also in the 2nd syllable (/ɨbɨ-/ ‘to shoot, throw’, /ɨšɨ-/ ‘to be lost’…), or have an intervening consonant cluster that would not work as a legitimate word-initial cluster (/ɨrgon/ ‘copper’). /ɨ/ remains also in the compound /ɨbes/ ‘gate’ (from /ɨb/ ‘field’ [7] + /ɤs/ ‘door’) and a few inflected forms like /ɨč-e/ ‘such’. However, I also do not find any clear enough candidates where a Komi word of the shape /ɨCV/, /uCV/, /ɤCV/, /iĆV/ would lack an Udmurt cognate. The closest are Old Komi /idɤg/ ‘angel’ (whose hypothetical Udmurt cognate would be expected to be *ideg and not **ɨdeg > **deg), and Komi /ɨrɤš/ ‘ale’, a derivative from a lost verb *ɨr- (so perhaps derived only within Komi). The root shape *ɨCV(C) indeed seems to be lacking in Proto-Permic entirely. Would it be too bold to hypothesize that these have actually lost their initial vowel in both Komi and Udmurt?

One speculative etymology of this type could be ‘udder’. The more southwestern Uralic groups all use some sort of a loanword from Indo-European (F. *udar, Mo. *odar, Mari *wåðar). The Permic languages however have an unetymologized /vera/ ~ /vɤra/. If this came from earlier *ɨvɤra, perhaps it could be a part of the same group after all? But the final /a/ looks worrisome (‘udder’ seems to have been a consonant stem all the way from PIE to attested Indo-Iranian reflexes), as does wringing /v/ out of *-ð- < *-d- < *-t-, which normally lenites all the way to zero in Permic.

— The second possibility of difficult-to-detect *ɨ-loss is syncope before a zero medial. Udmurt has occasional bisyllabic vowel clusters of a relatively wide variety, e.g. /ju.a-/ ‘to ask’, /ju.ɨ-/ ~ /jʉ.ɨ-/ ‘to drink’, /ju.o/ ‘I will drink’, /ki.on/ (~ /kijon/) ‘wolf’, /lu.o/ ‘sand’, /na.a-/ ‘to look at’, /śi.ɨ-/ ‘to eat’, /vu.ɨ-/ ~ /vʉ.ɨ-/ ‘to come, come to completion’, /vu.em/ (~ /vujem/) ‘row’ (< *’order’ < *’completion’). There however again seem to be no examples of the shape Cɨ.V — or at least: none surviving as such.

I can also propose at least one actual candidate for this type of syncope, with a bit more confidence than the previous example even. In nouns we would expect this to yield a simple monosyllabic /CV(C)/ root. There are, however, no fully monosyllabic verbs in Udmurt! All have a stem vowel, in the citation form = the infinitive (ending in /-nɨ/) either /ɨ/ or /a/. This marks what at least some (maybe most?) grammars call the inflection class of the verb. [8] So what would happen if a verb of the shape *Cɨ.a- were to be syncopated to *Ca-? It seems to me that a likely outcome would be to pleonastically apply a second instance of class-marking /a/. This gives, I think, at least a good hypothesis for what’s up with the rather strange-looking /na.a-/ ‘to look at’ (attested only in the Besermyan dialect). This has usually been considered a reflex of PU *näkə- ‘to see’, but we would expect the reflex of this root in Permic to be rather *ni- (cf. /ki/ < *kätə ‘hand’) or perhaps *nɨ- (cf. /tɨ/ < *täwə ‘lung’). And my thinking is that this may have been indeed the case still in Proto-Permic, if this Udmurt form comes from earlier *na- < *nɨ.a-.


The attrition of initial consonant clusters from a language’s phonology can be observed in dozens of languages (Sinitic, Tibetic, Indic, Iranian, Armenian, Albanian…). Their introduction natively seems much rarer though; yet this process should be equally important for understanding large-scale shifts in typology. Other examples I know of are however mostly limited to a few way-out-there cases that clearly must have deleted vowels quite rampantly, e.g. Itelmen (Western /kɬfənʲck/ ‘in front’), the Okinawan languages (Amami /ʔkwa/ ‘child’, Ōgami /pstu/ ‘person’), or all the “sesquisyllabic” languages of SE Asia. Udmurt is in contrast a quite pleasantly tractable case, where only modest clusters have arisen in minor amounts. Yet, as the *ST- *SN- section shows, even these can throw up further complications. Some further cross-linguistic comparison with other cases would be interesting… if they first can be found somewhere. I suppose I already have Mordvinic lined up next. Another case I’ve seen reported is Central Dravidian, where the main rule seems to be a Slavic-esque liquid metathesis. But that’s about it for leads I have within the major Eurasian language families that I have the most knowledge of. Probably I would have to look into e.g. minor Niger-Congo or Austronesian languages (subfamilies even?) to find further cases where it is relatively sure that a language has definitely evolved from allowing only simple onsets to allowing initial consonant clusters.

[1] Not that there are any exclusive and pan-Uralic typological features; any kind of a “Uralic typological profile” immediately bleeds further east towards “Ural-Altaic” and/or “Uralo-Siberian”.
[2] This is surely in turn in some fashion from Germanic–Balto-Slavic (and → Finnic) *ruǵʰis, seemingly with either metathesis (PP *rudźeg < *rugedź, somehow via Finnic?) or a new velar suffix (PP *rudź-eg ← *rudź via BSl.?; loss of *-s is expected, cf. *pårś ‘pig’ from IE *porḱos).
[3] The Ufa variety clearly must’ve already had its /ďj-/ at this point too. I even wonder if this could be a hint that this is a retention, that Proto-Udmurt *dź- (< PP *dź- and also *r- / _VĆ, _V{s z}) was actually rather nonsibilant *ďj- or even a stop *ď-. Which could then have interesting further implications too, e.g. should this be perhaps applied even to Proto-Permic? Already for Proto-Udmurt and Proto-Komi, very few instances of *ď can be reconstructed, even fewer still for Proto-Permic, and only word-medially it seems.
[4] What’s also curious is that most varieties do not have a general shift *kj > /ť/, and palatalization seems to have taken place in this case only due to the cluster *kj having been forced to occur within one syllable. In some cases apparent palatalization can be found also medially, e.g. ‘to laugh at’: standard and most varieties /śerekja-/, but Uržym /śereťa-/ ~ /-kť-/ ~ /-ḱ-/. But then this variety also has general *j- > /ď-/ word-initially, and that’s probably also where the form with /-ť-/ comes from in the first place, not as *kj > ḱ > ť. (The /-kť-/ variant also suggests the same.) This pathway looks even clearer when compared with Ufa /śeregďa-/, where presumable intermediate *-kď- has assimilated in voicing regressively, not progressively.
[5] Wichmann, Yrjö. “Etymologisches aus den permischen Sprachen”. — Finnisch-Ugrische Forschungen 12: 128–138.
[6] While *ŋ > *n is regular in a few Udmurt dialects, including in Glazov when adjacent to /ɨ/ (*šɨŋɨr > /šɨnɨr/), it is not widespread enough to seem to me like the main explanation for why no /ŋ/ remains anywhere else either. What strikes me as more likely is that syncopated **šŋɨr was immediately adjusted to *šnɨr due to Udmurt not tolerating word-initial /ŋ/.
[7] A cranberry morpheme in attested Udmurt, but an independent lexeme in Komi, and per also a second derivative /ubo/ ~ /ɨbo/ ‘beet’ in Udmurt, this probably still existed in Proto-Udmurt too, perhaps up to the time of the syncope rule.
[8] Unlike e.g. Hungarian, or typical older Indo-European languages, this contrast does not affect the choice of endings as such, only the stem morphotax. At a pinch, consonant-initial suffixes are added to a vowel stem, either /CVCɨ-/ or /CVCa-/; vowel-initial suffixes to a consonant stem, either /CVC-/ or /CVCal-/. It would be possible to treat the contrast also as one between underlying consonant stems and underlying vowel stems (/CVC-/ versus /CVCa-/), with /ɨ/ and /l/ inserted as prop vowel and prop consonant when required, though these are not anything like general morphophonological rules in Udmurt (the default prop consonant is /j/). — /ɨ/ is also syncopated in some positions in some dialects, roughly according to what medial consonant clusters Udmurt tolerates in general. This creates a more “natural” look for verb inflection (and in these dialects we definitely should speak of consonant-stem verbs). As a tangent: contrary to what most reconstructions claim, I however do think that this is indeed syncope, and more consistent vowel-stem inflection is the Proto-Udmurt and probably also Proto-Permic state of affairs. If dialect forms like /karnɨ/ ‘to do’, /punnɨ/ ‘to plait’ (even if nicely paralleled by Komi /karnɨ/, /pɨnnɨ/) were to be soundlawful predecessors of more widespread forms like /karɨnɨ/, /punɨnɨ/… why do words like /kɨrnɨ(d)ž/ ‘raven’, /tunne/ ‘today’ then not also turn into ˣ/kɨrɨnɨ(d)ž/, ˣ/tunɨne/? At most vowel insertion could be analogical, and this then fails to explain why the distribution of /CVC-/ versus /CVCɨ-/ in the “consonant-stem dialects” is quite consistently phonologically conditioned.

Posted in Reconstruction
31 comments on “Native initial clusters in Udmurt”
  1. David Marjanović says:

    One way of introducing initial consonant clusters that doesn’t apply to Uralic is prefixing with vowel-free prefixes. That’s productive throughout Slavic, and in most of Upper German where be-, ge- and, when followed by r-, also zer- lose their vowels and create initial clusters that Standard German lacks. Simplification has also set in: be- and ge- simply disappear when a plosive or affricate follows.

    a bisyllabic byform /dźɨźeg/ ~ /dźɨźeg/ in Uržym

    Please try again :-)

    This would suggest that the immediate precedessor of hypothetic *šnɨr and *štɨr was more specifically iambic *šɨˈŋɨr, [6] *šɨˈtɨr, in contrast to trochaic stress on the more typical *CɨCɨ roots. If so though, it is too early for me to take a guess on what would have been the reason for such unexpected stress placement.

    Syllable weight? The stress goes on the first syllable unless the second is closed?

    But the final /a/ looks worrisome (‘udder’ seems to have been a consonant stem all the way from PIE to attested Indo-Iranian reflexes), as does wringing /v/ out of *-ð- < *-d- < *-t-, which normally lenites all the way to zero in Permic.

    From just looking at these, I’d rather wring the /r/ out of *d, perhaps by as- & dissimilation: *-dar > *rar > *ra. Also, my guess for the /v/ is *u > *wV, as must be assumed for Mari anyway.

    (…Plus, I think there are some Anatolian and maybe Tocharian hints that the PIE zero-grade of *w wasn’t [u], but [wə], but at best that needs more research. [u] is a safe reconstruction from Proto-Indo-Iranian onwards at the very least.)

    • j. says:

      One way of introducing initial consonant clusters that doesn’t apply to Uralic…

      Introducing them in particular words, sure, introducing them into the phonology altogether probably no. Also aren’t the Slavic ones just a special case of yer-loss? в- к= с- are from *vŭ- *kŭ= *sŭ- at least.

      I’d rather wring the /r/ out of *d

      *d > /r/ is possible in general of course, but looks to me like it would require some kind of an intermediary. Also calling this unetymologized just going off of Csúcs turns out to have been hasty, Lytkin’s older etymological dictionary of Komi refers to a comparison with Mansi that could point to something like *(w)ükɜrɜ…

      I’m honestly not up to speed on what’s the current thinking on the “schwebeablauted” Mari reflex, but I’ll stay on standby for my colleague S. Holopainen’s PhD, to be defended in December last I heard, which will be exactly a re-review of the Indo-Iranian loanwords in Uralic.

      — One of the /dźɨdźeg/’s now corrected to have /i/. I’d love it if this and /ɨ/ were a bit more distinct visually really. In my private notes I by now usually stick to the transliteration convention that has y for this. (Also ö for the vowel I transcribe in IPA as /ɤ/: this really is a similar central vowel, but a nonreduced one which would be misleading to write as /ə/, especially in Udmurt where some dialects even also have [ə] as one realization of /ɨ/.)

      • David Marjanović says:

        Introducing them in particular words, sure, introducing them into the phonology altogether probably no.

        Oh yes. Word-initially, /gf gʃ gv gh bs tsr/ don’t occur in German otherwise (and yes, /gh/ stays distinct from /k/; I’m not mentioning /gs/, though, because it’s probably not distinct from /ks/ which occurs in all loans with x-). My dialect has them e.g. in gefahren, geschehen, gewinnen, gehalten, besonders, zerreißen.

        Also aren’t the Slavic ones just a special case of yer-loss?

        Originally yes, but they’re still productive, so I mentioned them.

        • j. says:

          I mean that the occurrence of any clusters at all was not introduced into German via prefixation. The occurrence of specific types of clusters is clearly a finer level of detail.

          (Why doesn’t German have native /gv-/ < *gw-, anyway? I mean I know PIE *gʷʰ- goes instead to *w- in PGmc, but this gap seems unmotivated. Areally Latin has the same, Celtic and Balto-Slavic however do not.)

          • David Marjanović says:

            I should have mentioned that ge- is productive (in the dialects and in Viennese mesolect) in that it is added to English loans.

            PIE *gʷʰ- goes instead to *w- in PGmc

            What other source would there be? Maybe there’s a post-Grimm Celtic loan with PIE *gʷʰ lurking somewhere in a High German dialect, but I don’t know of any candidates.

            Synchronically the gap is unmotivated as far as I can tell. But synchronic gaps that only have diachronic motivations are common the world over.

            • j. says:

              Not that I had one in mind, I was only wondering about the motivation of the PGmc change, but now that you mention it — the earliest prefix univerbations are Proto-Germanic already — mostly *fra-, *f(a)-, but apparently an analysis of give etc. being < *g(a)-eba- < *kom-h₁ep- has been proposed also.

              For that matter, in later German, do even loans like Guano, Guanin, Gouache get /gv-/? Sources that I can find claim “/gu̯a-/” which sounds nonsensical.

              • David Marjanović says:

                For me, the first two get merciless spelling-pronunciations with /uː.aː/ (i.e. [u.aː ~ ʊ.aː]). The last doesn’t count, because I actually speak French and don’t have the word in my active vocabulary in any language. Likewise, unstressed realizations of /uː/ occur in Leguan “iguana” and all lingu- words.

                I don’t actually know what happens north of the White-Sausage Equator, where (even secondarily) stressed syllables that begin with a vowel phoneme get [ʔ] as an obligatory onset. After i, this is avoided there by reading it as /j/ = [j]. After u, [ʔ] is inserted e.g. in aktuell and eventuell, because there is no /w/. I think Guano gets [ʔ] before its stressed /aː/, while Guanin, where the /a/ is fully unstressed, just has /uːa/, and both words have three syllables as they do for me.

                the motivation of the PGmc change

                Actually three changes: to PGmc. *b, *g and *w, the last two conditioned by the accent (I can’t find the paper right now), the first perhaps conditioned by a following *e (if the bane is due to analogical leveling).

                • David Marjanović says:

                  I don’t actually know what happens

                  but I guarantee that /gv/ for gu never happens, neither word-initially nor otherwise, unlike in all of Slavic, Baltic and at least Swedish. Intramorphemic /gv/ seems to be inconceivable; also, /kv/ is signposted by the special letter q, while there’s none to indicate /gv/, and spelling is important because all spoken renderings of Standard German are either current or 100-year-old spelling-pronunciations. Schriftsprache eben.

      • David Marjanović says:

        a re-review of the Indo-Iranian loanwords in Uralic

        Ooh, awesome.

      • Blasius B. Blasebalg says:

        Just two remarks:

        * Slavic initial clusters due to prefixes:
        Vowel-less prefixes in modern Russian can produce clusters that are outlawed otherwise, such as in “vz-voz” (‘driving up’). As far as I understand, there can’t be a Russian word like **avzva.
        The initial consonants would have to be analyzed as extrasyllabic I guess.
        Of course, the existence of such words is reinforced by analogy with words employing the same prefix without producing illicit clusters, such as “vsyat'”.

        * re German prefixes:
        Apart from loans younger than 500 years, standard German seems to have only two
        different stems starting with “gn-“: “Gnade” (grace) and “Gneis” (gneiss).
        The former is clearly a late MHG contraction with prefix “Ge-“,
        and the latter exists in doublets “gn-” and “gVn-” since its first attestation.
        Wherever “Gneiss” came from, it seems that the cluster was so strange that it was sometimes ‘repaired’
        to gVn- to make it fit better.

        • David Marjanović says:

          Gleis ~ Geleise “rails of a railway” shows variation even today, and I mean within the written standard; glauben is an early contraction.

    • Howl says:

      Saami also has a lot of words that start with sC. For example: skoppē ‘bowl’. Neither PU *kuppa nor Late Latin cuppa (via Germanic e.g. Swedish kopp) have initial s-. Does anyone know what’s the story there?

      >(…Plus, I think there are some Anatolian and maybe Tocharian hints that the PIE zero-grade of *w wasn’t [u], but [wə], but at best that needs more research. [u] is a safe reconstruction from Proto-Indo-Iranian onwards at the very least.)

      I don’t know about Hittite. I guess it depends on how you interpret the way Hittites spelled their words in cuneiform. But Tocharian reflects Proto-Tocharian Cəy/Cəw (TA/TB Ci/Cu) as zero grade of PIE *Cey/*Cew in ablauting stems, but Cʲə/Cə (TA C(ʲ)ä/ø TB C(ʲ)á/ä/ø) in non-ablauting stems. For initial PIE *y/*w Tocharian always reflects PT *yə/*wə.

      • j. says:

        Unexpected clusters in Sami seem to be mostly from the paleo-Arctic substrate(s). Ante Aikio’s “An Essay on Substrate Studies…” mentions as likely cases in Northern Sami skuogga ‘baleen’, skuoggir ‘ethmoid bone’, skuolfi ‘owl’ and the placenames/toponyms Skáippas, Skázá, Skiehččaras (apparently from a root probably meaning ‘watershed’), Skielda, Skielgan, Skoapmit, Skoavvalat, Skuohki, Skuorča, Snjierttet, Skirvi, Skiessvuotna. He also does not fail to note that these examples involve predominantly /sk/.

        • j. says:

          Skiehčča-, “de-Samifiable” as *skeččə- incidentally vaguely reminds me of PIE *skey- ‘to divide’, but this could be accidental, especially when there seem to be no **st- **sp- to go in parallel. *TR- clusters though we do not even expect to see, since Northern Sami strips these to just *R- also in older Scandinavian loans such as ráhput ‘to scrape (together)’, lávžá ‘gadfly’ (ON kleggi < *klaggjan-).

        • j. says:

          And I wonder now if the one example with snj- should be presumed to come from *sŋ-. This would go splendidly together with šnjierrá ‘mouse’ which looks like some kind of a bizarro reflex of PU *šiŋərə…

        • Crom Daba says:

          Is this also the language of the ship Germanic language then?

          • j. says:

            I don’t think I’ve heard of that one, but it kinda seems to me that you already need ships to make your way to Finnmark. Attestation in Gothic doesn’t really suggest northern substrate origin either.

          • David Marjanović says:

            I’ve seen ship derived from Latin scyphus.

            (/p/ for ph probably checks out, because there’s a PILIPPHVS on a wall in Pompeii somewhere. The first FILIPPVS only shows up in the following century.)

  2. Crom Daba says:

    You can also check Nugteren’s Mongolic phonology and the Qinghai-Gansu languages (2011) for some exploratory stuff on initial clusters in Mongolic.

  3. Alexander Savelyev says:

    I don’t think the Glazov Udmurt forms in /SIC-/ are archaisms.
    It is not only ‘so, thus’ that goes back to a form with an initial /ISC-/ (Old Chuvash *iśkä).
    ‘Beautiful, good’ is borrowed from Tatar ïspaj ‘tidy, neat’ .
    ‘Footrag’: this word is also found in Tatar (ïštïr) and may be of ultimate Turkic origin. Note the Common Turkic compound *ič ‘inside’ + *ētük ‘boot’, which yields Tatar *(ĭ)č-itĭk ‘soft leather boots’, also borrowed as Rus. ичетиги, чедыги, ичиги ‘ibid.’ The cognate of *ič ‘inside, interior’ in the Bulgharic branch is Chuvash ï̆š. It looks like a Volga Bulghar compound with *Iš as the first element was borrowed into Tatar, Udmurt, and Mari. The weak point is that the alleged *-tVr remains etymologically obscure.
    Regarding ‘cow’, there is a trend to the *ISTA- > *STA- development shared by different languages of Volga-Kamia. Cf. cases such as Mari *ĭške > ške ‘self’, similar examples are abundant in Chuvash and Tatar dialects.
    As you pointed out, ‘broom’ has Komi jiś as an external parallel, and ‘threshing ground’ goes back to a vowel-initial form. So, it does not seem there is a single reliable example with an initial *S-, after all.

    • j. says:

      Thanks, good Turkic addenda. Does ïspaj have itself an etymology though? I don’t recall seeing /sp/ in native Turkic vocabulary.

      For ‘footrag’, if this contains *ič- ‘inside’, a decent candidate for the 2nd member could be Permic *dära ‘linen’ (or maybe rather some early parallel derivative of this with pre-Permic umlaut of *ä to *ɨ — I see no easy way to get rid of the *ä and *a just by compounding). But we would of course need a variety where both of these lexemes exist(ed) as independent words.

      On the other hand, I think this would still leave in the air why Glazov then has a different representation from the other Udmurt varieties. At minimum this is clearly not a productive phonotactic constraint, in light of Russian loanwords. The existence of an areal trend for *ISTA- > STA- also seems like a strictly weaker parallel than the existence of an apparently quite regular syncope rule for *ɨ; and note that I did not propose *ɨšnɨr as a predecessor of šɨnɨr, only as a predecessor of the iNšɨr forms. (Maybe we could also consider direct compounding to *in-šɨŋɨr and irregular “haplological” loss of the *ŋ.)

  4. Alexander Savelyev says:

    >>Does ïspaj have itself an etymology though? I don’t recall seeing /sp/ in native Turkic vocabulary.

    Indeed, it does not look native. Axmetjanov claims that ïspaj ‘handsome & stylish, tidy’ and sïbaj ‘horseman’ are of the same origin, both going back to Persian sepâhi ‘cavalryman, knight’. If the same word indeed, the two variants should be rather old as both can be found far away from the Volga region. But the *Is-initial variant looks specifically Kipchak Turkic (cf. e.g. Kumyk isbajï ‘handsome; dandy’). The Udmurt meaning is close to Tatar, Kumyk ‘handsome, tidy’, not to Tatar ‘horseman’, anyway. Then, we also have Northwestern Mari ə̑spaj ’good, beautiful’, probably as a separate loan from Tatar ïspaj.

    >> For ‘footrag’, if this contains *ič- ‘inside’, a decent candidate for the 2nd member could be Permic *dära ‘linen’

    I don’t know, but one consideration is that I’m not aware of any clothes-related terms that would be borrowed from Finno-Ugric into Tatar, while there are numerous loans in the opposite direction. It is not a real proof of course, but the pan-Tatar distribution of this word makes me expect Turkic > Udmurt & Mari by default, rather than vice versa.

    UPD …Oh, I see another proposal by Axmetjanov who derives this word from Chuvash ï̆š ‘inside, interior’ + tir ‘leather’. This looks perfect to me. Then I think the direction was Chuvash > Tatar/Udmurt/Mari.

    >> At minimum this is clearly not a productive phonotactic constraint, in light of Russian loanwords.
    Isn’t it (a) *ISC > Glazov and some other varieties *SC > Glazov *SIC- in old pre-Turkic and Turkic loans, but (b) a different picture in recent Rus. loans – because Glazov Udmurt does not fully adapt them? I keep the Chuvash situation in mind, where ‘rug’ is štav or ə̑štav depending on I don’t know what; I’d not be surprised if such semi-adapted forms vary from speaker to speaker, or even within idiolects.

    • j. says:

      *ïš+tir sounds indeed fairly good. Not that *dära is native in Permic though, per Metsäranta’s recent PhD thesis it’s from Iranian √dar- ‘to hold; > wear’ (> Oss. daræs ‘clothes’). Depending on the semantic range, maybe this Iranian root would be a possible etymology even for PTk *teri ‘skin’?

      By the way, one further example of the same Glazov / rest correspondence in a Turkic loanword: iśka-vɨn ~ G śika-vɨn, J śka-vɨn ‘relative’ (← Chuv. per Wichmann, though he doesn’t give a source form); with the initial vowel reflected also in Glazov in warm-iśka ‘brother-in-law’.

      Upon re-checking for potential counterexamples, you could be right about positing for Glazov an early re-epenthesis to SɨT-. Maybe this would also help with the difficult case of ‘threshing ground’. It’s also interesting to hear that Chuvash and Tatar dialects have the same variation between IST- ~ ST-; maybe even some of the Udmurt dialect variation is due to this instead of being native-grown.

      The part that I still find suspicious is why we would have so many examples of original *ISTA- but seemingly none of *SITA-. Some of it can probably be attributed to the general rarity of medial voiceless stops in Permic though (solely from original geminates in native lexicon).

  5. Alexander Savelyev says:

    >> maybe this Iranian root would be a possible etymology even for PTk *teri ‘skin’?

    We have a near-high e (= ẹ) in PTk (> i in Chuvash and Yakut), which is hardly compatible with the Iranian *a. So, I don’t think the Iranian connection is provable.

    >> By the way, one further example of the same Glazov / rest correspondence in a Turkic loanword: iśka-vɨn ~ G śika-vɨn, J śka-vɨn ‘relative’ (← Chuv. per Wichmann, though he doesn’t give a source form); with the initial vowel reflected also in Glazov in warm-iśka ‘brother-in-law’.

    This must be, ultimately, the same word that we have recently discussed on Academia (Chuvash əśke-j ‘brother-in-law’). My understanding is that Permic *ekse(j) is an early loan from Ossetian through Volga Bulghar, and Udm. dial. iśka is a late re-borrowing of the root from a source close to the contemporary Chuvash (Chuvash dialects that were spoken along the Kama River in the 14-17th centuries).

    >> The part that I still find suspicious is why we would have so many examples of original *ISTA- but seemingly none of *SITA-. Some of it can probably be attributed to the general rarity of medial voiceless stops in Permic though (solely from original geminates in native lexicon).

    I would not say the *IST-initial roots are truly “original”. Some are borrowed from Turkic (as a major source, but probably also from other sources) where *ISTV- is a usual way to adapt non-native *TSV- and *STV-initial roots. It’s the case for iśka ‘brother-in-law’ and ïspaj ‘beautiful’ (assuming that the source was Middle Persian spāh- rather than New Persian sepâh). Then, we have compounds of the structure (H)IS + CV-: in addition to ïš+tir, note *iśkä ‘so, thus’ (Chuvash discourse particle *iś ? < Persian hič + Chuvash discourse particle -kä). This is also the case for ‘broom’ if the Komi word structure is original; and I have a strong suspicion that ‘cow’ is a borrowed compound, too.

  6. j. says:

    I mean by “original” just the shape preceding CC- in Udmurt of course. If a *SɨTA- had arisen from whatever source, it probably should have given at least in part STA-. The only counterexample-ish cases with wide dialect distribution might be the regular derivatives *sɨŋ-a(l)- ‘to comb’, sɨn-omɨ- ‘to rust’ (no **(I)sna(l)-, **(I)snomɨ-; indeed similarly also pɨr-a(l)-, **pral- ‘to enter’).

    Speaking of PP *eksej, we can perhaps drop the parentheses after all: Ponaryadov (2019) actually has a clever proposal to take *-ej > *-ɨ in Komi as a change parallel to *-eś > *-ɨś (which is clearly regular, though I don’t recall seeing it identified before either).

    >> and I have a strong suspicion that ‘cow’ is a borrowed compound, too

    Intriguing… Would that be with some sort of a reflex of IIr. *gau- as the 2nd member?

    • Alexander Savelyev says:

      >> If a *SɨTA- had arisen from whatever source, it probably should have given at least in part STA-.

      I agree, and I think the lack of *SITA-initial roots is coincidental, owing to the rarity of this sequence not only in the Permic lineage, but also in the major sources of its non-native vocabulary. For instance, I am working right now with an audio recording of a Viryal Chuvash fairytale, and in connection with our discussion I’ve noticed that the word šə̑dar- ‘to pierce, drill, make holes’ is realized as [štar-] in this particular variety. But it is a very rare example of *SITA- in Chuvash, and this word was not borrowed into Permic.

      >> Speaking of PP *eksej, we can perhaps drop the parentheses after all

      I see.

      >> Would that be with some sort of a reflex of IIr. *gau- as the 2nd member?

      I am thinking of a compound based on the Dargwa terms ‘ox’ + ‘cow’: unc+q̇äl (Akusha Dargwa), uc+q̇ʷäl (Itsari Dargwa), us+q̇́ül (Kubachi Dargwa), etc. While this connection may look rather exotic (and is, of course, a topic for further research), there seem to be some other lexical isoglosses that connect the Volga-Kama area to the Northern Caucasus. At least, when I reported some of my guesses to archaeologists with a focus on the history of Volga-Kamia in the early and mid-1st millennium AD, their feedback was very positive. Only thereafter, I found that an almost identical etymology of the Volgaic term for ‘cow’ was proposed as early as by Munkácsi in “Árja és kaukázusi elemek a finn–magyar nyelvekben” (1901: 619-620).

  7. Christopher Culver says:

    “A sixth possible member in this group could be ɨštɨr ~ M ištɨr ‘footrag’ from a *štɨr Mari borrowing and explaining the Udmurt word through the following Uralic etymology:

    Besides the noun ɨštɨr ‘footrag’, there is also an Udmurt verb i̮šti̮ri̮ni̮ ‘волочить, тащить’. Footrags are traditionally pulled and wound around the foot. If we assume for Udmurt the same polysemy found in Russian волочить = both ‘to drag’ and ‘to draw something out’, then could this Udmurt noun and this verb not be cognate with Proto-Mari šü̆ẟərem ‘to drag’, assuming a Proto-Uralic root *šüttər-/šittər-?

    • Christopher Culver says:

      Sorry, I see that the Malmyzh dialect of Mari points instead to PMari *sü̆ẟərem for the verb ‘to drag’, and this initial consonantism is not compatible with the Udmurt data. However, another Mari word ‘spindle; axis’ is to be reconstructed as *šü̆ẟər with initial š (which, incidentally, means that it doesn’t fit with Udmurt zu as proposed by the UEW and upheld in Bereczki’s etymological dictionary). I think that word would still match quite well semantically and phonetically with a reconstructed Udmurt ‘to wrap around’.

      Incidentally, Mari šüdə̑s ‘band or hoop placed around something’ shows some irregularities dialectally that prevent a straightforward reconstruction of PMari *šü̆də̑s: full ü in the dialect of Bol’šaja Šija instead of the expected reduced vowel, and unexpected illabial ə in Upša and Northwestern Mari. This makes me wonder if the word can be viewed as a borrowing of the same proposed Permian root ‘to wind around’ prior to metathesis. (However, due to the initial š- this would require entirely disconnecting the Mari word from the cognate set in the UEW 762–3.)

      • j. says:

        I believe you lost some material to overeager HTML tag parsing in the first message (as always the trick is to use the HTML entities &gt; and &lt; instead of the raw symbols > <), but sure, sounds intriguing.

        If šüδərem is not derived from a common stem with šüδəš, I suppose that leaves the semantic thread too thin to compare this either with Finnic *sito-, Mordvinic *sodəms ‘to tie’, even if it would be a decent phonological fit otherwise. On the Udmurt comparison though, I have some thoughts on *š- in Permic possibly being sometimes secondary which could help here; I’ll have to think about it. Do you know what the dialect distribution of i̮šti̮ri̮ni̮ might be (Wotjakischer Wortschatz doesn’t have it)?

        • Christopher Culver says:

          i̮šti̮ri̮ni̮ ‘волочить, тащить’ is in the green Soviet Udmurt dictionary, but only marked ‘диал.’ with no further information. i̮šti̮rti̮ni̮ (note that this form has another t) ‘перетащить’ is in Munkácsi’s Udmurt dictionary, page 43.

          • Alexander Savelyev says:

            Mari šüδъrem | šəδъrem < Chuvash sə°də°r- 'to drag';
            Udmurt ïštïrïnï < Chuvash sə°də°r- or its Tatar counterpart ə°stər- 'to drag'.
            So, the Uralic comparison is not valid, nor is the connection with the term for 'footrag'.
