Mari *a versus *o: some preliminary notes

A little thing that has been vexing me pretty much ever since I first for some fateful reason decided to take a look at Uralic vowel history is the distribution of *a versus *o in Mari.

These two Proto-Mari vowels are only distinguished in Hill (Western) Mari, while Meadow (Eastern) Mari merges both to /o/. Or — is it a merger after all, or perhaps instead a split? The puzzle is that in the wider etymological context, no real conditioning appears. Both vowels seem to go back to similar Proto-Uralic vowel combinations (mainly: *a_ə, *a_a, *ë_a, and *o_ə). E.g. *kala “fish” → *kol, but *pala “bit” → *pal (EDIT: misquoted, see below; cf. instead *jalka “foot” → /jal/); or *poŋə “bosom” → *poŋəš, but *joŋsə “bow” → *jaŋəš. The rest is just about as ugly.

I finally seem to have hit on some reliable results, though: some partial soundlaws, conditioned by the Proto-Mari initial consonant.

  1. After *w, only *a appears. This is completely regular, based on about 20 examples. An impressive number for a specific two-phoneme combination, at least in a language like Mari that does not exactly abound in archaic inherited vocabulary.
  2. In the absence of an initial consonant, *o appears. This is mostly regular (both in good PU words such as *apta- → *opte- “to bark”, and areal ones such as *ožə “stallion”, which only has relatives in Permic). I’ve collected three exceptions, though: *anće- “to blink”, *aškəl “step”, *ažnə “early”. While this is not too much to generalize upon, all three seem to contain a following palatal consonant (these are frequently depalatalized in Mari, but cf. Komi: /addźɨ-/, /voćkol/, /vodź/). Perhaps this, then, is further evidence for an allophonic distinction between *[a] and *[å] (IPA [ɑ] vs. [ɒ], if you will) in Proto-Uralic.
  3. After *p, *a appears. This is nicely parallel to the case of *w, and again mostly regular (*pontə → *pandə “stick”, *par(ə)ma → *parma “gadfly” etc.) but with three exceptions. Here the conditioning seems in a way the inverse of the previous case: all cases appear before velar consonants, namely *pokte- “to hunt”, *poŋgə “mushroom”, and the above-mentioned *poŋəš “bosom”. I have no idea if this should be interpreted as a backing effect, or perhaps as a fronting effect of dental/(post)alveolar/palatal medials. (No labial medials appear here, since we already have one initially; a common phonotactic restriction in the Uralic languages, if not wider across the world.)
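To make the three partial soundlaws above easier to check against a word list, here is a minimal sketch in Python. The function name, the argument layout (initial consonant, medial cluster, a flag for an originally palatal medial) and the velar inventory are all my own assumptions for illustration; the palatal and velar sub-conditions are the tentative ones suggested above, not established rules.

```python
from typing import Optional

# Assumed inventory of velar medials, for illustration only.
VELARS = {"k", "g", "ŋ", "ɣ"}


def predict_first_vowel(initial: str, medial: str,
                        medial_was_palatal: bool = False) -> Optional[str]:
    """Return "a", "o", or None when none of the three rules applies.

    initial            -- Proto-Mari initial consonant, "" if there is none
    medial             -- consonant(s) following the first-syllable vowel
    medial_was_palatal -- hypothetical flag: medial goes back to a palatal
    """
    if initial == "w":
        # Rule 1: after *w only *a appears.
        return "a"
    if initial == "":
        # Rule 2: *o without an initial consonant, except (tentatively)
        # before an originally palatal medial.
        return "a" if medial_was_palatal else "o"
    if initial == "p":
        # Rule 3: *a after *p, except (tentatively) before a velar.
        return "o" if medial[:1] in VELARS else "a"
    # No pattern found yet for other initials.
    return None
```

For instance, `predict_first_vowel("p", "nd")` covers *pandə “stick”, while `predict_first_vowel("p", "ŋg")` covers *poŋgə “mushroom”. Any other initial simply returns `None`, which is the honest state of affairs for now.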

It’s a start, though I can already see this won’t be the entire solution… I’ve also checked the remaining sonorants (*m, *n, *l, *r, *j), and lamentably, no clear pattern emerges yet for any of these. *m failing to abide by the pattern set by the other two labials is a particular bummer.

If the previous observations are anything to go by, the medial consonants probably need to be taken into account too. E.g. a zero medial appears to condition *o (examples include *koe- “to dig”, *moa- “to find”, *roe- “to hack”, *šoe “duck”), or to be exact, a unique correspondence Hill /o/ ~ Meadow /u/, which however seems best explained by a raising *o → *u in the latter.

Also, since this all seems to be basically a Mari-internal phenomenon after all (and I’m definitely taking the results so far as a licence to treat *a and *o as interchangeable for Uralic reconstruction), getting the full story will probably require looking into Proto-Mari words for which no Uralic cognates are known. Late loanwords will then probably interfere though… so finishing this job will basically have to wait for having a Mari etymological dictionary to consult. I’ve no interest in accidentally basing conclusions on what would turn out to be recent Russian loanwords for all I know…

  1. The distribution of Mari *å vs. *o is indeed a difficult nut to crack. I’ve come up with a somewhat different formulation. It seems that *o appears under two conditions:

    1) In CV(V)-stems (i.e., finally or before a hiatus)
    2) Before a velar, *m or *l if the word does not begin with a glide (*w or *j)

    These conditions also account for a large majority of Mari words that lack a Uralic etymology, but there are a few cases that show that the distinction *å : *o must have been phonemic in Proto-Mari already and cannot be a result of a later split in W and Nw Mari.
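The two conditions above can be written out as a small predicate, which makes it easy to flag words that do not conform. This is only a rough sketch in Python: the function name, the segment inventories and the way a stem is split into "initial" and "what follows the first vowel" are assumptions made for illustration, not part of the formulation itself.

```python
# Assumed segment inventories, for illustration only.
VELARS = {"k", "g", "ŋ", "ɣ"}
O_TRIGGERS = VELARS | {"m", "l"}
GLIDES = {"w", "j"}
VOWELS = set("aåoueiəǝüöä")


def expects_o(initial: str, after_vowel: str) -> bool:
    """True if *o is expected in the first syllable, False if *å.

    initial     -- word-initial consonant, "" if there is none
    after_vowel -- everything following the first-syllable vowel
    """
    # Condition 1: CV(V)-stems, i.e. word-finally or before a hiatus.
    if after_vowel == "" or after_vowel[0] in VOWELS:
        return True
    # Condition 2: before a velar, *m or *l, unless the word
    # begins with a glide (*w or *j).
    return after_vowel[0] in O_TRIGGERS and initial not in GLIDES
```

A word is then an exception whenever its attested vowel disagrees with the prediction: `expects_o("š", "r")` is false, so the o in W šor ‘dirt’ counts as unexpected, and `expects_o("č", "ŋgem")` is true, so the å in W čaŋgem does as well.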

    Some exceptions that do not fall under these conditions can be explained:
    – W amaš ‘shelter’ – this had *d which became deleted (cf. Fi uudin etc.)
    – W šaktem ‘I play (an instrument)’ (this had *j which became deleted: PU *śoji-kta- > Fi soitta-)
    – W waktam ‘I debark’ – this shows an irregular metathesis *tk > *kt (cf. Inari Saami vyetkiđ ‘debark’, MdE/M vatkams ‘peel, skin, beat’)

    Notably, both *-ow- and *-uw- seem to have given PMari *o after *k-:
    – W kon ‘lye’ < *kuwni (SaaN gutna ‘ashes’)
    – W kolam ‘I hear’ < *kuwli- (Fi kuule-, SaaN gulla-)
    – W kož ‘spruce’ < *kowsi (Fi kuuse, SaaN guossa)

    A few unclear cases remain, listed below.

    Unexpected *å:
    – W čaŋgem ‘I notch (building logs), set up (corner posts of a log house)’ (< *čaŋa-, cf. MdE čavo- 'hit', KhE čɔɣ- 'kick', NenT taŋa- 'rub')
    – W šalɣem ‘I stand’ (< *sa/olka-, cf. Hung áll)

    Unexpected *o:
    – W šor ‘dirt’ (but a in šaram ‘I shit’!) < *śara- (Hung szarik)
    – W kot ‘year, time’ < *kodwa (Fi kotva, SaaI kuáđfi)
    – W opta ‘barks’ < *a/opta- (Komi ut-, KhN ɔpǝt-)
    – W šož ‘barley’ < *čaši (Komi ćužj-)
    – W toreš ‘crosswise’ < *toras (SaaN doares-)
    – Nw toštam ‘I dare’ < *tošti- (SaaN duosta-, Fi tohti-)
    – W koδem ‘I leave’, koδam ‘I remain’ < *kaďa-, *kaďa-w- (SaaN guođđi-, guđđ-o-)

    • Juho says:

      2) Before a velar, *m or *l if the word does not begin with a glide (*w or *j)

      /oš(ə)/ “white” you’d then presume to derive from *akša as indicated by Mokša /akša/, not *aška as indicated by Finnic *hahka “eider”? I don’t think we have precedents for a metathesis *kš → F *hk though, while there does seem to be a clear case of *šk → Mo. *kš in *pukšə “thigh” (← *počka ← *pončə-ka). Also, note /a/ in /našmə/ ← *ńëkćəma “gills”, which would require assuming an earlier loss (due to the loss of the middle syllable?)

      It would be tempting to generalize further the o-before-velar rule I noted for *p-initial words, but it does not seem to be entirely universal.

      Some exceptions that do not fall under these conditions can be explained:
      – W amaš ‘shelter’ – this had *d which became deleted (cf. Fi uudin etc.)
      – W šaktem ‘I play (an instrument)’ (this had *j which became deleted: PU *śoji-kta- > Fi soitta-)

      At first glance, if these are to be projected to the pre-Proto-Mari stage, it would seem that the hiatus reflex should then be predicted here, as in *śoð(ka) → /šoe/ “duck”. (The first comparison incidentally also torpedoes my suggestion that word-initially → /o/.)

      Notably, both *-ow- and *-uw- seem to have given PMari *o after *k-:
      – W kon ‘lye’ < *kuwni (SaaN gutna ‘ashes’)
      – W kolam ‘I hear’ < *kuwli- (Fi kuule-, SaaN gulla-)
      – W kož ‘spruce’ < *kowsi (Fi kuuse, SaaN guossa)

      The IE loan etymology for Samic *kunë does not seem to be compatible with a reconstruction with *-uw- though. Also Komi /kun/ suggests a non-close vowel; if this and Mari are split as an etymon of their own, something like *kanɜ seems possible. Even something like *kaxənɜ could be posited — though most probably this is instead a separate loan from a different IE source.

      The 2nd can just as well be included under the conditioning for /l/ then, which leaves the 3rd isolated.

      Additional exceptions to your rules include at least
      /a/: /čakata/ “thick” (~ Komi /čɤk/ = PU *čokkə?), /laksə/ “pit” (~ F. laakso etc? though probably a late word on account of the /s/), /nalə/ “sap” (~ Udm. /ńɤl/ “phloem”, if not rather among the cognates of F. nila), /šakta-/ “to shift” (~ Erzya /śuvtńe-/; though might be, like “to play”, from something like *śakə-kta-)
      /o/: /owə/ “father-in-law” (← PU *ëppə, or are you counting this as a velar?), /ožə/ “stallion” (~ Komi, Udm. /už/), /joɣe-/ “to flow” (~ F. joki etc.) despite /j-/, /jož/ “remaining snow” (~ Komi, Udm. /juž/ “hard snow”), /šoðə/ “lung” (~ Komi /šɤĺ/ “gills”, ← PU *šoðʲə?)

      • Thanks for the comments! I’ll give some comments on the individual etymologies you mentioned below. Not all the exceptions seem convincing to me.

        Mari *oš(ə) can hardly be cognate with Md *akšə; there has been no change *kš > *š in Mari. These would rather seem to be parallel borrowings from some third source.

        Md *pukšǝ ‘thigh, buttock, thick meat’, at any rate, cannot be cognate with Fi potka due to its irregular cluster. I would rather suggest that it is a loan from Proto-Aryan *pakša- > Sanskrit pakṣá- ‘wing, flank, side’.

        In Mari *nåšmǝ ‘palate, gill’ the loss of *k must be old; at least a three-consonant cluster *-kšm- would clearly have been impossible in proto-Mari. It is also worth noting that Mari consonant-stem derivatives exhibit a general morphophonological rule k > Ø /_CC (as in *jüštǝ ‘cold’ ← *jükše- ‘get cold’, etc.).

        The words *å(ǝ)makš and *šå(ǝ)kte- might perhaps still have had the hiatus in Proto-Mari. There are cases of Proto-Mari *CVǝC- where the hiatus is preserved in only some marginal eastern subdialects (elsewhere *CVǝC- > *CVC-).

        Saami *kunë ‘ashes’ is poorly compatible with Indo-European *kenis- / *konis- due to its vocalism, so this might simply be a wrong etymology. It is true, Komi has an irregular vowel (kun instead of *kïn), but there are also other cases of failed *u > *ï in Permic. On the other hand, if Saami *kunë ‘ashes’ is separated from the Permic and Mari words for ‘lye’, then it is tempting to connect them with Fi kuona instead.

        The Uralic etymology of Mari *čåkata is probably wrong – none of the compared forms show matching vowels.

        MariW laksə must be a new word due to its -s-, as you say. The comparison to Fi la(a)kso is hardly worth much. UEW even throws Saami *leakšā ‘boggy valley on tundra’ in the same bag, which defies all rules of Saami historical phonology.

        Mari *nålǝ ‘sap’ does not seem to work well with this picture, but its cognates in other branches show much irregularity, too.

        Mari *šåkta- ‘sift’ might also be a new word. At least I’d not bet much for the suggested Mari-Mordvin comparison because the Mordvin word is internally irregular.

        Mari *owǝ ‘father-in-law’ is obscure. Perhaps *o is regular before *w; compare *kompa ‘wave’ > PMari *kowǝ > E wüt-kowo (W ko ~ koe shows irregular loss of *w). The problem remains that the assumed development PU *ïppi > PMari *owǝ is in any case highly irregular – one would rather expect PMari *üp(ǝ) or the like.

        [Some garbled paragraphs fixed around here. —Juho]

        Mari *ožǝ ‘stallion’ must be a Permic loan, if the Permic form derives from Iranian *atsva- as suggested by Koivulehto; Mari ž cannot directly reflect an earlier affricate. The vowel correspondence is the same as in several other Permic loans (e.g. Mari *odǝ ‘Udmurt’). As a side note, why Mari *o frequently appears as a loan reflex of Permic *ŭ (Sammallahti) / *u (Lytkin) is unclear to me – perhaps these words stem from some extinct Permic variety, or perhaps there is something seriously wrong with all the proposed reconstructions of Proto-Permic vocalism. My bet is on the latter.

        Mari joɣe- ‘flow’ is certainly a loan from Chuvash jox- ‘flow’, and has no relation to the Uralic word for ‘river’ (*joki / *juki / *juka). The development PU *k > ɣ would be irregular; normally intervocalic *k > Ø.

        At least according to my data MariW jož means ‘cool air, cool wind’. This is in irregular correspondence to MariE juž (with the same meaning), so the Proto-Mari form remains unclear (*jož / *juž). The word seems to be related to SaaL joasso ‘chilly weather’, Udmurt juz ‘cool, fresh’ and SlkTy t’āt ‘early winter (when it is cold but there is no snow yet)’ (< PSam *jat), either as reflexes of PU *jasi or as borrowings from Permic.

        Mari *šodǝ ‘lung’ might be cognate with Saami *suovdē ‘gill’ (? Mari *o here. As regards Saami, the etymology is at least semantically better than the suggested Germanic loan etymology (involving Fi hauta ‘pit, grave’).

      • crculver says:

        For čakata, Mari -ata seems to be a suffix indicating an unknown substrate. I have been collecting examples of these words along with adjectives in -ə̑ra and -aka, as there are indications all three come from the same source, and most of these words are very difficult to provide etymologies (Uralic or otherwise) for. Better to leave this out of comparanda.

        Agyagasi argues in the Mari etymological dictionary (pp. 105–106) that laksa is of Turkic origin: the initial vowel of PT *olaq/ulaq would reduce in Chuvash and be lost at some point when Chuvash came to allow initial l-, just like what happened with the ancestor of Cv. lar ‘to sit’. Mari -sa reflects a dialectal form of the Chuvash diminutive ending. I’m not sure what I think of this etymology yet, but again probably better to avoid basing any Uralic reconstruction on it for the time being.

  2. Also, it is not clear to me what Mari *pal ‘bit’ is – does such a word exist at all? UEW, in turn, cites Mari puldǝš ‘piece (of bread or meat)’ in this connection, but this is a wrong comparison – the word is an irregular dialectal development of purǝldǝš, derived from purǝ- ‘bite’.

    • Juho says:

      Ah, yes, this seems to be a mistake — apparently rather the Mordvinic word which has managed to jump one column to the right during my data sampling for this topic. Corrected, thanks.

  3. David Marjanović says:

    Mari *ožǝ ‘stallion’ must be a Permic loan, if the Permic form derives from Iranian *atsva- as suggested by Koivulehto; Mari ž cannot directly reflect an earlier affricate.

    But perhaps it doesn’t derive directly from Proto-Iranian *atswah. Perhaps a later stage like Avestan aspō or Old Persian asa was the donor. On the other hand, the *w might explain the ž: [sw] turning into [ʃ] wouldn’t be very surprising.

    • Juho says:

      Good catch. I was going to remark on this too once I finish looking thru the Mari data.

      Assuming *sw → š is not necessary though, since *s → š-, -ž- is regular in Mari. Unless we want to derive the Permic from the same preform too, at least. But since this word is absent elsewhere in Uralic, and we don’t really have evidence for a Mari-Permic grouping, it’s entirely possible the Permic word was rather originally adopted in parallel, as *ač(w)ɜ, versus Mari *as(w)ɜ.

      Re: your comment below, I have seen two arguments presented for reconstructing the reflex of PIE *ḱ / PII *ć as Proto-Iranian *ts rather than *s. The first is that this remains as an affricate in Nuristanian… But, Nuristanian is not obviously either Iranian or Indo-Aryan (“generally regarded as an independent group”, says Wikipedia), so this is iffy. The second is that PIE/PII *s → Iranian *h is apparently supposed to have been a later development, not yet Proto-Iranian, on account of loanwords in Elamite. There might be other reasons yet though, because I can well think of other options here: e.g. *ć vs. *s → [s̻] vs. [s̺] (laminal vs. apical), or [s] vs. [sʰ].

      At any rate, there is a good number of Uralic loans that reflect PII *ć as *č (e.g. Samic-Mari-Permic *počaw “reindeer” ← PII *paću ← PIE *peḱu “cattle”; Finnic poro may also belong here), or even as *ks (Finnic-Mordvinic *maksa- “to pay” ← PII *mandźʰa- ← PIE *menǵʰ-), so assuming older *-č- in the Permic word is not a problem. It is not always entirely clear if these came from Proto-Iranian, Pre-Proto-Iranian, or even some sort of Para-Iranian. E. Helimski has suggested that the language of the Andronovo culture, the likely donor of these II loans, was its own branch of Indo-Iranian.

      I suspect the last option. There are some of these “pre-Proto-Iranian” loans that are supposed to have been both later than distinctive Iranian developments such as the depalatalization of *ć, and earlier than the supposedly Proto-Indo-Iranian lowering of *e to *a! AFAIK this radical frontdating of *e → *a is an ad hoc postulation for explaining these loanwords, and for no other purpose. They might rather reflect a source language where PII *a had developed an *e-like pronunciation. As a piece of supporting evidence, I believe I’ve identified one II loanword where it is PIE *o, not *e, being reflected as Uralic *e. Full argument to be presented later. 🙂

      • David Marjanović says:

        That Para-Iranian hypothesis is very intriguing! But the *ks reflex is really hard to believe.

        • Juho says:

          It’s supposed to be a phonotactic substitution. Early Uralic dialects had neither an alveolar affricate *c nor a cluster *ts. Faced with a sound of that sort in a loanword, they’d have had two options: loan it as their postalveolar affricate *č, or as a cluster of stop + their alveolar *s.

          At least one of the etymologies under that strikes me as suspicious, though. Namely, identifying the 2nd component in #kaktəksa “8”, #üktəksä “9” as Iranian *daca, because the *-t- seems to come already from *üktə “1”, *kakta “2”. An alternate idea that has been proposed is an analysis as a compound with *e-ks-, the reflexive of the negative verb (which yields Finnic *eks-ü- “to get lost”), i.e. the words would mean “2 missing”, “1 missing” rather than “2 til 10”, “1 til 10”.

  4. David Marjanović says:

    (I don’t even know why *ts is reconstructed, but I have no reason not to trust Ringe on this…)

  5. David Marjanović says:

    I think it later dawned on me that, as you mention, the attested s must have arisen after the Proto-Iranian *s > *h shift. I didn’t know about the loanwords in Elamite… perhaps they’re actually from the Indic side of things (remember Mitanni), or they’re actually from some Pre-PI stage. There are Pre-Proto-Slavic place names in Austria because the last series of Proto-Slavic vowel shifts happened so very late.

    Also, I had no idea about the persisting affricates in Nuristani! All else being equal (underresearched as that family is), that’s evidence for Nuristani being the sister-group of Indo-Iranian!

    There might be other reasons yet though, because I can well think of other options here: e.g. *ć vs. *s → [s̻] vs. [s̺] (laminal vs. apical),

    Sure, but why would one of those be more likely to turn into [h] than the other?

    (That said, it’s intriguing that the modern Greek /s/ is apical, and that such a Basque-and-Old-Spanish-style distinction has been suggested for Old High German for reasons that are beyond me.)

    or [s] vs. [sʰ].

    Is [sʰ] attested from any language at all?

  6. David Marjanović says:

    …And the thing I forgot to mention yesterday was the thing that dawned on me next: if s is the only attested reflex, then the shift from *ć all the way to [s] must have been completed by the time of Proto-Iranian, and consequently the shift from *s to [h] must have been completed in some Pre-Proto-Iranian stage.

    Oh well. That’s what I get from commenting late at night.

    • Juho says:

      That’s impeccable as a syllogism, but the actual situation is a bit more complicated — the Old Persian reflexes of *ć and *ȷ́⁽ʰ⁾ are /θ/ and /d/ by default. I dunno what’s up with the /s/ in asa “horse” exactly, probably some kind of a conditional development (perhaps in original coda position?)

      This was my reason for suggesting a laminal [s̻] as a possibility for Proto-Iranian, as this is a more likely sound to be fronted to [θ] than its apical counterpart. I’ll retract that idea however: I was thinking of the apical then ending up backed to [s̱], and eventually all the way to [h] (similar to *š in Finnic, or x in Spanish), but Proto-Iranian also has *š (from RUKI) and *x (from lenition of *k) that get in the way here.

  7. David Marjanović says:

    the Old Persian reflexes of *ć and *ȷ́⁽ʰ⁾ are /θ/ and /d/ by default

    Oh, I didn’t even know!

  8. […] šolem doesn’t show /a/ as one might expect in a word from *salama. However, it does agree with a formulation by Ante Aikio that Proto-Mari *o appears before *l if the word does not begin with a […]

