Probing the roots of Samoyedic

Posted on Sat 2019-02-02 by sansdomino — 25 Comments

Last year I participated in a fruitful Academia.edu session on loanwords from Turkic into Samoyedic. I am now honored to see that the final article incorporates, and credits in detail, several of my suggestions: P. S. Piispanen 2018, Turkic lexical borrowings in Samoyed, Acta Linguistica Petropolitana 14(3). ^[1]

I would like to add some detail here on one of the four suggestions of mine that made it into the paper. Footnote 4 of the paper mentions my proposed date for Proto-Samoyedic of as far back as 3000 years ago (= 1000 BCE). This is not based solely on my work-in-progress database of Proto-Samoyedic, though: it is also informed by morphology and phonology. ^[2]

Lexically, Samoyedic does seem to be the most internally divergent branch of Uralic. We often find native Uralic roots continued in just one or two Samoyedic languages, ^[3] in contrast to the situation elsewhere in Uralic, where a native Uralic etymology also predicts a good dialect distribution. This fact alone could probably be explained as a kind of serial-substrate effect, though: suppose substrate 1 in Proto-Samoyedic replaces some of the Uralic core vocabulary, substrates 2a and 2b replace some more in Proto-Selkup and in common Northern Samoyedic, and various third-generation substrates replace yet more in Nenetsia, Taimyr, etc.

But it is also the case that Samoyedic is clearly divided into at least six branches, with clear boundaries of intelligibility between them. This is quite different from all of the other eight Uralic “main” branches, which all show a dialect-continuum structure. Perhaps the only other really clear within-branch language boundaries are Livonian vs. the rest of Finnic, and Udmurt vs. Komi (although later dialect shuffling has also created other opaque language boundaries, such as Northern vs. Skolt Sami, or Standard Finnish vs. Standard Estonian). In Samoyedic, although there is a base layer of some old crisscrossing isoglosses, and probably some late areally shared phenomena, each of the six Samoyedic groups also has a large share of unique distinguishing innovations. E.g. in consonant phonology:

Nenets(ic): *nt > n
Enets: *-C > -ʔ (general)
Nganasan: *ŋt > jt
Selkup: *j *w > *ć *k
Kamassian: *NP > *NN (general)
Mator: *kʲ × *sʲ > k

Nor is this selection at all unique. Similar lists could also be built for vowel phonology alone, or for inflectional morphology, derivational phonology, core vocabulary, loan vocabulary, notable semantic shifts: pretty much any one component of language. This is the key point that I see as putting Samoyedic one “grade” ahead of the other subgroups of Uralic in its historical development. Within a group like Finnic or Khanty, no obvious taxonomy of this sort is possible. We can chart a number of prominent local innovations (Western Finnish *ð > r, Southern Khanty *ɬ⁽ʲ⁾ > t⁽ʲ⁾…), but these usually do not even cover the whole dialect area, let alone divide it. There are always transitional dialects that either lack or overrepresent putative branch-defining innovations. More damning yet, in dialect-continuum cases there is not much coherence between the “phonological branching”, the “morphological branching”, etc. E.g. the plural genitive isogloss across Finnic (west *-den ~ east *-i-den) ends up being just one isogloss among a dozen or so that have been proposed as grounds for a primary division of the group.
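One way to make the contrast between clean branching and a dialect continuum concrete is the pairwise compatibility test familiar from cladistics: two isoglosses are tree-compatible if the area of one contains the area of the other, or if the two areas do not overlap at all. A set of isoglosses yields a clean taxonomy only if every pair is compatible; crossing isoglosses, typical of a continuum, show up as conflicting pairs. This is an illustrative sketch, not a method used in the post. The Samoyedic entries simply encode the six group-specific innovations listed above, while the continuum dialects `d1` to `d4` and isoglosses `A` to `C` are hypothetical.

```python
from itertools import combinations

# An isogloss is modelled as the set of varieties that show the innovation.
Isoglosses = dict[str, frozenset[str]]


def compatible(a: frozenset[str], b: frozenset[str]) -> bool:
    """Tree-compatible: one area contains the other, or the areas are disjoint."""
    return a <= b or b <= a or not (a & b)


def conflicting_pairs(isoglosses: Isoglosses) -> list[tuple[str, str]]:
    """Return every pair of isoglosses that cross (neither nested nor disjoint)."""
    return [
        (name_a, name_b)
        for (name_a, area_a), (name_b, area_b) in combinations(isoglosses.items(), 2)
        if not compatible(area_a, area_b)
    ]


# Each innovation from the Samoyedic list is confined to a single group.
samoyedic: Isoglosses = {
    "*nt > n": frozenset({"Nenets"}),
    "*-C > -ʔ": frozenset({"Enets"}),
    "*ŋt > jt": frozenset({"Nganasan"}),
    "*j *w > *ć *k": frozenset({"Selkup"}),
    "*NP > *NN": frozenset({"Kamassian"}),
    "*kʲ × *sʲ > k": frozenset({"Mator"}),
}

# Hypothetical continuum: each isogloss overlaps its neighbour without nesting.
continuum: Isoglosses = {
    "A": frozenset({"d1", "d2"}),
    "B": frozenset({"d2", "d3"}),
    "C": frozenset({"d3", "d4"}),
}

print(conflicting_pairs(samoyedic))  # []
print(conflicting_pairs(continuum))  # [('A', 'B'), ('B', 'C')]
```

The test only flags crossings; it says nothing about which isogloss is older, and a real analysis would need many more isoglosses plus some decision on how to treat transitional varieties.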

Quite feasibly, transitional Samoyedic varieties did once exist, but eventually died out, due to other groups such as Russians, Yakuts and Evenkis encroaching on the rather extensive Samoyed area, or due to “secondary expansions” within the family itself. (Yurats serves as a proof of concept: it was assimilated into Tundra Nenets rather than Russian.) This does not make for a counterargument, though, since by now we see the same process playing out within the “younger” groups as well: all of Mansi except Northern Mansi is gone, Southern Khanty is gone, Kemi and Akkala Sami are gone; Ume, Pite and Sea Sami are moribund, Votic and Ingrian are moribund; and many traditional Finnish and Estonian dialects are rapidly assimilating into the standard languages. Assuming that the Samoyedic expansion ran out of steam (i.e. turned into a recessive, low-status language family) much faster than the others also seems unwarranted, especially given that in the end it reached a much wider area than the other Uralic subgroups.

We do not have any historical-philological evidence for dating the early stages of Samoyedic. For Samic and Finnic, however, we can date them by leveraging the well-known history of Germanic (and even Latin) through loanword evidence. In both cases, the results show that the first isoglosses within Samic and Finnic start appearing already in the second half of the first millennium BCE, and that clear dialect areas had been established by 0 CE, though many common innovations continued to diffuse across the dialect area until as late as the first major round of Slavic influence circa 1000 CE. ^[4] We know dialect continua can fracture into multiple clearly distinct languages quite rapidly (most of Finnic was still a single dialect continuum circa 1900, and is looking headed for just a handful of surviving discrete daughter languages by 2100) — but we also know Samoyedic was “discrete” already as early as about 1800. As a conservative estimate, I’d therefore then add about 400 years more age for Samoyedic. This adds up to a minimum age of about 2600 years BP for Samoyedic, which I then round up to the accuracy of one decimal, due to the numerous uncertainties involved.
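The arithmetic behind the 3000 BP figure can be spelled out in a few lines. This is a hedged sketch under stated assumptions: the 2200 BP baseline for the first Samic and Finnic isoglosses is back-calculated from the figures above (2600 minus 400), which is consistent with “the second half of the first millennium BCE”; the rounding is to one significant figure, which is what turns 2600 into 3000; and the calendar conversion assumes a present of roughly 2000 CE, matching the equation of 3000 years with 1000 BCE, rather than the 1950 CE radiocarbon convention for BP. The constant and function names are illustrative only.

```python
from math import floor, log10

# Assumed baseline: first Samic/Finnic isoglosses, back-calculated as 2600 - 400.
FIRST_ISOGLOSSES_BP = 2200
# Conservative extra age for Samoyedic, which was already "discrete" by about 1800.
EXTRA_SAMOYEDIC_YEARS = 400
# Assumption: "present" taken as ~2000 CE, so that 3000 years ago = 1000 BCE.
PRESENT_CE = 2000


def round_to_sig(x: int, sig: int = 1) -> int:
    """Round a positive integer to `sig` significant figures."""
    return round(x, sig - 1 - floor(log10(x)))


def to_calendar(bp: int) -> str:
    """Convert years BP to a rough calendar label (ignores the missing year zero)."""
    year = PRESENT_CE - bp
    return f"{-year} BCE" if year < 0 else f"{year} CE"


minimum_bp = FIRST_ISOGLOSSES_BP + EXTRA_SAMOYEDIC_YEARS  # 2600
rounded_bp = round_to_sig(minimum_bp)                     # 3000
print(minimum_bp, rounded_bp, to_calendar(rounded_bp))    # 2600 3000 1000 BCE
```

Under the same rough convention, the bolder 4500 BP figure discussed in the post would come out as `to_calendar(4500)`, i.e. "2500 BCE".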

This is all still a lower age limit. The only real upper limit seems to be that Samoyedic was still a single dialect continuum at the time of contact with Proto-Turkic, usually dated to somewhere around 0 CE… but “standing” dialect continua can easily reach ages of a millennium or two! So 3000 BP really isn’t even a maximally bold suggestion. A proposal like 4500 BP, however, would start to have further implications: I would obviously also have to backdate Proto-Uralic closer to the traditional 6000 BP than to the recently proposed “shallow” chronologies, in which Uralic begins branching off only at about 4000 BP.

Proto-Samoyedic also seemingly shows substantial general divergence from Proto-Uralic, but this does not mean that a “long” chronology would demand an outright Mesolithic dating for PU. Again, as seems to be the case in Samic and Finnic too, various pan-Samoyedic innovations could also be re-dated into their common dialect-continuum phase. Helimski’s revisions to the vowel system (retaining *a, *ä, *e in PSmy rather than Janhunen’s *ä, *e, *i) already point in this kind of direction, as does the phenomenon of native Uralic roots often being restricted to a single Samoyedic language (which suggests that many roots may have been lost in parallel in all the other languages). Two likely additional candidates, I think, are the sound change *ľ > *j, found even in isolated loans from Tungusic; and “coaffix insertion” in the local cases, which has long been known to have proceeded differently in Nganasan than in the rest of Samoyedic (and, as I’ve recently learned from Valentin Gusev, Nenets and Enets have some quirks in this too in the possessed paradigm).

I will readily admit that none of the above discussion takes any direct archeological evidence into consideration. Again (cf. footnote 2), this is intentional. Archeology cannot date languages, or even identify them: it can only create a sociohistorical backdrop onto which we can attempt to pin language expansions. At its core, all that really happens here is that we draw one directed graph indicating known relationships of archeocultural descent and influence, and another directed graph indicating known linguistic relationships, and attempt to fit the latter as a minor of the former. If culture A begets B, which begets C, then a priori it would not be parsimonious to assume that A, B and C all spoke entirely different languages; but it may also prove necessary to fit other pieces of the big picture in. If the proposed language of culture B shows clear contact influence from language L, we would also like to assign L to a culture that was actually in contact with B. Everything else, e.g. cultural reconstructions of the Proto-Samoyeds as copper traders or reindeer nomads or hunter-gatherers or what have you, comes downstream of linguistics/archeology pairings based on the “topology of chronology”.
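The step of fitting the linguistic graph into the archeological one can be illustrated with a deliberately simplified check. A full graph-minor test (disjoint, connected sets of cultures per language, plus edge contractions) is considerably harder; this hedged sketch only verifies that a proposed language-to-culture assignment respects descent, i.e. that for every parent-daughter language pair, the daughter's culture is reachable from the parent's culture along directed descent edges. A daughter assigned to the same culture as its parent is allowed, since linguistic splits can happen without any corresponding cultural split. Cultures A, B and C follow the example above; the language labels `X`, `X1`, `X2` and the function names are hypothetical, and contact relations (language L influencing the language of B) are not modelled.

```python
from collections import deque

Graph = dict[str, list[str]]


def reachable(graph: Graph, start: str, goal: str) -> bool:
    """Breadth-first search along directed edges; a node is reachable from itself."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def misfits(cultures: Graph, languages: Graph, assignment: dict[str, str]) -> list[tuple[str, str]]:
    """Parent->daughter language edges whose assigned cultures are not linked by descent."""
    return [
        (parent, daughter)
        for parent, daughters in languages.items()
        for daughter in daughters
        if not reachable(cultures, assignment[parent], assignment[daughter])
    ]


# Culture A begets B, which begets C.
cultures: Graph = {"A": ["B"], "B": ["C"]}
# Hypothetical proto-language X with daughters X1 and X2.
languages: Graph = {"X": ["X1", "X2"]}

good = {"X": "A", "X1": "B", "X2": "C"}
bad = {"X": "C", "X1": "A", "X2": "B"}
print(misfits(cultures, languages, good))  # []
print(misfits(cultures, languages, bad))   # [('X', 'X1'), ('X', 'X2')]
```

An empty result only means the assignment is not ruled out by descent alone; deciding between several admissible assignments still requires the contact evidence and other considerations discussed here.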

The paradigm shift of recent decades on the origins of Finnic and Samic is again instructive, I think. The same language expansions had been variously pinned on multiple known material-cultural expansions, with details filled in with assumptions where necessary. What changed was not the archeological evidence: the new picture emerged due to new linguistic evidence, with results such as the early divergence of South Estonian and Livonian, the existence of a para-Sami substrate across most of Finland and far east into Russia, and the unviability of a common Finno-Samic node (itself done in, perhaps primarily, by loanword research showing many “Finno-Samic lexical innovations” to be loans back and forth, or parallel loans from Indo-European). These changed the topology of the Uralic family tree enough that it could no longer be fit into the “archeological family tree” in the same location.

And for Samoyedic, we don’t have a clear enough picture of this area of the family tree yet. There’s no consensus model for the branching of Samoyedic, nor for its splitting from Uralic. Those who side with an East Uralic group will be able to find a roughly suitable archeological assignment for it; so will those who side with a Finno-Ugric group; etc.

The fact that language does not have to coincide exactly with culture also creates a lot of wiggle room here. For one, linguistic descent can also happen through cultural “contact” rather than cultural “descent”; for two, linguistic splits can happen invisibly, without any corresponding cultural split (especially if we are talking about just basic dialect diversification); for three, cultural expansions can pull along multiple linguistic lineages at the same time. The last two in particular combine to create a situation where, even if we could match cultural and linguistic lineages accurately, we still could not use splits in one to date the splits in the other. I believe this is indeed the case in Samoyedic. There is strong archeological evidence for assuming that Northern Samoyedic arrived on the Arctic coast only in the ballpark of 1000 years ago; ^[5] but this does not allow us to conclude that the language spoken at the time was really a unified Proto-NSmy. I would think that at minimum a pre-Nganasan dialect and a pre-Nenets-Enets dialect already existed separately at this time, to account for certain cases where Nenets-Enets shares isoglosses with southern Samoyedic branches like Kamassian or Mator. Perhaps there were more varieties yet, existing first as clan or family dialects before ballooning into full-blown languages.

I do not believe I am ending up with a radically different approximation for the age of Samoyedic than previous researchers: e.g. Janhunen, in his 1998 handbook article, guesstimates that “proto-Samoyedic seems to have dissolved as recently as the last centuries BCE”, i.e. in the same millennium as my conservative assumption (or, for what it’s worth, Blažek’s recent glottochronological calculation comes out at 250 BCE). When it comes to the deeper end, though, I do make one basic methodological assumption that I do not think other linguists always properly appreciate: a proto-language is by definition unitary, and it is broken up already by the first emerging dialect isogloss. Not upon the emergence of more major division lines such as daughter ethnicities (identities are malleable and can easily re-coalesce), or “language-type” rather than “dialect-type” boundaries (whatever that may mean), or loss of mutual comprehensibility (not a binary distinction anyway). A proto-language only has its strong methodological value if it is reserved for the truly common ancestor, a stage that precedes the rise of all areal variation; otherwise we lose the ability to reconstruct innovations, and can always appeal to almost any arbitrary modern variation having “already existed in the proto-language” (so, ever since humans first invented speech?). All isoglosses have a finite age, and when we seek to date a family’s break-up, we are seeking to date the oldest isogloss observable within the family, or at least the oldest isogloss that is theoretically datable somehow. And it is these roots that I believe could run quite deep compared to the conservative approximations.

[1] Really I wonder if I should start keeping a list of publications I have been credited on. Eventually this would be pointless I’m sure, but as an early-career researcher, maybe not…
[2] Comparative syntax, especially clause-level, I must admit I know roughly jack shit about (in general, not just re: Samoyedic). This is an intentional omission of effort: maybe my core subfield is comparative phonology, which does not have much overlap with syntax at all. At most there would be third-degree repercussions through morphology / classification / areal linguistics, hardly any more than from fields like paleography or folkloristics.
[3] Examples (far from an exhaustive list): PU *uwa ‘flow’, *kuwakka ‘long’ reflected only in Nganasan; *ekä ‘big; father’ only in Enets; *muja- ‘to smile’, *säńćä- ‘to stop’ only in Nenets; *kajə ‘hair’, *këččə ‘bitter’ only in Selkup; *porə- ‘to eat’, *suwďa ‘finger’ only in Mator. Works the other way too: I’ve a list of the most widespread Uralic vocabulary, and their average distribution across Samoyedic, when present, seems to be clearly lower than across any other branch.
[4] I may do a fuller post on this eventually, but I believe the supposed “Slavic loanwords in Proto-Finnic” like *pappi ‘priest’, *risti ‘cross’ well postdate the breakup of PF. Several other loanwords from essentially the same phase of Slavic already show dialect divisions existing: mainly via differing sound substitutions, such as *netäli ~ *nätäli ‘week’, *värttinä ~ *värttenä ~ *värttänä ‘spindle’, *šauki- ~ *šaukë- ‘pike’. A few cases like ‘priest’, ‘cross’ may appear uniform just due to their phonological simplicity, therefore making up a case of what I call “convergent parallel loans”.
[5] Dated more accurately actually, but I do not have the details on hand.

Tagged with: dialectology, historical linguistics, linguistics, proto-samoyedic, publications, samoyedic, sociolinguistics
Posted in Reconstruction

25 comments on “Probing the roots of Samoyedic”

David Marjanović says:

Sat 2019-02-02 at 01:38

I emphatically agree with your definition of “proto-language”. What is accessible to comparative reconstruction is the last common ancestor of attested varieties, and that’s not a “language” in the usual synchronic sense, but a “tiny little subdialect” that may (or may not) have been part of a much larger “language”; likewise, a last common ancestor in biology isn’t a “species” in most of the 150 senses of that word, but, uh, one of Keesey’s “ancestral sets” – in extreme cases a breeding pair for sexually reproducing organisms, always a single individual for asexually reproducing ones.

[1] Really I wonder if I should start keeping a list of publications I have been credited on. Eventually this would be pointless I’m sure, but as an early-career researcher, maybe not…

As long as that list isn’t noticeably shorter than the list of your own publications, do it and put it into your CV.

- j. says:
  
  Sat 2019-02-02 at 10:53
  
  There’s also that when people want “the” age of a family / proto-language, this seems to unpack to about three different things… one of which is just any metric at all for roughly comparing the ages of various families. A few additionally different datings would also result if we wanted to date things like “last common phonology” or “last common inflectional morphology”.
  
  (Note here also that “last common lexicon” does not exist, not without some additional constraints. Once we are looking at marginal minor derivatives or ideophones, every speaker always has a slightly different lexicon of their own. I’d guess even any one speaker’s lexicon right now and lexicon one year ago are always different.)
  
David Marjanović says:

Sat 2019-02-02 at 01:40

Oops, I forgot:

Mator: *kʲ × *sʲ > k

Kentum Uralic! ^_^

Reply
Crom Daba says:

Sat 2019-02-02 at 15:40

I disagree with this approach, I think there are three “Proto-languages” to consider: 1. The unitary ancestor of the language group (Janhunen would maybe use “Pre-Proto X”). 2. The language with all common post-dialectal innovation (“Common X”). 3. And our reconstruction (usually just “Proto X”), which aims for (1) but is unavoidably more like (2).

Making hard statements about (1) based on (3) appears to lead to false precision and other blunders, and when we talk about language contact or attempt to map languages to cultures, (2) seems more useful to operate with anyway since we have a time interval rather than a single point.

I am surprised to learn this definition of LCA in biology. How do you deal with cases where both species have obviously inherited standing genetic variation? Do these come up?

Reply
- David Marjanović says:
  
  Sun 2019-02-03 at 15:28
  
  I don’t think (3) is unavoidably more like (2) than like (1). In the cases I’ve seen, it seems to be just a matter of putting in conscious effort to get closer to (1).
  
  But yes, equating the versions of (3) we have with (1) uncritically has led to all sorts of blunders.
  
  How do you deal with cases where both species have obviously inherited standing genetic variation? Do these come up?
  
  These do come up, and give cases where molecular divergence dating using different genes gives different dates (and different tree topologies, too). The slow separation of humans and chimps is such a case.
  
  There’s only one species concept in which branches are necessarily species, though, and it has hardly ever been used.
  
  BTW, by “ancestral sets”, I meant the “cladogenetic sets” or “cladogens” from this paper. The issue it describes and solves hardly ever comes up, because we don’t often have that much precision in the first place.
  
  - Crom Daba says:
    
    Sun 2019-02-03 at 16:03
    
    Why presume that all information is preserved in daughter languages? When we compare Latin to “Proto-Romance” we see that most information is actually lost, and I don’t think it is any different in other such cases. “Proto-Slavic” would be completely different if we had no OCS data, and especially if we had no comparative IE data.
    
    Or even when this information is preserved, it might not be strong enough as evidence to inform the reconstruction we would get using optimal methodologies. Let’s do some leave-one-out-cross-validation: what if Greek never existed, would the reflexes of initial laryngeals in Armenian be convincing enough to reconstruct them in PIE? Is Albanian h- good enough to convince you of the fourth laryngeal?
    
    - David Marjanović says:
      
      Mon 2019-02-04 at 01:38
      
      Why presume that all information is preserved in daughter languages?
      
      I don’t! There will always be words and features we simply can’t reconstruct for lack of data, and changes we can reconstruct but not date with as much precision as we’d like.
      
      Let’s do some leave-one-out-cross-validation: what if Greek never existed, would the reflexes of initial laryngeals in Armenian be convincing enough to reconstruct them in PIE?
      
      That’s subjective. Probably they would be accepted, though, because Armenian vowels dropping from the sky in the exact places where Hittite has ḫ would make even less sense. Note that *h₁C- gives aC- in Hittite.
      
      Is Albanian h- good enough to convince you of the fourth laryngeal?
      
      A fourth laryngeal has been proposed to account for some cases of Albanian h-, some cases of Armenian h- and the Hittite word alpa- “cloud”. Given that the Albanian and Armenian h- don’t line up, they can’t have the same explanation; and judging from the lack of mentions in the literature I’ve seen, *Helbʰ- ~ *Halbʰ- is not attested with h- in either of them. So, for alpa- I currently prefer the explanation that it’s a loan from an IE branch that had already lost *h₂-. Albanian and Armenian need further research; h- could have spread by various kinds of analogy (e.g. how *u- was so much rarer in Pre-Greek than *hu- that all cases of the former joined the latter), but that’s just speculation on my part. I’m generally reluctant to assume completely random additions of [h]-; Basque was long thought to contain many such cases, but most or all have now been convincingly explained otherwise. But on the other hand, both Armenian and Albanian are deeply enough nested within IE that I’m also reluctant to assume several independent(/areal) losses of two different “fourth” laryngeals in a considerable number of IE branches.
      
      - Mikhail Zhivlov says:
        
        Mon 2019-02-04 at 02:40
        
        how *u- was so much rarer in Pre-Greek than *hu- that all cases of the former joined the latter
        
        I have a crazy theory that Greek h- before u reflects PIE zero, i.e. the absence of a laryngeal. All cases of PIE anlaut *Hu- were subject to Rix’s law and yielded diphthongs, so Greek hu- reflects only PIE *u- as a weak grade of *we/o-.
        
        
        David Marjanović says:
        
        Tue 2019-02-05 at 02:57
        
        Well, Greek h- before u reflects both *u- and *su-.
        
  - j. says:
    
    Sun 2019-02-03 at 21:32
    
    In this terminology, I would say that reconstructing (1) is reasonably doable for subgroup proto-languages, when we have clear outgroup evidence for what’s an archaism and what’s not. Reconstructions of bottom-level proto-languages seem to come usually closer to (2) though, with many earlier faultlines only visible as “irregularities”, and many changes shared by all dialects going unidentifiable. Sometimes these could be teased out via internal reconstruction though, or with the help of old loanwords.
    
    Modern-day “shallow chronology” dates for Uralic seem reasonable to me if for the “proto”-languages in sense (2) (Late Common Uralic at 4000 BP, Late Common Samoyedic at 2000 BP, Late Common Finnic at 1200 BP etc.), but clearly not for the proto-languages proper.
    
    Some readers, BTW, may be interested in comparing notes with Jaakko Häkkinen’s recent introductory essay “Kieltenvälinen vertailu historiallisessa kielitieteessä” (2016), which includes a similar definition of “early”, “middle” and “late” proto-languages (p. 16 on the linked pdf). His “late” is same as your (2) — but your (1) is his “middle”, and his “early” is the “point of splitting” from the closest relatives; so roughly, the corresponding dialect of stage (2) of the next-largest clade. This is basically baggage from the fact that in Finland, Proto-Finno-Samic was traditionally called “early Proto-Finnic” (which itself derives from a 19th-century habit of calling all known Uralic languages other than Hungarian “Finnic”; what we today call “Finnic” used to be “West Finnic”).
    
Mikhail Zhivlov says:

Sun 2019-02-03 at 00:38

A proto-language only has its strong methodological value if it is reserved for the truly common ancestor, a stage that precedes the rise of all areal variation; otherwise we lose the ability to reconstruct innovations, and can always appeal to almost any arbitrary modern variation having “already existed in the proto-language”

I can only hope that someday all introductory courses on historical linguistics will state this as clearly as you have done. It’s difficult to say how much confusion is caused by the notion that a proto-language must be like what we call “language” in ordinary life, with dialects and so on.

Reply
Hans-Werner Hatting says:

Sun 2019-02-03 at 11:29

“We know dialect continua can fracture into multiple clearly distinct languages quite rapidly (most of Finnic was still a single dialect continuum circa 1900, and is looking headed for just a handful of surviving discrete daughter languages by 2100) — but we also know Samoyedic was “discrete” already as early as about 1800. As a conservative estimate, I’d therefore then add about 400 years more age for Samoyedic. This adds up to a minimum age of about 2600 years BP for Samoyedic, which I then round up to the accuracy of one decimal, due to the numerous uncertainties involved.”
For that argumentation to work, the elimination of intermediate dialects in a continuum, through replacement by other dominant dialects or by other languages, would have to depend on the age of the language family, while it is actually caused by socio-economic factors that have nothing to do with that age. If Finnish had been replaced, starting from the Middle Ages, by (say) Swedish and Russian on a much larger scale than actually happened, and by 1800 only a handful of dialects from the extreme ends of the continuum had survived, wouldn’t that look similar to Samoyedic? Or am I misunderstanding what you are saying here?

- j. says:
  
  Sun 2019-02-03 at 21:04
  
  Good question. Answered in a separate post entirely…
  
Howl says:

Sun 2019-02-03 at 19:36

“There’s no consensus model for the branching of Samoyedic, nor for its splitting from Uralic. Those who side with an East Uralic group will be able to find a roughly suitable archeological assignment for it; so will those who side with a Finno-Ugric group; etc.”

Has anyone ever proposed a branching model for Uralic where the Central branches (Mari, Permic and Ugric) split off first, leaving Samoyedic + West-Uralic as a separate branch?

The Uralic vowel correspondences work way better for Samoyedic and West-Uralic than for those Central branches. Consonant Gradation is another interesting isogloss between Samoyedic and West-Uralic. And all those East-Uralic isoglosses that Ugric and Samoyedic share could also be attributed to a Sprachbund.

Reply
- j. says:
  
  Sun 2019-02-03 at 20:14
  
  Quite funny. I’ve not entertained anything this provocative, only the more minor thought that maybe it’s one of the central branches like Mari or Permic that actually is an outgroup to the rest of Uralic. Also, if it were a fairly shallow split, this would be nearly impossible to demonstrate in any case.
  
  I would still think vowel correspondences “work better” across WU–Samoyedic mainly because they’ve retained the archaic bisyllabic stem structure. At least in Permic and Ugric, reduction to monosyllabicity seems to have come hand in hand with extensive metaphony.
  
  Likewise I am on board with Helimski’s theory that consonant gradation is an archaism and has in the central branches been typically lost under extensive medial lenition. FWIW even old Mansi and Khanty records up ’til the mid-1800s show some kind of non-general medial voicing that a few pioneers like Castrén connected with consonant gradation, and I’m not sure if this has ever been investigated in detail.
  
  - David Marjanović says:
    
    Mon 2019-02-04 at 01:15
    
    It gets funnier still. Yesterday or the day before, when I woke up, I remembered your mention of a “semi-regular” voicing process in Permic. What if we* instead assumed that Permic (or “Central Uralic”) is the sister-group to the rest of Uralic, projected the *k-*g contrast straight back to PU and compared the result to IE, Yukaghir or whatever…?
    
    But now that you mention consonant gradation, maybe that is to blame. Just guessing.
    
    * The hyper-exclusive we: we without me.
    
    - j. says:
      
      Mon 2019-02-04 at 04:12
      
      That’s exactly one feature I’ve sometimes thought could be an archaism in Permic. These are initial voiced stops, so anything gradation-related is ruled out.
      
      - David Marjanović says:
        
        Mon 2019-02-04 at 14:04
        
        Excellent. *tries to twirl mustache*
        
        Foolhardy prediction: the Sejma-Turbino Phenomenon was carried by speakers of Proto-Peripheral Uralic.
        
        
        j. says:
        
        Mon 2019-02-04 at 14:33
        
        As long as we’re running with this line of speculation, that squares poorly with how the Sejma-Turbinoans were getting much of their copper from the Permian foothills. Which, OK, could have been pre-Mansi rather than pre-Permic speaking at the time, but you’ll still have to adjust the idea to something like East–West / Non-Kama Uralic.
        
        My working hypothesis for the phylogeny for Uralic though is the “mittens model” where the primary split is West+Mari versus Permic+East.
        
  - Howl says:
    
    Mon 2019-02-04 at 22:44
    
    The inspiration for this funny and provocative idea came from here:
    
    “On the Uralic side, we are lacking no less than fully bridging Mari, Permic and Ugric with the standard model of PU rooted in the comparision of West Uralic with Proto-Samoyedic” [https://www.frathwiki.com/Indo-Uralic/Roadmap]
    
    I would not phrase it that bluntly. But it aligns with my observations about how the Uralic sound-correspondences work for those branches. And when you have a set of branches with so many unexplained divergences in the sound-correspondences, I think it would only be natural to at least consider the possibility that we might be dealing with earlier split-offs.
    
    “At least in Permic and Ugric, reduction to monosyllabicity seems to have come hand in hand with extensive metaphony.”
    
    The problem I have with metaphony from unattested vowels in subsequent syllables is that it’s just obscurum per obscurius to me.
    
    - j. says:
      
      Tue 2019-02-05 at 00:07
      
      It’s entirely possible that the West–Samoyedic comparison basis fails to uncover some PU vowel contrasts or post-PU innovations that might be retained in some central branches, but that doesn’t have to mean these features would be lost entirely in both “ends”. There is also a decent amount of vowel correspondences with Samoyedic that seem to remain unexplained as well, and also assuming West Uralic to be usually archaic may be misleading us on some counts. I already have a few cases of this sort on my plate for further investigation. Plus there’s how there is just about nothing of this sort happening beyond the vowel messes.
      
      A third problem is that if you really wanted to, you could still locate a few potential first-outgroup archaisms in any one Uralic group (e.g. the suggestion that Samic is the only branch of Uralic that retains the original word *śäčä for ‘water’, and most others roll in *wetə as a loan from IE; I’ve seen similar ideas also e.g. for Hungarian én ‘I’). But they cannot all be the first outgroup; so what grounds do we have to privilege any one branch’s quirks at all? I admit it’s possible there is an “outgroup in the center”, but actually identifying one gets methodologically thorny fast.
      
      — Second-syllable vowels are not quite so unattested even in the central branches if you look closely (aside from them being easily observable in relatives), although this is for some reason not that widely known. E.g. Udmurt shows in nouns a “thematic” vowel distinction in some 3PS possessives and derivatives in /-s/ < *-ksə, which turns out to be etymological: /e/ < *-A, /ɨ/ < *-ə. Or Mari of course has the whole *-A- < *ə, *-e- < *A contrast in verbs.
      
      - Howl says:
        
        Tue 2019-02-05 at 13:14
        
        “e.g. the suggestion that Samic is the only branch of Uralic that retains the original word *śäčä for ‘water’, and most others roll in *wetə as a loan from IE”
        
        Or *śäčä is just a less transparent borrowing of another water word from IE:
        PU *śäčä ‘water, flood’ (KhVj seč ‘flood’, SaaT čɑ̄cce ‘water’) < PIE *(s)neh₂ 'to flow, to swim' (Gk nā́ma 'Running water', Arm nay 'wet, liquid', MIr snau 'stream')
        
        "Second-syllable vowels are not quite so unattested even in the central branches if you look closely"
        
        I have not seen any concrete proposals about how metaphony could fix the messy vowel correspondences for Mari or Permic. But the ablaut theories that I have seen for Khanty leave me wondering where their second-syllable vowels came from and where they went.
        
        
        j. says:
        
        Tue 2019-02-05 at 15:10
        
        That comparison seems indeed not transparent at all: the only thing they appear to share is one sibilant that doesn’t match.
        
        I’m only really saying second-syllable vowels are still reconstructible even for central branches. Per current understanding they make only a partial dent in Permic (primarily for *i *ä *ë *o and even for these mostly in connection with syllable structure) and not much in Mari at all. What explanatory value this does have is however hardly “obscurum per obscurius”. Similarly for Khanty I mean a number of basic splits conditioned by reconstructible stem types (e.g. *ë-a > *ïï but *ë-ə > *aa), not somewhat circularly reconstructed further umlaut triggers.
        
peljä says:

Fri 2019-03-08 at 22:57

*kuwakka “long” mentioned to only be continued in Nganasan in [3] sounds oddly similar to the reconstruction *Ku(u)KKa(s) from the supposed Pre-Finno-Ugric substrate that Aikio wrote about in “An Essay on Substrate Studies and the Origin of Saami”; that word was said to be continued in Proto-Samic *kukkē(s). Could that just be a coincidence of some kind?

- j. says:
  
  Sat 2019-03-09 at 01:43
  
  Yes, this could feasibly be the same root (though already UEW disagrees). You can find a bit more on the topic also in Aikio’s 2000 article “Suomen kauka“.
  