Workflows in historical linguistics

A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months if not years languishing in my drafts. Let’s see if I can keep this one brief.

An adage sometimes seen in historical linguistics is “classification before reconstruction”. On one level, I agree. But, on a few others, this seems to be often abused as an excuse to skimp on proper rigor.

What this means, in my opinion:

  • It’s not possible to do comprehensive comparative reconstruction work with data from unrelated languages. Reconstruction can only be attempted once we have a reasonable amount of certainty that some particular language family exists at all.

What this does not mean:

  • Classification having to precede work in historical phonology entirely. Reliable classification cannot be done by vague casual eyeballing of data. “A reasonable amount of certainty” for the relatedness of some particular languages requires being able to locate regular sound correspondences within their shared vocabulary (preferably non-trivial ones, but any regularity is a start). [1] In the absence of regular sound correspondences, all vocabulary comparisons can potentially be suspected to be either coincidental, or loanwords rather than strict cognates.
    In other words: sound correspondences are not reconstructions, in themselves. In the case of binary comparison, this distinction may end up blurred, since it’s possible to kind of put together an initial “trivial reconstruction” by just listing all your correspondences, and giving each of them some kind of a vague phonetic label. [2] If the family has more members, though, the bare sound correspondences typically end up looking more like networks — since sound correspondences are not transitive. If /tʃ/ in language 1 can correspond to /s/ in language 2, and /s/ in language 2 can correspond to /h/ in language 3, this does not automatically guarantee that a correspondence /tʃ/ ~ /h/ between 1 and 3 would be demonstrable, or even expected at all. Perhaps /s/ in language 2 is a merger of two separate proto-phonemes; perhaps these correspondences do continue the same proto-phoneme, but under mutually exclusive conditions; perhaps one of these correspondences indicates loanwords after all and not native vocabulary.
  • Subclassification having to precede reconstruction. On the contrary, it is reconstruction that often allows us to put together arguments in favor of subgroups, by providing a root for our sound correspondences. If we have a correspondence such as t ~ t ~ s ~ s, it’s likely that either the t-group or the s-group has innovated, and constitutes a subgroup. But it is also very possible that the other group has not, and is paraphyletic. Without reconstruction work, this is not resolvable.
  • Reconstruction being unable to inform classification. A reconstruction of the parent of a set of languages might end up coming out closer to some other language, that we may have suspected (but haven’t dared to declare) to be also related. It could even turn out that this language newly under comparison is not only related, but it is indeed a direct descendant of this same proto-language; just a very divergent one! — Or maybe the proto-language turns out to be substantially less similar to the other language being compared, and the earlier suspicion of a relationship evaporates entirely, or has to be reanalyzed as a late loanword layer.
  • Language isolates’ history being unreconstructible. Internal reconstruction combined with loanword evidence can allow identifying probable sound changes and lexical intrusions just fine… though I suppose it will be unlikely to get especially far with this technique.
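The non-transitivity point above can be made concrete with a small sketch. The Python below treats attested pairwise correspondences as edges in a graph and looks for chains where a ~ b and b ~ c are attested but a ~ c is not. The language numbers and the set of attested pairs are hypothetical, taken from the /tʃ/ ~ /s/ ~ /h/ illustration rather than from real data.

```python
# Hypothetical attested correspondences, each as ((language, sound), (language, sound)).
attested = {
    ((1, "tʃ"), (2, "s")),
    ((2, "s"), (3, "h")),
}

def is_attested(a, b, pairs):
    """True if the correspondence a ~ b is attested in either direction."""
    return (a, b) in pairs or (b, a) in pairs

def unsupported_chains(pairs):
    """Return (a, b, c) where a ~ b and b ~ c are attested but a ~ c is not."""
    nodes = {n for pair in pairs for n in pair}
    gaps = []
    for b in nodes:
        neighbours = sorted(n for n in nodes if n != b and is_attested(n, b, pairs))
        for i, a in enumerate(neighbours):
            for c in neighbours[i + 1:]:
                # Only chains linking two different languages are of interest.
                if a[0] != c[0] and not is_attested(a, c, pairs):
                    gaps.append((a, b, c))
    return gaps

gaps = unsupported_chains(attested)
for a, b, c in gaps:
    print(f"{a} ~ {b} and {b} ~ {c} are attested, but {a} ~ {c} is not")
```

A gap reported here is not an error in the data; it only flags a pair that still needs an explanation (a merger, complementary conditions, or a loanword layer).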

A more detailed workflow for historical linguistics, if starting from zero, would therefore look something like the following:

  1. Acquire data; sort out some initial vocabulary comparisons that look promising.
  2. Analyze sound correspondences; use these to look for more comparisons.
  3. Look at the big picture to see if some particular subset of languages should indeed be considered related.
  4. Attempt reconstructing the proto-language.
  5. Use the proto-language POV to clarify the status of issues like problematic etymologies, possible external relatives, or possible subgroups.
  6. Use modified analyses of data to improve the proto-language reconstruction.
  7. Iterate 5 and 6 until you’ve run out of insights to gain from the data.
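As a rough illustration of step 2, the sketch below tallies segment-by-segment correspondences across a handful of word pairs and keeps the ones that recur. Everything in it is an assumption made for the example: the word pairs are invented, the two languages "A" and "B" are hypothetical, and the words are taken to be already segmented and aligned one-to-one, which real data rarely are.

```python
from collections import Counter

# Invented, pre-aligned comparisons between hypothetical languages A and B.
pairs = [
    ("tala", "sala"),
    ("tomi", "somi"),
    ("kita", "kisa"),
    ("muka", "muka"),
]

def tally(word_pairs):
    """Count how often each (A segment, B segment) pairing occurs."""
    counts = Counter()
    for a, b in word_pairs:
        if len(a) != len(b):
            continue  # unequal lengths would need a real alignment step
        counts.update(zip(a, b))
    return counts

counts = tally(pairs)

# Keep only correspondences seen at least three times (an arbitrary threshold).
recurrent = {pair: n for pair, n in counts.items() if n >= 3}
print(recurrent)
```

The recurrent pairings are candidates for regular correspondences, which can then be used to hunt for further comparisons.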

This could also work as a kind of a typology of how far along research on a particular language family is. To date, I don’t think any language family has yet exhausted stage 7. Most are stuck in limbo somewhere around stage 3; only a few have reached stage 5, and Indo-European might be the only one to have indisputably gone through one cycle of stage 7. Big disputed hypotheses grouping well-accepted families together can probably be divided according to whether they’re closer to stage 1 (e.g. Amerind, Nilo-Saharan) or stage 2 (e.g. variations of Nostratic). Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)

Of course there are many major facets of historical linguistics still missing here. We also want to account for typology at some points, morphology too at others, semantics three, periodically research loanwords and then weed them out of the proto-language, maybe entertain some substrate hypotheses.

[1] Some people will claim that vocabulary is strictly optional and you can show relatedness solely on the basis of grammar. I am skeptical; but if this were to be the case — then the implication is that we will not be doing any lexical reconstruction work at any point at all.
[2] Maybe with subscripts to disambiguate overlapping sets if you’d prefer, but anything goes in principle. If your heart desires to see more wingdings in linguistics papers, there is nothing formally wrong in re-labeling a t ~ tʰ correspondence as *☕.

Posted in Methodology
50 comments on “Workflows in historical linguistics”
  1. M. says:

    Some people will claim that vocabulary is strictly optional and you can show relatedness solely on the basis of grammar. I am skeptical; but if this were to be the case — then the implication is that we will not be doing any lexical reconstruction work at any point at all.

    If one were to restrict reconstruction to inflectional affixes and grammatical clitics, then doesn’t this imply that 95% of reconstruction is comparison between monosyllables (if that)? That’s a pretty precarious threshold for false positives.

    • j. says:

      That it is.

      It might be possible to similarly unpack this to some extent. Let’s say that to clearly demonstrate a relationship takes about 5 examples of each alleged sound correspondence, on average. If a relationship involves about 60 distinct sound correspondences, then we need about 300 instances of correspondences altogether. If we’re working data comprising CVC word roots, then each of these provides three instances of correspondences, and getting together 100 lexical comparisons will suffice. With CVCV roots, this drops to 75; CVCVC roots, 60. But if we’re working with affixes that are C or CV (let’s say either one on average), then we’ll need 200 affix comparisons to have a similar level of regularity… This is in principle still doable, if we’re working with strongly inflected languages and looking at both inflectional and derivational morphology, but it’s far from the default level of evidence in “morphological comparisons”.
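      The arithmetic above can be checked with a short sketch. The figures (5 examples per correspondence, 60 distinct correspondences, and an average of 1.5 segments for C or CV affixes) come straight from the comment; the function and variable names are only illustrative.

```python
import math
from fractions import Fraction

EXAMPLES_PER_CORRESPONDENCE = 5
DISTINCT_CORRESPONDENCES = 60

# Number of segments (= correspondence instances) supplied by one comparison.
SEGMENTS = {
    "CVC root": 3,
    "CVCV root": 4,
    "CVCVC root": 5,
    "C or CV affix": Fraction(3, 2),  # average of 1 and 2 segments
}

def comparisons_needed(segments_per_item):
    """Comparisons required to reach the total number of correspondence instances."""
    total = EXAMPLES_PER_CORRESPONDENCE * DISTINCT_CORRESPONDENCES  # 300
    return math.ceil(Fraction(total) / segments_per_item)

needed = {shape: comparisons_needed(n) for shape, n in SEGMENTS.items()}
for shape, count in needed.items():
    print(f"{shape}: {count} comparisons")
```

      This treats every segment as contributing equally, so it is a lower bound on the effort: in practice some correspondences will be far rarer than others.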

      Most “morphological” relationship proposals do make use of lexical data as well for supporting whatever sound correspondences are proposed, though, which partially gets around this problem. And then there are issues such as that the strength of comparisons actually also depends on semantic exactness. Morphological comparisons such as “sublative case” or “second person dual reflexive” can have here the additional benefit that they can indicate categories that are entirely lacking in many other languages.

      On the flip side, though, I find that non-exact morphological comparisons are 90% make-believe. A frequentative verb suffix perhaps could turn into a second person singular marker, or a collective noun suffix into a first person singular marker (both of these actual serious proposals for the development of the Hungarian indefinite verbal paradigm!), but I could easily create twenty other similar but equally unverifiable scenarios. The core problem is that semantic shifts are always “singular”: they affect one morpheme only at a time, unlike regular sound changes.

      • M. says:

        And then there are issues such as that the strength of comparisons actually also depends on semantic exactness.

        Do you think this strictness only applies to affix comparison? I think it applies just as much to lexical comparisons, at least when one is building a core bank of evidence for sound correspondences. Once this core has been established, then it may be possible to relax semantic criteria.

  2. ኔቢያት መንገሣ says:

    I definitely enjoy the stages with which you placed the different forms of phylic classification validity — if that’s how it can be described. Because even though “Nilo-Saharan” is considered wishful thinking a lot of linguists do get confused because admittedly, it is stage 1. It does look promising but any question of interrelation even between geographically close families is hard to prove (i.e. Gumuz & Koman, Nilotic & Surmic). Although I’m not sure where Afroasiatic sits in this scheme of things because there’s no agreement on even a single branch’s proto-language let alone the phylic proto-language and then there’s Kujarge and Ongota which have resisted classification; the former being confused for Cushitic or Chadic, or something between and the latter is debatably Cushitic, Omotic, or its own branch.

    Overall, this post was really informative and I really enjoyed it so much!

    • j. says:

      Everything starts at stage 1, though, including also junk like “Hungarian is closely related to Sumerian” or “Hebrew descends from Turkish”. There is no stage 0 for “truly illusory” hypotheses.

      On Nilo-Saharan in particular, some of the stage 1 work has also concluded against relatedness, e.g. the recent Starostin paper on core vocabulary. Really I think most stage 1 hypotheses are not fit for further work as long as they have clear main components that remain themselves at no more than stage 2 to 3.

      • ኔቢያት መንገሣ says:

        I recently read this paper by Dimmendaal that suggests at least a relationship between Surmic and Nilotic (which is very feasible), but he’s the only linguist I’ve seen who at some points will object to “lumping” via typology but then support a typological argument for Nilo-Saharan’s existence. I support some groupings, like maybe Komuz and Nilo-Surmic, but besides that I think the rest are questionable at best. Even in geographically close groupings such as Fur, Daju, and Nubian prior to the last 2 and a half millennia.

  3. David Marjanović says:

    A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months if not years languishing in my drafts.

    Maybe you can turn them into series? That way you wouldn’t need to finish posting a series before you started posting another.

    Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)

    Hm. I’d say Altaic is indeed at stage 4, and stage 5 has begun (Mudrak’s discovery that Proto-Eskimo can almost be derived from Proto-Altaic, admittedly mostly by mergers; somebody should really get to reconstructing Proto-Eskimo-Aleut already!). However, I also think that parts of stage 4 should be redone because of certain recent discoveries that I don’t have time to list right now.

    If one were to restrict reconstruction to inflectional affixes and grammatical clitics

    The idea here seems to be to compare whole systems of morphology, as has of course been done with conjugation or declension in IE (well, “Graeco-Aryan” anyway) or Afro-Asiatic. In these nearly ideal cases, this works very well indeed; but methods for comparison of polysynthetic verb templates are still being developed, it seems, and as soon as you run into an isolating language you’re out of luck.

    • Crom daba says:

      “Mudrak’s discovery that Proto-Eskimo can almost be derived from Proto-Altaic, admittedly mostly by mergers”
      That Moscow reconstruction of Proto-Altaic is so adaptable to whatever data you throw at it that I wouldn’t be surprised if Uto-Aztecan could be derived from it.

      However that resonates well with an idea that’s been haunting me lately about how the nearest common ancestor of nuclear Altaic might be the ancestor of all M-T pronoun languages of Eurasia, Fortescue’s shared elements of Uralo-Siberian morphology for example correspond relatively well to Altaic or at least Mongolian.

      • David Marjanović says:

        so adaptable

        Perhaps, but, IIRC, Mudrak didn’t adapt it, he just took what he and his two coauthors had published in the EDAL a few years earlier.

        the nearest common ancestor of nuclear Altaic might be the ancestor of all M-T pronoun languages of Eurasia, Fortescue’s shared elements of Uralo-Siberian morphology for example correspond relatively well to Altaic or at least Mongolian.

        This kind of thing does need more attention. Lots of people seem to implicitly confuse “are these languages detectably related” with “are they more closely related to each other than to anything else”.

        I agree that Uralo-Siberian looks a lot like what you’d get if you’d try to work on Nostratic/Eurasiatic without looking at the language families Fortescue happened not to look at.

    • j. says:

      Maybe you can turn them into series?

      Yes, I’ve been doing that already, when feasible. It keeps you guys posted, but doesn’t do all that much with regard to schedule slip — since the limiting factor is not blogging time, it’s new complications arising that need to be researched in turn. Posts for which I have upcoming sequels of this sort hanging in my drafts include Problems in PIE vocalism; *ä-backing in Finnic; On the epistemology of sound change; Laterals and Palatals (going on part 3); *ś > *š in Mansi; and even Minor Mordvinic Mutations (from all the way back in early 2013!).

      I’d say Altaic is indeed at stage 4, and stage 5 has begun

      Altaic still suffers from the same phonological problem that many other tentative proto-languages do: the inventory remains a bit too close to a “trivial reconstruction” (= stage 2). The amount of proto-phonemes seems artificially elevated, and the later development of the daughter languages does not consist of much else than an inward collapse of this proto-system in various ways. It reminds me of a few early “hyperphonetic” reconstructions of Uralic, where e.g. the following velar stops were posited:

      • Permic /k/ ~ Hungarian k < *k
      • Permic /k/ ~ Hungarian h < *kʰ
      • Permic /g/ ~ Hungarian k, g < *g
      • Permic /g/ ~ Hungarian h < *gʰ

      Our current understanding, though, reconstructs only a single velar stop *k, which is regularly retracted and spirantized to h in Hungarian before original back vowels, versus semi-regularly voiced to /g/ in Permic before medial voiced stops/affricates (< *NT clusters). Crucially, this also allows discarding some lexical correspondences that do not abide by the regular rules (even if they might show sound correspondences that are regular if taken separately). It’s this kind of insight I expect to see of stage 4 reconstructions: judgements on what is reconstructible to the proto-language, and what isn’t.
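      The difference between listing correspondences and stating a conditioned change can be sketched in a few lines. The Python below encodes only the Hungarian half of the account, *k > h before an original back vowel, as a single rule. Several things here are assumptions made for illustration: the back-vowel set is simplified, the rule is applied to word-initial *k only, and the input strings are schematic proto-shapes, so the outputs are not meant as actual Hungarian words.

```python
import unicodedata

# Assumption: a simplified set of back vowels for the schematic proto-shapes.
BACK_VOWELS = set("aou")

def hungarian_initial(proto_form):
    """Apply the schematic rule *k > h before a back vowel, word-initially."""
    form = unicodedata.normalize("NFC", proto_form)  # keep "ä" as one character
    if len(form) >= 2 and form[0] == "k" and form[1] in BACK_VOWELS:
        return "h" + form[1:]
    return form

# Schematic shapes only: one with a back vowel, one with a front vowel.
results = {shape: hungarian_initial(shape) for shape in ("kala", "käti")}
print(results)
```

      One proto-phoneme plus one condition covers both the k ~ k and the k ~ h correspondence; any comparison whose initial consonant contradicts the vowel environment then stands out as a candidate for rejection.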

      Moscow school Proto-Altaic, by contrast, is overengineered e.g. by a system of three stop phonations when none of the descendants have more than two (sometimes involving ad hoc reshuffling such as *tʃ *dʒ > Mongolic *d *dʒ), a four-way prosodic system when again none of the descendants have more than two, and an 8×5 fully unharmonic system of root vocalism, when all descendants attest something substantially simpler.

      • David Marjanović says:

        (To keep this out of moderation, I’m putting only one link in this post; links to other references will follow.)

        The new developments I was thinking of are:

        – An old anti-Altaic argument is that, while Turkic and Mongolic have words in common that are lacking in the other supposed Altaic branches, and Mongolic and Tungusic do as well, Turkic and Tungusic don’t. That’s the picture expected from contact, while from common ancestry we’d expect random losses leading to about equal amounts of all three kinds of matches. The “preface” of the EDAL has a section near the beginning that triumphantly presents a long list of Turkic-Tungusic matches that are absent from Mongolic. Turkic and Tungusic were never in direct contact before very recent times (some words have of course traveled between Yakut/Dolgan and Ewenki/Ewen), therefore Altaic. Well? There’s that Avar-associated golden bowl from Hungary that has Greek letters on it. All attempts to read them as Turkic are very awkward. Helimski showed they make sense if you read them as something close to Manchu; and Futaky identified Manchu-like words in Hungarian. If (some of) the Avars spoke Tungusic, then we should expect a thin veneer of Tungusic words from sea to shining sea, and Turkic is not likely to have escaped that.

        (But I don’t expect this to have a devastating effect. There seem to be examples of all mathematically possible matches, including Turkic-Japanese.)

        – Similarly, there’s that one Xiongnu sentence phonetically transcribed in Chinese characters and furnished with a Middle Chinese translation. All attempts to read it as Turkic are, again, very awkward. Then Vovin read it as Yeniseian, and suddenly it makes sense. If (some of) the Xiongnu spoke Yeniseian, we should expect Yeniseian words not just in Turkic and Mongolic, but probably also in Tungusic.

        – For Proto-Turkic, the EDAL reconstructed two vowels that have occasionally been postulated before but are not mainstream: *ạ (presumably [ʌ ~ ɤ]) for an /a/ – /ɯ/ correspondence which is otherwise irregular, and *ẹ ([e] as opposed to the normal [ɛ]) for a similar case that shows up as a rare /e/ phoneme in some of the modern languages. At least *ạ can instead be explained as an umlaut phenomenon. I should point out that eliminating these two proto-phonemes would actually make things easier: in the EDAL, the Proto-Turkic outcomes of Proto-Altaic vowels are very often “*a or *ạ” respectively “*e or *ẹ” seemingly at random.

        – The Proto-Altaic *i̯o is explicitly a reconstruction faute de mieux: there was a correspondence without an assigned proto-value, and *i̯a and *i̯u had already been reconstructed but *i̯o had not, so the authors put the two gaps together even though there’s apparently nothing to suggest this approximate sound value for that correspondence. (Not even, as it turns out, in external comparison.)

        – For a long time, Proto-Mongolic and Proto-Tungusic were reconstructed as having rather Turkic-like vowel systems (with front rounded vowels) and Turkic-like vowel harmony. Logically, then, Altaicists used to project the Turkic kind of vowel harmony all the way back to Proto-Altaic. This doesn’t work well, so the EDAL threw this out altogether, reconstructed PA as lacking any kind of vowel harmony, and explained the observed harmonies as the outcomes of umlaut phenomena. Maybe that’s actually going too far. Ko Seongyeon’s thesis on vowel harmony in Mongolic, Tungusic and Korean shows that none of these ever had a Turkic-like harmony, with the single late exception of Oirat/Kalmyk; instead, they all had – and many Mongolic and Tungusic languages still have – tongue-root harmony. The thesis goes on to speculate that the Turkic frontness harmony may have developed out of tongue-root harmony the same way the Oirat/Kalmyk version later did, and states that no trace of any harmony is found in Japonic. Japonic is where the EDAL postulated the greatest number of vowel assimilation phenomena, though.

        – Speaking of Japonic, the EDAL reconstructed a four-vowel system (*/a i u ə/) for Proto-Japonic. Ryukyuists have long said that it’s necessary to reconstruct a six-vowel system with additional */e o/, though it’s not clear (as the EDAL was quick to point out) whether different Ryukyuan languages actually point at the same words. This controversy has raged on, as has the one on whether there’s evidence for */e o/ in Japanese. That’ll have to be sorted out at some point.

        – Have you noticed that the tone systems of EDAL Proto-Korean and EDAL Proto-Japonic are mirror images of each other? Elizabeth M. Boer has a book on academia.edu that reviews the entire diachrony of Japonic tone (and of the research on it) and very convincingly concludes that everyone had been reading the Middle Japanese tone marks upside-down. That shatters the mirror – and means the EDAL Proto-Altaic system is upside-down, too, because the authors made the (consciously, explicitly arbitrary) decision to base their notation on the supposed Japonic rather than the Korean tones. The same book derives the Ryukyuan “word-tone” (pitch-accent?) systems from a southern Japanese one, strongly implying that Ryukyuan is not the sister-group of Japanese but actually nested in it – which makes plenty of historico-geographical sense (reviewed in the book).

        – More is known now than in 2003 about the “Para-Mongolic” languages and on the extinct languages of the Korean peninsula.

        – Altaistics urgently needs a reconstruction of Proto-Eskimo-Aleut.

        The amount of proto-phonemes seems artificially elevated, and the later development of the daughter languages does not consist of much else than an inward collapse of this proto-system in various ways. It reminds me of a few early “hyperphonetic” reconstructions of Uralic, where e.g. the following velar stops were posited:

        Yes and no. On the “no” side, the EDAL does postulate a number of conditional developments, including a Verner-like phenomenon where one of the tones voices one of the plosive series in Mongolic.

        On the “yes” side, yeah, that’s a common phenomenon in the Moscow School. Witness this paper, where on p. 315 it says: “An additional IE fricative should probably be reconstructed for the correspondence Hittite s / Luwian t / Narrow IE 0, as proposed in Ivanov 2001: 133; 2009: 5 and (independently) in Kassian & Yakubovich 2013: 22.” Actually, the correspondence is Hittite s / Luwian t / Narrow IE *s, parallel to Hittite / Luwian k / Narrow IE *h₃, and represents a slightly bizarre but fully regular dissimilation phenomenon in Luwian. The trick is that the example word given for “Hittite s / Luwian t / Narrow IE 0” is the cognate of see (*sekʷ-) in the first two, but the cognate of eye (*h₃okʷ-) in the last.

        Moscow school Proto-Altaic, by contrast, is overengineered e.g. by a system of three stop phonations when none of the descendants have more than two

        Well. Two phonations were reconstructed by all the early Altaicists; that didn’t work out, so they shifted to three. The recent attempt by Robbeets to return to just two didn’t work out either. I agree that it’s odd that the only reflex of the three-way distinction is supposed to be the Proto-Tungusic */g/-*/k/-*/x/ contrast, and most of the other supposed developments are rather odd for the reconstructed proto-values as voiced/plain/aspirated. But then, Altaic has just five basic branches. Suppose you tried to reconstruct PIE based just on Celtic, Balto-Slavic, Iranian, Tocharian and Anatolian evidence: you’d probably try to reconstruct two phonations, but you’d get various confusing hints at a third.

        ad hoc reshuffling such as *tʃ *dʒ > Mongolic *d *dʒ

        Whoa, that’s odd indeed. I must have overlooked it; I’ll try to read up on it.

        a four-way prosodic system when again none of the descendants have more than two

        Conversely, Tungusic has a length distinction in the second syllable, not just in the first one. The EDAL treatment of this can be paraphrased as “some cases are more or less obvious contractions, but in general we don’t understand this phenomenon, so we’re ignoring it for the moment”. An obvious opportunity for future improvement. I also wonder how vowel length may have interacted with consonants.

        and an 8×5 […] system of root vocalism, when all descendants attest something substantially simpler

        Probably that should be 5×5, with the three *i̯- “diphthongs” reinterpreted as *j-. That would neatly explain why *j- was reconstructed as absent word-initially but not word-medially. Also, accidental sampling bias could be a factor again: all Old Northwest Germanic languages already had very restricted vowel systems in unstressed syllables, except for Old High German which had almost no restrictions there at all.

        • Crom daba says:

          I agree with you about *ạ, this is Doerfer’s invention I think (he writes it *ë), IIRC there’s some article where he gives the statistics of Chuvash/Yakut a~ï matches and it looks pretty bad for the theory, but he sticks to it. For all his anti-Altaic scepticism he really likes over-engineering his PT reconstruction. Ramstedt and Poppe considered it secondary.

          I was more optimistic about *ẹ, but perhaps Chuvash a~i split could also be explained as an umlaut process of some sort.

          EDAL vocalism is completely irreparable, even with Japanese and Tungusic given full independence and with their infamous semantic latitude in searching for cognates, the average number of possible first vowel matches for a Mongolic word given a Tungusic and Japanese ‘cognate’ exceeds 2, and furthermore there is no sensible pattern underlying these correspondences. The situation is similar for Turkic and Korean.

          Another problem is that it ignores the [ATR] distinction in Tungusic high vowels, treating it as completely recessive.

          Even though I generally agree with the tongue-root harmony reconstruction of Proto-Mongolian (and consider only tongue-root reconstructions of Tungusic as valid) there are some problems with it:
          – *e is generally reflected as front except in Daur, Khamnigan and some Inner-Mongolian dialects (Chahar and Baarin at least), and the first two are heavily influenced by Tungusic.
          – *ü and *ö are written as ‘Ui’ in Uyghur script as in Uyghur, and generally Turkic loans have preserved ‘frontness’ and so do Mongolic loans in Turkic. There are some cases of Turkic *o being adapted as Mongolic *u, but I don’t know any cases of Turkic *u being borrowed as *ü.
          – Phags-pa script writes *ö and *ü as ‘éo’ and ‘éu’ where the same sequence seems to denote palatalization in Chinese (see http://www.babelstone.co.uk/Phags-pa/Description.html).
          – East Yugur also features front *ö and *ü, although this could possibly be West Yugur influence (however, WY has some atypical vowel rotation going on, which could perhaps be EY influence).

          • David Marjanović says:

            *ạ […] is Doerfer’s invention […] (he writes it *ë)

            Oh yes, thanks for reminding me.

            – *e is generally reflected as front except in Daur, Khamnigan and some Inner-Mongolian dialects (Chahar and Baarin at least), and the first two are heavily influenced by Tungusic.

            Ko’s thesis actually shows that [ɛ] and [ə] are each found in about half of the Mongolic varieties and argues that the former repeatedly evolved from the latter. First, we’d theoretically expect [ə] to be the -RTR counterpart to [ɑ]; second, all these vowel systems based on -RTR have a huge gap in the lower front quarter of the vowel chart, so nothing stops [ə] from drifting into it.

            Similarly, the -RTR /u/ and /o/ of modern Halha (at least the short ones) aren’t really back [u] and [o], they’re central [ʉ] and [ɵ]. The Oirat/Kalmyk development is then just the logical continuation of this drift. Also, as a native speaker of German, I hear [ʉ] and [ɵ] as closest to my own /ʏ/ and /œ/, so I’m not surprised if medieval Turks and Tibetans did the same (…and that contemporary Russians are completely confused as shown by their transcriptions).

            East Yugur also features front *ö and *ü

            Not according to Ko, IIRC; I’ll check.

            • Crom daba says:

              Ko’s thesis actually shows that [ɛ] and [ə] are each found in about half of the Mongolic varieties and argues that the former repeatedly evolved from the latter.

              Perhaps half of variants of Mongolian proper, and even then there are many cases of *e being merged into /i/ (in Baarin for example).
              Non-central languages show a different picture:
              – In East Yugur non-first syllable high vowels (*i, *u, *ü) become [ə] while *e is reflected as /e/.
              – Bonan centralizes *i while raising (and fronting?) *e to /i/.
              – Mongghul, Mangghuer and Santa split *e into [ə] and [ie] (generally [ə] after velars and [ie] elsewhere, but with many other complications) due to Mandarin influence, this to me suggests an originally palatal sound backing due to Mandarin phonotactic constraints.
              – Mogholi mostly has [e]/[ɛ]/[ei] with [ə] in absolute final position (and some assimilatory phenomena in non-first syllables)

              First, we’d theoretically expect [ə] to be the -RTR counterpart to [ɑ]; second, all these vowel systems based on -RTR have a huge gap in the lower front quarter of the vowel chart, so nothing stops [ə] from drifting into it.

              Yeah that’s the idea, although I think the change is pre-Proto-Mongolic (or maybe slightly later than Daur splitting off)

              Similarly, the -RTR /u/ and /o/ of modern Halha (at least the short ones) aren’t really back [u] and [o], they’re central [ʉ] and [ɵ]. The Oirat/Kalmyk development is then just the logical continuation of this drift.

              /u/ is actually only very slightly fronted and it’s mostly a short-vowel reduction thing, here’s a formant graph from Svantesson et al.

              Also, as a native speaker of German, I hear [ʉ] and [ɵ] as closest to my own /ʏ/ and /œ/, so I’m not surprised if medieval Turks and Tibetans did the same

              I view centralized rounded vowels the same as front rounded, as long as a language pairs these rounded vowels by F2 it’s clearly not a [RTR] harmony situation ([-RTR] ‘tense’ vowels are supposed to be less centralised).

              One problem is how to explain just how *ö got in between *u and *ü in Khalkha without at least some fronting. One idea I have is *ö could have come about through a rounding of **ə (such rounding is common in later Mongolic history) starting the process of gradual fronting of an originally pure [RTR] vowel system (think Evenk or even Nivkh).

              (…and that contemporary Russians are completely confused as shown by their transcriptions).

              I remember reading that Russians originally parsed Tungus [-RTR] words as containing palatalised consonants

              Not according to Ko, IIRC; I’ll check.

              The reflex of short *ü is usually transcribed [ʉ] and long *ü as [y], *ö is described as “ø” (Yunast) “rounded, front, mid” (Todaeva), “ö” (Nugteren and Roos) with [o] as an allophone.
              In any case it’s not [u] as in Daur, Khamnigan, Buryat, Khalkha, Santa, Mogholi…

              P.S. How do I make proper quotes in wordpress?

              • j. says:

                P.S. How do I make proper quotes in wordpress?

                <blockquote> HTML tags work. (I’ve added some to your previous post.)

              • David Marjanović says:

                I had misremembered. Ko’s thesis (p. 85–88) reviewed the literature from 1981 to 2005 and found little agreement in it on such things as the number of phonemes. There is agreement on the existence of “e” and front rounded vowels, but Ko conjectured that the latter could be the outcome of i-umlaut as found in several other Mongolic languages. On top of that, he reported that Junast (1981) found variation between [ø] and [e] in suffixes and stems.

                Thanks for the chart! I didn’t know we could post images here; most blogs don’t allow that.

                • David Marjanović says:

                  There’s a table on p. 237 “(modified from Svantesson et al. 2005, p. 180)” which, without comment, lines up Khalkha /a ɔ ʊ e o u i/ with EY /a ɔ ʊ e ø u ə/ – what, is there no /i/ in the language?

              • David Marjanović says:

                More confusion: this page explains that, in the Mongolian script, e has a separate version that only occurs in foreign words, suggesting that this is [ɛ] as opposed to the normal [ə]. In the preceding paragraph, it links to two videos of an Inner Mongolian “alphabet” song (which presents all the open syllables), and there, there’s a completely front mid vowel [e̞] with no trace of [ə] anywhere. Is that a feature of professional singing…?

                Ko’s thesis starts with a presentation of a vowel system that went from /a i u ə/ to /a i u/ by merging /ə/ into /i/ and apparently losing the sound [ə] completely.

                On the EDAL *i̯-, I forgot there are cases where *i̯a and *i̯o but not *i̯u have the same effect on preceding consonants…

          • David Marjanović says:

            – *ü and *ö are written as ‘Ui’ in Uyghur script as in Uyghur, and generally Turkic loans have preserved ‘frontness’ and so do Mongolic loans in Turkic. There are some cases of Turkic *o being adapted as Mongolic *u, but I don’t know any cases of Turkic *u being borrowed as *ü.

            Perhaps the Turkic *u was closer to [ʊ] than to [u]. That’s the case at least in Kazakh and Turkish nowadays, IIRC. Certainly the Turkish ı isn’t a cardinal [ɯ], but instead an unrounded version of [ʊ] or thereabouts. That would account for the treatment of Turkic *u [ʊ] as Mongolic *u [ʊ].

            And if that’s the case, the i in the Uyghur ui might indicate height rather than frontness. The Phags-pa é might then be a transcription of that. But I’m just speculating here.

        • j. says:

          the EDAL reconstructed two vowels that have occasionally been postulated before but are not mainstream

          I recently happened by chance upon Vovin’s article On Accent in Chuvash (from 1994) where he cranks this up to 13, including not just these (he has *a versus *ɑ for EDAL’s *ạ versus *a), but also a Proto-Turkic *ə (for Common Turkic *ö ~ Chuvash /u/), Proto-Turkic *ɒ (CT *a ~ Chuv. *u) and Proto-Turkic *ɯ (CT *u ~ Chuv. /ɨ/). This is on top of a high/low tone distinction that triggers vowel reduction in Chuvash. All this is a prime example of a stage 2 “reconstruction” that probably should be eventually reducible to a bunch of conditional changes — and which would then also line up much better with the position of Chuvash as a part of the Volga-Kama language area, where there has been extensive vowel rotation also in Mari, Permic, Bashkir-Tatar, and to a lesser extent Mordvinic. In all of these, it is moreover vowel quality that triggers stress, not stress that triggers vowel reduction. (The same pattern extends also to Ugric and partly Eastern Iranian.)

          Two phonations were reconstructed by all the early Altaicists; that didn’t work out

          Well, yes: assuming that there was a Proto-Altaic, I would suppose that it had a more complex stop system than just the basic 2×4 (*p *b *t *d *tʃ *dʒ *k *g). But I would rather explore more complexity in places of articulation, e.g. *q for Tungusic /x/ ~ elsewhere /k/, or palatalized *tʲ *dʲ for dental/palatal interchanges (be they original, or secondary in some branches).

          Suppose you tried to reconstruct PIE based just on Celtic, Balto-Slavic, Iranian, Tocharian and Anatolian evidence

          To be slightly contrarian, I’m not convinced PIE had three stop phonation series: I think a reconstruction with two series plus a prosodic feature that triggers stuff like breathy voice or vowel shortening might be a good idea to explore (for just one thing, it would completely explain the attested root structure contrasts).

          Probably that should be 5×5, with the three *i̯- “diphthongs” reinterpreted as *j-.

          This sounds like a poor idea, considering that 1) this would mean *Cj clusters but no others word-initially, and 2) they trigger an extensive amount of vowel developments but almost no consonant developments. Whatever there is here, it clearly needs to be treated as a part of the vowel system.

          • David Marjanović says:

            I’m surprised Vovin did this, actually. Tones for Proto-Turkic, at the expense of making it look more like EDAL Proto-Altaic? :-)

            But I would rather explore more complexity in places of articulation, e.g. *q for Tungusic /x/ ~ elsewhere /k/, or palatalized *tʲ *dʲ for dental/palatal interchanges (be they original, or secondary in some branches).

            Good point.

            To be slightly contrarian, I’m not convinced PIE had three stop phonation series: I think a reconstruction with two series plus a prosodic feature that triggers stuff like breathy voice or vowel shortening might be a good idea to explore (for just one thing, it would completely explain the attested root structure contrasts).

            Concerning breathy voice, this has been proposed before, in several different ways (e.g. from a glottalist angle where implosives inhibit the spread of breathy voice…), but I’m not convinced it works well at least for synchronic PIE. Just recently I read a paper with a statistical argument that the constraint on *T-Dʰ roots doesn’t even exist – there are enough such roots in the LIV that either the constraint was no longer active in PIE or we should look for a constraint on syllables rather than roots. There’s definitely research to be done here.

            this would mean *Cj clusters but no others word-initially

            Many kinds of Chinese have Cj and Cw clusters but no others syllable-initially. And Proto-Altaic is reconstructed without a w.

            2) they trigger an extensive amount of vowel developments but almost no consonant developments.

            The former could be due to pressure to lose the only kind of word-initial consonant cluster. (I’m reminded of the non-initial development *-ja- > OHG -ē- in the absence of any other umlaut phenomena that worked backwards.) The latter… well, there are some, and perhaps some (or a lot) have been overlooked! Lots of fun for future research. ^_^

          • David Marjanović says:

            Oh, more on Altaic vowel systems that the EDAL didn’t take into account: several Tungusic languages and apparently Chahar distinguish -RTR /i/ from +RTR /ɪ/. Pages 178–179 of Ko’s thesis present evidence that the Middle Korean /i/ resulted from a merger of two such phonemes: some stems with /i/ take -RTR suffixes, some take +RTR suffixes, some are inconsistent. I wonder if all these */ɪ/ line up…

            • Crom daba says:

              There’s a table on p. 237 “(modified from Svantesson et al. 2005, p. 180)” which, without comment, lines up Khalkha /a ɔ ʊ e o u i/ with EY /a ɔ ʊ e ø u ə/ – what, is there no /i/ in the language?

              /ə/ surfaces as [i] when long or after palatal consonants (same is true for /u/ and [y]) and the length distinction is unstable which made or will make /i/ phonemic.

              Oh, more on Altaic vowel systems that the EDAL didn’t take into account: several Tungusic languages and apparently Chahar distinguish -RTR /i/ from +RTR /ɪ/. Pages 178–179 of Ko’s thesis present evidence that the Middle Korean /i/ resulted from a merger of two such phonemes: some stems with /i/ take -RTR suffixes, some take +RTR suffixes, some are inconsistent. I wonder if all these */ɪ/ line up…

              I mentioned the Tungusic RTR high vowels in some previous comments, it’s not just /i/ and /ɪ/, there’s also /ʊ/ and /u/ and there are even some cases of North /ɪ/ ~ South /ʊ/ implying a +RTR *ü, but it may be influence of neighbouring +RTR vowels (there are some words containing only /ʊ/ and /ɪ/ however).

              Chahar is a different case: pre-vowel-loss /i/ was recessively -RTR, and /ɪ/ appears in exactly those cases where /i/ was or is followed by a +RTR vowel, the same cases where Khalkha gave /ʲa/. In contrast with Tungusic, there are possibly no words originally consisting only of /i/ and being +RTR.

              I seem to remember some Vovin article about pharyngeal harmony in Korean being contact induced, can’t find it now though. Other languages in the region also have vowel harmony:
              - Nivkh has /a/ /e/ /o/ vs. /ə/ /i/ /u/ which conditions velar/uvular distinction.
              - Yukaghir has /a/ /o/ /i/ vs /e/ /ö/ /i/ /u/ with the same guttural synharmony, which looks like a RTR system turning palatal.
              - Chukotko-Kamchatkan has /a/ /o/ /e/ vs /æ/ /u/ /i/ with no guttural synharmony.
              - I remember someone (the amaravati guy?) comparing Old Chinese type A/type B syllables with Altaic harmony.

          • David Marjanović says:

            I’ve reread the relevant parts of the EDAL “preface”. *tʃ *dʒ > Mongolic *d *dʒ is indeed proposed, and I agree that reconstructing *tʲ *dʲ looks promising.

            I had forgotten that the aspirated and unaspirated obstruents differ in that only the former are voiced by a following “high” (actually low) tone in Mongolic (labial only) and Japanese.

            Given the presence of a lateral fricative in the system, as assured by Wanderwort evidence at least for its Proto-Turkic reflex, I also wonder about lateral affricates. The mysterious, rare *š for example becomes *č in one of the branches…

            Thanks for all the information on East Yughur, Chahar, Yukaghir and Chukotko-Kamchatkan, and for the thesis draft on Tungusic; I’ve started to read it. :-)

            • j. says:

              Resolving *lʲ versus *š with a compromise reconstruction *ɬ is hardly “given” after just a few papers, though, it’s just one new proposal dropped into a prolonged debate. If Japanese can get /s/ from *l₂, that still does not tip the scales; this could involve something similar to Spanish ll, which drifts in parts of Latin America from /ʎ/ to /ɟ/ to /dʒ/ to /ʒ/. My impression is cautiously in favor of something lateral, but that’s assured already by how we have multiple lateral proposals (*ɬ, *lʲ) but only one non-lateral (*š).

              The problem with loanword evidence is that there’s evidence from both camps. Instead of trying to dismiss one side of the evidence, I am already on record as stating that this probably signifies a period of etymological nativization where an Oghur *l₂ could be substituted as Common Turkic *š, and vice versa (probably similarly also with *r₂ ~ *z).

              • David Marjanović says:

                Very good point. However, a word coming from the south and/or east should reach East Turkic before West Turkic, so we’d probably have to assume that [ɬ] was borrowed directly as [ʃ] and then etymologically nativized as [l]… by no means impossible, but less parsimonious.

                Something I forgot to mention is that the Old Turkic runes for the reflexes of *ń, *l₂ and *r₂ are derived from those for n + back vowel, l + front vowel and r + back vowel by the addition of a horizontal to slanted stroke. I think this argues that the Old Turkic reflexes of all three were not palatalized, but merely not velarized: perhaps a plain palatal [ɲ], a [ɬ] (which I find harder to velarize than [l], so perhaps it wasn’t velarized in back-vowel words), and something like the Czech ř.

                Also: in Old Korean, certain words that have /l/ today were rendered with Chinese characters pronounced with Middle Chinese “š” (I guess that’s /ʂ/ rather than /ɕ/, if only because the author is Polish, but I don’t know). Unfortunately, Old Korean was completely left out of consideration in the EDAL.

                While I’m already posting more than one link, here’s some argumentation against *ạ (pp. 22–28, 33–35, 39, 41, 44–45, ?98) and *ẹ (57, 60, 69, 74, ?107–108, ?110).

                BTW, ll = y is voiceless [ʃ] in most of Argentina.

                • j. says:

                  a word coming from the south and/or east should reach East Turkic before West Turkic

                  It’s not clear if the Oghur / CT division has always been on a west / east axis: there are Oghur-type (rhotic / lambdaic) loanwords in Samoyedic, and many alleged Altaic cognates have also been explained as Oghur-type loanwords into Mongolic (and thence Tungusic). They might be just archaisms pointing to original rhotacism / lambdacism; but they also might not.

                  In my uninformed impression: the deepest divisions within Common Turkic seem to be in the northeast (Yakut vs. varieties of South Siberian; the latter itself also just an areal unit AFAI gather, as its supposed defining isoglosses *j > č, *j-N > ń-N are shared with southern Samoyedic). So perhaps the original setup was Common Turkic to the north of (pre-)Oghuric. This applies perhaps especially if the whole “Common Turkic” unit is itself areal: it doesn’t seem to have many other defining features than zetacism and sigmatism, which can definitely be analyzed as either archaisms or as expansive areal innovations. So the dialects ancestral to some modern Common Turkic languages might have been “secondarily de-Oghurized” at a later date, perhaps due to the prestige influence of whatever dialect was currently spoken by the dominant Turkic groups, and the original locus of zetacism and sigmatism could have been more limited.

                  How well is the historical phonology of smaller Siberian varieties like Chulym, Shor or Tofa known anyway? I keep seeing in overview sources references to significant lexical diversity plus complicated prosodic systems, and that suggests that there could be other kinds of diversity lurking there as well.

                • David Marjanović says:

                  Interesting points.

                  many alleged Altaic cognates have also been explained as Oghur-type loanwords into Mongolic (and thence Tungusic)

                  AFAIK, that has only been done by people who have already concluded on lambdatism and rhotacism on other grounds and therefore posit a thick fat layer of Oghur loans, down to surprisingly basic vocabulary that would not normally come from a superstrate, in Mongolic (and thence Tungusic), despite the lack of any other evidence for this scenario. Seems rather unparsimonious to me.

        • j. says:

          Conversely, Tungusic has a length distinction in the second syllable, not just in the first one.

          On this topic: FWIW, we have a guy at Helsinki who finished his Master’s thesis on vowel length in Proto-Tungusic last year (draft up on Academia.edu) — in Russian, of course. I don’t recall him doing any Altaic comparison, but this will hopefully shed some light on that problem anyway.

          • Crom daba says:

            Nice paper, bookmarked his academia.edu page.
            It clears up why some sources reconstruct only a single */i/ for PTM, supposedly all *Ci roots are -RTR and CiCi roots are +RTR.
            This is also the first time I heard about Orok ө, materials I have don’t note it, I thought Even ө was secondary but if Orok supports it I guess it’s legit Proto-Tungusic.

          • David Marjanović says:

            And the second thing he does (the first being to define the languages) is to establish and explain a whole new system of transcription, “where possible close to” the one used in the famous first dictionary, of which “variants” are used in the “majority” of Tungusological works in Russian. *headdesk* Naturally, the explanation is too short; it doesn’t tell whether [ъ э ъ̇] are [ɘ ə ɐ], [ə ɜ ä] or whatever, or actually even whether [ɵ] is [o] or [ɤ]. Maybe that’ll turn out to be irrelevant as I read on, but sometimes phonetic pedantry is crucial. At least the tongue root is mentioned!

            (Helpfully, this system suggests that the ъ ъʷ of the EDAL are [ɘ ɵ]. But who knows.)

          • David Marjanović says:

            I just finished reading it. No Altaic comparison, but plenty of stuff that will be useful for that, including criticism of at least one Proto-Tungusic root reconstructed in the EDAL. Tongue root harmony is indeed reconstructed, with */iː/ counting as -RTR in the first syllable but as neutral elsewhere.

            I wonder, of course, about some of the phonetic details. For instance, */uə/ becomes /oː/ in Ewen and Orok, which strikes me as backwards; maybe */oː/ and */uə/ were really *[oə] and *[oː], respectively…

          • David Marjanović says:

            And here’s the pdf of a paper Talvitie cites, in English this time (Ko, Joseph & Whitman 2014); it argues for RTR harmony in Proto-Tungusic and specifically against its harmony-free reconstruction in the EDAL.

        • David Marjanović says:

          I didn’t remember that the EDAL position on Proto-Turkic vowels is more complex. IE is so much easier…

          EDAL 138–139:

          >>
          9. One of the most complicated problems in Turkic reconstruction is the distinction of open/close *e vs. *ẹ, *a vs. *ạ.
          Close *ạ was reconstructed by O. Mudrak (see Мудрак 1993, Мудрак Дисс.) for the correspondence Turk. a – Chuv. ɨ, Yak. ɨ. Let us mention that Yak. can also have a secondary -ɨ- < *a in front of -j-, cf. ɨj ῾moon’, kɨj̃at ῾wing’, ɨj- ῾show, describe’.
          As to the reconstruction of *e and *ẹ, no final agreement has been reached so far. In the dictionary we have adopted the reconstruction of O. Mudrak (as proposed in Мудрак 1993, Мудрак Дисс.), but A. Dybo still keeps her own views, presented in Дыбо Дисс., РР 39-44. Both researchers agree that the Oghuz distinction of open *e : close *ẹ is not original. The distribution of e (=ä) and ẹ (=e) in Azerbaidzhan is complementary, e occurring after j-, in front of š, č and the Common Oghuz *j (not in front of the secondary j Yak. ie, Chuv. a; *ẹ̄ > Yak. ī, Chuv. i. O. Mudrak additionally introduces a “labialized” e, which yields complicated reflexes in Chuvash (in particular, i in front of l), while the Oghuz languages reflect it as e independent of neighbouring consonants; examples of this eʷ are few and this phoneme has not been adopted in the dictionary.
          According to A. Dybo, the opposition of *ē vs. *ẹ̄ in Oghuz goes back to Common Turkic and is additionally reflected in Khalaj:
          *ē : Oghuz *ē, Khal. ǟ, Yak. ie, Chuv. a
          *ẹ̄ : Oghuz *ẹ̄, Khal. īe (ä after initial h-), Yak. ie, Chuv. a
          For a small number of examples where Oghuz, Yakut and Chuvash have a variation of close and open reflexes (and Chuvash sometimes j+vowel) she reconstructs PT *e (or *ẹ) followed by *-j- as the first element of a consonant cluster. In Chuvash initial *ej- of this type apparently gave rise to a rising diphthong; the following reconstructions are proposed:
          *ẹj : Oghuz *ẹ, Yak. e, Chuv. -i-/jə-, i-, Khal. ä
          *ēj : Oghuz *ē, Yak. ie, Chuv. -i-, Khal. īe
          *ẹ̄j: Oghuz *ẹ̄, Yak. ī, Chuv. -i-/ja-, Khal. īe.
          The details of the reconstruction, as well as precise origins of this Proto-Turkic distinction are yet to be established.
          <>
          Like Turkic, Proto-Mongolian and Middle Mongolian possessed vowel harmony, which has to a large extent disintegrated in modern languages, especially in Southern Mongolian. All words were subdivided into two types: “front” (with the vowels *i, *e, *ü, *ö) and “back” (with the vowels *i, *u, *o, *a): the vowel *i, therefore, was neutral in respect to vowel harmony.
          In the chart below we give only correspondences of the vowels of the first syllable: although the non-initial vowels are well enough recorded in MMong. and preserved in WMong., in all modern languages they became hopelessly reduced, and their quality may for the most part only be restored on the basis of the behaviour of the initial vowel.
          <>
          All vowels except *o could occur both in the first and the following syllables. Unlike Turkic and Mongolian, Proto-Tungus-Manchu appears to have had no vowel harmony. Some restrictions on the coexistence of different vowels in adjacent syllables were, however, present: the back vowels *a, *o could not be combined with the front vowel *e; *u could not follow *o, *ü could not follow *i.
          All modern languages have developped a specific variety of vowel harmony (probably under Mongolian influence): every word may be characterized as “back” or “front”, depending on the particular combination of vowels. Words with -a- or -o- in the first or second syllable are always “back”; words with -e- in the first or second syllable are always “front”. The -i- and -u- vowels are neutral, i. e. they may occur both in “back” and “front” words (but frequently have different allophones, depending on the row of the word). The *-ü- vowel usually occurs in “front” words, but combinations *aCü and *oCü seem also to be attested. Velars shift to uvulars in “back” words, but are preserved in “front” words. It should be mentioned that the combinations of the neutral vowels -u- and -i- are usually treated as “back”, with velars shifting to uvulars in combinations *CiCi, *CuCu, *CiCu and *CuCi, although there may be occasional variation.
          <<

          Yeah, I like Talvitie's explanation for "seem also to be attested" and "there may be occasional variation" better. Also, the correspondence table in the EDAL doesn't mention the Manchu /ʊ/ at all, perhaps misinterpreting it as [uː] because of the silly traditional transcription ū.

          P. 165:

          >>
          The phonetic nature of ə and ă is debatable: it is most probable that ə was originally a front *e, while ă was a mid-high vowel like ə or ʌ (it is also worth mentioning that ă is the only MKor. vowel that did not occur word-initially). Throughout the dictionary we use the traditional transcription.
          Like Turkic and Mongolian, Middle Korean possesses vowel harmony. Within a polysyllabic word only the vowels a/ă/o or ə/ɨ/u could be combined with each other (with a few orthographic variations); the vowel i was neutral and could occur in any of the word types. This information can be used for trying to interpret the Proto-Korean system: one of the possible interpretations is, e.g., treating o as *u, ă as *o, ə as *e, ɨ as *ö and u as *ü. Such a treatment, however, would be only speculative: while rendering of Chinese characters gives indeed good reason to think that ə goes back to *e, there is no evidence from Sino-Korean that ă and ɨ were labialized. In many cases, ă and ɨ do indeed go back to Altaic labialized vowels (see above), but by no means always: ɨ can also go back to *i, and ă to *ia, see above. It is thus best to regard the MKor. (and PKor.) system as a result of a number of different phonetic processes and restructurings, and we preserve the above system of symbols for “Proto-Korean”.
          <>
          There may be some indications in Ryukyu (basically Okinawa) dialects of the existence in PJ of a vocalic length distinction; the problem is, however, far from clear and requires further investigation.
          <<
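          The Middle Korean harmony rule in the p. 165 quote above (a/ă/o versus ə/ɨ/u, with i neutral) is simple enough to state as a check. What follows is only a minimal sketch of that stated rule: the class labels "A" and "B", the function name and the vowel skeletons in the usage note are placeholders invented for illustration, not EDAL terminology or real Middle Korean words.

```python
import unicodedata

# Vowel classes exactly as listed in the quoted passage; the labels are arbitrary.
CLASS_A = {"a", "ă", "o"}
CLASS_B = {"ə", "ɨ", "u"}
NEUTRAL = {"i"}


def harmony_class(vowels):
    """Classify a word's vowel skeleton as 'A', 'B', 'neutral' or 'mixed'."""
    seen = set()
    for vowel in vowels:
        # Normalize so that a + combining breve compares equal to precomposed ă.
        v = unicodedata.normalize("NFC", vowel)
        if v in CLASS_A:
            seen.add("A")
        elif v in CLASS_B:
            seen.add("B")
        elif v not in NEUTRAL:
            raise ValueError(f"vowel not in the quoted inventory: {vowel!r}")
    if not seen:
        return "neutral"   # only i: compatible with either word type
    if len(seen) == 1:
        return next(iter(seen))
    return "mixed"         # violates the rule as quoted
```

          A skeleton like a-i-o comes out as "A", ə-u as "B", and a-u as "mixed". The "few orthographic variations" the quote mentions are not modelled.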

          The questions of *e and *o are not mentioned in the EDAL at all, but in the reply to Vovin's review; I had misremembered.

          • Crom daba says:

            It seems odd to talk about *ẹ and *e without mentioning Old Turkic, where putative *ẹ is written as i in Orkhon and has its own special character in Yenisei inscriptions. I guess the main question is whether it lines up with the Chuvash i/a split.
            However there are some claims that *ẹ is merely long *ē, Erdal’s “A grammar of Old Turkic” has a nice writeup on the details.

            • David Marjanović says:

              The table of “basic” vowel correspondences (p. 147–148) has columns for “OUygh” and “Karakh”. In both of them, the reflexes of long and short *ẹ are “(i)”, while those of long and short *e are “(e)”; the parentheses are nowhere explained and don’t occur anywhere else in the vowel or consonant tables. Two “Orkh.” words are cited on p. 140, but they don’t have any of these vowels.

              • David Marjanović says:

                It is supposed to line up with Chuvash; indeed, long and short *ẹ are given as the only source of Chuvash /i/, and long and short *e as the only source of Chuvash /a/.

                • Crom daba says:

                  If it does, then the case for *ẹ is much stronger than the one for *ạ, since *ẹ is attested in the oldest inscriptions of the language.

  4. As Mary Haas once wrote, classification is a by-product of reconstruction (sorry, I can’t verify the exact quote now). Maybe I am mistaken, but it seems to me that the idea that one must somehow prove the relationship before one starts reconstructing is rather new, and was popularized mainly by Johanna Nichols. Her main argument is that there are contradictory reconstructions of, e.g., Proto-Afroasiatic, so reconstruction proves nothing. But in fact different reconstructions may have different degrees of certainty: compare Redei’s Proto-Uralic with Janhunen’s Proto-Uralic. Of course, it is possible to compile a table of spurious phonetic correspondences, “confirmed” by a relatively large amount of poorly analyzed data (a recent hypothesis of a Basque-Indoeuropean relationship shows this quite clearly). What is more important, such a spurious reconstruction can be done even if the languages in question are indeed related (this is the case of at least one of the competing Proto-Afroasiatic reconstructions). So, a quasi-reconstruction can have a core of essentially correct etymologies and sound correspondences. But are we really unable to distinguish between 1) completely spurious “reconstructions”, 2) quasi-reconstructions with a core of correct etymologies, and 3) real reconstructions?

    • j. says:

      We (and also me vs. Nichols) may mean slightly different things by “reconstruction”, where I am closer to the last distinction you draw.

      It’s just about trivial to assert proto-forms and connecting developments between any words whatsoever (Proto-Forexamplese *bʰara ‘skill, expertise’ > *pʰaa > *faː >> Mandarin 夫 ‘skill, effort’; *bʰara > *barə >> English bar ‘licence for a skilled occupation’). But to claim that this is a reconstruction is a stronger claim than the existence of a vague connectability, or even any vague roundabout actual connection: it is a claim of historical reality. Since zillions of competing reconstructions can be asserted, of which only one can be historical, any assertion has to come with strong arguments for its correctness before we can say that it in fact is (even approximately) a reconstruction. “Reconstructions” of Proto-Afro-Asiatic have been asserted, but as long as none of them withstand methodological scrutiny, they would better be called something like “correspondence pools” or “pseudo-reconstructions”. (This comes close to my division of individual reconstructed items as being *asterisked reconstructions proper, #hashed pseudo-reconstructions and **double-asterisked falsified non-reconstructions.)

      A typical part of this argumentation involves the usual Neogrammarian framework of regularity of sound correspondences. Another though is the semantic and morphological reliability of the etymologies involved. Your 2015 article with Kassian and Starostin on Indo-Uralic makes this point quite clear, I think: any demonstration of a disputed relationship should be spearheaded by the strongest evidence, i.e. items whose occurrence in the families’ prospective proto-languages is inferrable already without external evidence, and whose semantic matches are impeccable.
      On Forni’s Basque-IE comparison, I would say it fulfills the requirement of regular sound correspondences reasonably well, but where he really cheeses the data is the morphological side, by cherry-picking secondary formations from IE daughter languages and projecting them to the PIE level.

      • David Marjanović says:

        Just yesterday I read Kassian’s review of Forni’s paper. The morphology is indeed atrocious; and while the sound correspondences are regular, most of them are just “everything disappears in Basque, and the Basque /h/ comes out of nowhere”, so they don’t constrain the situation much.

        That said, I do wonder if Basque *hargi and PIE *h₂argʲ- (as in argentum) are really old loanwords one way or the other.

      • “This comes close to my division of individual reconstructed items as being *asterisked reconstructions proper, #hashed pseudo-reconstructions and **double-asterisked falsified non-reconstructions.”

        I find very useful Peiros’s fivefold typology of reconstructed protoforms (see his paper “Macro Families: Can a Mistake be Detected?”, pp. 274-275 http://starling.rinet.ru/Texts/mistake.pdf), especially his term “pre-reconstructions” (these are “not based on a proper set of phonological correspondences but only on the intuition of the linguist who introduced them”).
        As for our (in)ability to tell real reconstructions from various kinds of pseudo-reconstructions, especially if the latter are based on explicit phonological correspondences, I think that there are several additional criteria. You already mentioned semantic and morphological reliability. There are also some quite trivial considerations that are nevertheless often ignored in proposed long- (and short-)range reconstructions. For example, phonetic correspondences must form a system, i.e. compared items must contain several correspondences from the proposed set. This may seem evident, but consider the following protoforms: *=Vt ‘leave’, *cV ‘one’, *kV ‘hand, arm’, *…ƛ’ / *… ƛƛ’ ‘yoke’, *…ƛ / *… ƛƛ’ ‘wild boar; pig’, *…š… / *…s… ‘year’, *=šš ‘weave’, *D=Vk’ ‘burn’, *(=D=)Vc’ ‘know’ (D is not a proto-phoneme, but a symbol for gender affix) [J. Nichols “The Nakh-Daghestanian consonant correspondences.” // Current trends in Caucasian, East European and Inner Asian linguistics: papers in honor of Howard I. Aronson. Amsterdam, 2003, pp. 207-264].
        One more useful criterion is the presence of correspondences for morpheme structure, especially root structure. This means that 1) words from compared languages are morphologically analysed, and 2) we can state, e.g., how the canonical root shape of language A corresponds to the canonical root shape of language B. For example, Forni’s Basque-Indo-European hypothesis is unable to explain how the well-known constraints on the shape of PIE roots are connected with (also known) constraints on the shape of Proto-Basque roots, or where the latter come from.
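        The "correspondences must form a system" criterion above can be made mechanical, at least as a first filter. The sketch below is only an illustration under stated assumptions: the comparisons between two placeholder languages A and B are invented, and the thresholds `min_slots` and `min_recurrence` are arbitrary choices, not values anyone in the literature has proposed.

```python
from collections import Counter

# Invented aligned comparisons: each item is a list of
# (segment in language A, segment in language B) pairs.
comparisons = {
    "item1": [("p", "f"), ("a", "a"), ("t", "d")],
    "item2": [("t", "d"), ("a", "a"), ("p", "f")],
    "item3": [("k", "h")],  # monophonemic: only one correspondence slot
    "item4": [("p", "f"), ("a", "a")],
}


def recurrence(comparisons):
    """Count in how many items each correspondence pair occurs."""
    counts = Counter()
    for pairs in comparisons.values():
        counts.update(set(pairs))  # count each pair once per item
    return counts


def systematic_items(comparisons, min_slots=2, min_recurrence=2):
    """Keep items containing at least min_slots correspondences that recur elsewhere."""
    counts = recurrence(comparisons)
    kept = []
    for name, pairs in comparisons.items():
        recurring = [p for p in pairs if counts[p] >= min_recurrence]
        if len(recurring) >= min_slots:
            kept.append(name)
    return kept


print(systematic_items(comparisons))
```

        With this toy data the monophonemic item is the only one dropped, since its single correspondence is supported by nothing else in the set.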

        • David Marjanović says:

          That’s a useful paper!

        • j. says:

          For example, phonetic correspondences must form a system, i.e. compared items must contain several correspondences from the proposed set. This may seem evident, but consider the following protoforms:

          I see your point, though I would think that this does not completely torpedo our ability to work with language families featuring monophonemic roots. In principle, in these situations we can take a step back and consider correspondences as acting on features rather than phonemes. A proto-phoneme like *ƛʼ can be decomposed into at least two features: lateral affricate and ejective (maybe also e.g. short or unpalatalized), each of which we would hope to show consistent correspondences already on its own. We can then consider the co-occurrence of these feature-level correspondences to build up a system of sorts.

          This type of “rule symmetry” can be found often enough even in relatively small phonological subsystems (say, Eastern Finnish aa ää > oa eä), but the larger a phonological system is, the more this approach can improve the reliability of the assumed sound changes. If there is, for example, a 2×3×8 (labialization × phonation × POA) grid of stop/affricate consonants, we would hope to find 2+3+8 = 13 basic feature-level correspondences, plus maybe a handful of “correction factors”, instead of a full 2×3×8 = 48 completely independent correspondences, one for each consonant.
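
          The arithmetic can be spelled out in a few lines. This is a minimal sketch: the feature labels below are hypothetical placeholders for the 2×3×8 grid, and the “correction factors” are left out.

```python
from itertools import product

# Hypothetical feature inventories for a 2×3×8 stop/affricate grid.
LABIALIZATION = ["plain", "labialized"]
PHONATION = ["voiceless", "voiced", "ejective"]
PLACES = [f"poa_{i}" for i in range(1, 9)]  # eight unnamed places of articulation


def count_correspondences(*dimensions):
    """Return (feature_level, phoneme_level) correspondence counts.

    feature_level: one correspondence per feature value (sum of sizes).
    phoneme_level: one independent correspondence per consonant
                   (size of the full Cartesian product).
    """
    feature_level = sum(len(d) for d in dimensions)
    phoneme_level = len(list(product(*dimensions)))
    return feature_level, phoneme_level


feature_level, phoneme_level = count_correspondences(LABIALIZATION, PHONATION, PLACES)
print(feature_level, phoneme_level)  # prints "13 48"
```

          The gap between the sum and the product widens as more dimensions or more values are added, which is why the approach pays off most in large consonant systems.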

          (In a sense, this does not strike me as being too different from the analysis of syllables into phonemes. A coda or a medial cannot occur in isolation any more than a phonation or a secondary articulation can…)

          — The Peiros paper is looking interesting, more comments once I’ve read it.

    • David Marjanović says:

      that the idea that one must somehow prove the relationship before one starts reconstructing

      It certainly doesn’t occur to me as a biologist. But then, the single origin of all known life is plain obvious: there are a lot of universal features that could be otherwise (some alternatives have even been made in the lab) but aren’t. We can take for granted that everything is discoverably related at some time depth.

      Also, only creationists and historical linguists even use the word “prove” anymore when talking about science. This black-or-white thinking doesn’t do any good.

  5. David Marjanović says:

    With Talvitie’s thesis at the latest, Proto-Tungusic is in stage 5 or 6: he often presents sets of comparanda (some of which were accepted by Cincius), finds irregularities and then attributes them to borrowing from Mongolic or Pre-Yukaghir.

  6. Blasius B. Blasebalg says:

    It reminds me of a few early “hyperphonetic” reconstructions of Uralic…

    That’s an intriguing example!
    And it very clearly demonstrates that the step towards a more realistic reconstruction requires hard work.

    I have long been convinced that proof of relatedness requires reconstruction. It may not only be “trivial”, it may even be implicit (given languages so closely related that you might even choose one of them as a trivial proto-stage). Such a reconstruction needn’t claim to come close to the actual proto-language; I view it more as a demonstration that such a reconstruction is indeed possible, which makes the relatedness of the material more likely.
    As far as I understand, this corresponds to stage 2 in your workflow model.

    Not to put too fine a point on it, I don’t need to see Proto-Bantu or Proto-Athabaskan to believe in these families – they come with obvious implicit reconstruction possibilities.
    But I really haven’t seen much convincing reconstruction material for East Sudanic, or Altaic. And while the existence of Niger-Congo seems firmly established, it gets fuzzy at the edges – Does Mande really belong there? Songhai? Is Kordofanian even a family? Strict reconstruction helps either to confirm such connections or to expose them as unreal.
