Workflows in historical linguistics

A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months if not years languishing in my drafts. Let’s see if I can keep this one brief.

An adage sometimes seen in historical linguistics is “classification before reconstruction”. On one level, I agree. But on a few others, it seems to be often abused as an excuse to skimp on proper rigor.

What this means, in my opinion:

  • It’s not possible to do comprehensive comparative reconstruction work with data from unrelated languages. Reconstruction can only be attempted once we have a reasonable amount of certainty that some particular language family exists at all.

What this does not mean:

  • Classification having to precede work in historical phonology entirely. Reliable classification cannot be done by vague, casual eyeballing of data. “A reasonable amount of certainty” for the relatedness of some particular languages requires being able to locate regular sound correspondences within their shared vocabulary (preferably non-trivial ones, but any regularity is a start). [1] In the absence of regular sound correspondences, any vocabulary comparison can be suspected of being either coincidental or a loanword rather than a strict cognate.
    In other words: sound correspondences are not reconstructions, in themselves. In the case of binary comparison, this distinction may end up blurred, since it’s possible to kind of put together an initial “trivial reconstruction” by just listing all your correspondences, and giving each of them some kind of a vague phonetic label. [2] If the family has more members, though, the bare sound correspondences typically end up looking more like networks — since sound correspondences are not transitive. If /tʃ/ in language 1 can correspond to /s/ in language 2, and /s/ in language 2 can correspond to /h/ in language 3, this does not automatically guarantee that a correspondence /tʃ/ ~ /h/ between 1 and 3 would be demonstrable, or even expected at all. Perhaps /s/ in language 2 is a merger of two separate proto-phonemes; perhaps these correspondences do continue the same proto-phoneme, but under mutually exclusive conditions; perhaps one of these correspondences indicates loanwords after all and not native vocabulary.
  • Subclassification having to precede reconstruction. On the contrary, it is reconstruction that often allows us to put together arguments in favor of subgroups, by providing a root for our sound correspondences. If we have a correspondence such as t ~ t ~ s ~ s, it’s likely that either the t-group or the s-group has innovated, and constitutes a subgroup. But it is also very possible that the other group has not, and is paraphyletic. Without reconstruction work, this is not resolvable.
  • Reconstruction being unable to inform classification. A reconstruction of the parent of a set of languages might end up coming out closer to some other language that we may have suspected (but haven’t dared to declare) to be related as well. It could even turn out that this language newly under comparison is not only related, but indeed a direct descendant of this same proto-language; just a very divergent one! Or maybe the proto-language turns out to be substantially less similar to the other language being compared, and the earlier suspicion of a relationship evaporates entirely, or has to be reanalyzed as a late loanword layer.
  • Language isolates’ history being unreconstructible. Internal reconstruction combined with loanword evidence can identify probable sound changes and lexical intrusions just fine… though I suppose this technique is unlikely to get especially far.
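The non-transitivity of sound correspondences noted under the first bullet can be made concrete with a toy model. In this minimal sketch, the languages L1 to L3 and the proto-phonemes *S1 and *S2 are hypothetical placeholders. It assumes the first of the scenarios listed there: /s/ in language 2 continues a merger of two proto-phonemes. Chaining the 1~2 and 2~3 correspondences through the shared /s/ then predicts a 1~3 correspondence /tʃ/ ~ /h/ that the modelled data never actually show.

```python
# Hypothetical proto-phonemes and their reflexes in three hypothetical daughter languages.
# *S1 and *S2 merge as /s/ in language 2 but stay distinct in languages 1 and 3.
REFLEXES = {
    "*S1": {"L1": "tʃ", "L2": "s", "L3": "s"},
    "*S2": {"L1": "s", "L2": "s", "L3": "h"},
}


def correspondences(a: str, b: str) -> set:
    """Sound correspondences between languages a and b actually attested in the model."""
    return {(r[a], r[b]) for r in REFLEXES.values()}


c12 = correspondences("L1", "L2")  # {('tʃ', 's'), ('s', 's')}
c23 = correspondences("L2", "L3")  # {('s', 's'), ('s', 'h')}
c13 = correspondences("L1", "L3")  # {('tʃ', 's'), ('s', 'h')}

# Naively chain 1~2 and 2~3 through a shared language-2 sound, as if correspondences were transitive.
chained = {(x, z) for (x, y) in c12 for (y2, z) in c23 if y == y2}

# Pairs predicted by chaining that never occur as real 1~3 correspondences.
print(sorted(chained - c13))  # [('s', 's'), ('tʃ', 'h')]
```

Only by positing *S1 and *S2 separately, that is by reconstructing, do the three pairwise correspondence tables become mutually consistent.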

A more detailed workflow for historical linguistics, if starting from zero, would therefore look something like the following:

  1. Acquire data; sort out some initial vocabulary comparisons that look promising.
  2. Analyze sound correspondences; use these to look for more comparisons.
  3. Look at the big picture to see if some particular subset of languages should be indeed considered related.
  4. Attempt reconstructing the proto-language.
  5. Use the proto-language POV to clarify the status of issues like problematic etymologies, possible external relatives, or possible subgroups.
  6. Use modified analyses of data to improve the proto-language reconstruction.
  7. Iterate 5 and 6 until you’ve run out of insights to gain from the data.

This could also work as a kind of a typology of how far along research on a particular language family is. To date, I don’t think any language family has yet exhausted stage 7. Most are stuck in limbo somewhere around stage 3; only a few have reached stage 5, and Indo-European might be the only one to have indisputably gone through one cycle of stage 7. Big disputed hypotheses grouping well-accepted families together can probably be divided according to whether they’re closer to stage 1 (e.g. Amerind, Nilo-Saharan) or stage 2 (e.g. variations of Nostratic). Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)

Of course there are many major facets of historical linguistics still missing here. We also want to account for typology at some points, morphology at others, semantics at yet others; periodically research loanwords and then weed them out of the proto-language; maybe entertain some substrate hypotheses.

[1] Some people will claim that vocabulary is strictly optional and you can show relatedness solely on the basis of grammar. I am skeptical; but if this were to be the case — then the implication is that we will not be doing any lexical reconstruction work at any point at all.
[2] Maybe with subscripts to disambiguate overlapping sets if you’d prefer, but anything goes in principle. If your heart desires to see more wingdings in linguistics papers, there is nothing formally wrong in re-labeling a t ~ tʰ correspondence as *☕.

Posted in Methodology
31 comments on “Workflows in historical linguistics”
  1. M. says:

    Some people will claim that vocabulary is strictly optional and you can show relatedness solely on the basis of grammar. I am skeptical; but if this were to be the case — then the implication is that we will not be doing any lexical reconstruction work at any point at all.

    If one were to restrict reconstruction to inflectional affixes and grammatical clitics, then doesn’t this imply that 95% of reconstruction is comparison between monosyllables (if that)? That’s a pretty precarious threshold for false positives.

    • j. says:

      That it is.

      It might be possible to similarly unpack this to some extent. Let’s say that clearly demonstrating a relationship takes about 5 examples of each alleged sound correspondence, on average. If a relationship involves about 60 distinct sound correspondences, then we need about 300 instances of correspondences altogether. If we’re working with data comprising CVC word roots, then each of these provides three instances of correspondences, and getting together 100 lexical comparisons will suffice. With CVCV roots, this drops to 75; CVCVC roots, 60. But if we’re working with affixes that are C or CV (let’s say 1.5 segments on average), then we’ll need 200 affix comparisons to have a similar level of regularity… This is in principle still doable, if we’re working with strongly inflected languages and looking at both inflectional and derivational morphology, but it’s far from the default level of evidence in “morphological comparisons”.
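The back-of-the-envelope arithmetic above can be written out as a small sketch. The figures (5 examples per correspondence, 60 correspondences, 1.5 segments on average for a C or CV affix) are the comment's own rough assumptions, and, like the comment, the sketch counts every segment of a compared item as one correspondence instance.

```python
import math

EXAMPLES_PER_CORRESPONDENCE = 5  # rough average assumed in the comment
N_CORRESPONDENCES = 60           # distinct sound correspondences in the relationship


def comparisons_needed(segments_per_item: float) -> int:
    """Number of compared items (roots or affixes) needed to supply enough correspondence instances."""
    total_instances = EXAMPLES_PER_CORRESPONDENCE * N_CORRESPONDENCES  # 300
    return math.ceil(total_instances / segments_per_item)


SHAPES = {"CVC root": 3, "CVCV root": 4, "CVCVC root": 5, "C or CV affix (average)": 1.5}

for shape, segments in SHAPES.items():
    print(f"{shape}: {comparisons_needed(segments)}")
# CVC root: 100
# CVCV root: 75
# CVCVC root: 60
# C or CV affix (average): 200
```

Halving the number of segments per compared item doubles the number of comparisons required, which is the core of the difficulty with purely morphological evidence.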

      Most “morphological” relationship proposals do make use of lexical data as well for supporting whatever sound correspondences are proposed, though, which partially gets around this problem. And then there are issues such as that the strength of comparisons actually also depends on semantic exactness. Morphological comparisons such as “sublative case” or “second person dual reflexive” can have the additional benefit here that they can indicate categories that are entirely absent from many other languages.

      On the flip side, though, I find that non-exact morphological comparisons are 90% make-believe. A frequentative verb suffix perhaps could turn into a second person singular marker, or a collective noun suffix into a first person singular marker (both of these actual serious proposals for the development of the Hungarian indefinite verbal paradigm!), but I could easily create twenty other similar but equally unverifiable scenarios. The core problem is that semantic shifts are always “singular”: they affect one morpheme only at a time, unlike regular sound changes.

      • M. says:

        And then there are issues such as that the strength of comparisons actually also depends on semantic exactness.

        Do you think this strictness only applies to affix comparison? I think it applies just as much to lexical comparisons, at least when one is building a core bank of evidence for sound correspondences. Once this core has been established, then it may be possible to relax semantic criteria.

  2. I definitely enjoy the stages into which you placed the different forms of phylic classification validity, if that’s how it can be described. Even though “Nilo-Saharan” is considered wishful thinking, a lot of linguists do get confused, because admittedly it is at stage 1. It does look promising, but any interrelation, even between geographically close families, is hard to prove (e.g. Gumuz & Koman, Nilotic & Surmic). I’m not sure where Afroasiatic sits in this scheme of things, though, because there’s no agreement on even a single branch’s proto-language, let alone the phylic proto-language. And then there are Kujarge and Ongota, which have resisted classification: the former has been confused for Cushitic or Chadic, or something in between, and the latter is debatably Cushitic, Omotic, or its own branch.

    Overall, this post was really informative and I enjoyed it very much!

    • j. says:

      Everything starts at stage 1, though, including also junk like “Hungarian is closely related to Sumerian” or “Hebrew descends from Turkish”. There is no stage 0 for “truly illusory” hypotheses.

      On Nilo-Saharan in particular, some of the stage 1 work has also concluded against relatedness, e.g. the recent Starostin paper on core vocabulary. Really I think most level 1 hypotheses are not fit for further work as long as they have clear main components that remain themselves at no more than level 2 to 3.

      • I recently read a paper by Dimmendaal that suggests at least a relationship between Surmic and Nilotic (which is very feasible), but he’s the only linguist I’ve seen who at some points objects to “lumping” via typology but then supports a typological argument for Nilo-Saharan’s existence. I support some groupings, like maybe Komuz and Nilo-Surmic, but besides that I think the rest are questionable at best, even geographically close groupings such as Fur, Daju, and Nubian prior to the last two and a half millennia.

  3. David Marjanović says:

    A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months if not years languishing in my drafts.

    Maybe you can turn them into series? That way you wouldn’t need to finish posting a series before you started posting another.

    Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)

    Hm. I’d say Altaic is indeed at stage 4, and stage 5 has begun (Mudrak’s discovery that Proto-Eskimo can almost be derived from Proto-Altaic, admittedly mostly by mergers; somebody should really get to reconstructing Proto-Eskimo-Aleut already!). However, I also think that parts of stage 4 should be redone because of certain recent discoveries that I don’t have time to list right now.

    If one were to restrict reconstruction to inflectional affixes and grammatical clitics

    The idea here seems to be to compare whole systems of morphology, as has of course been done with conjugation or declension in IE (well, “Graeco-Aryan” anyway) or Afro-Asiatic. In these nearly ideal cases, this works very well indeed; but methods for comparison of polysynthetic verb templates are still being developed, it seems, and as soon as you run into an isolating language you’re out of luck.

    • Crom daba says:

      “Mudrak’s discovery that Proto-Eskimo can almost be derived from Proto-Altaic, admittedly mostly by mergers”
      That Moscow reconstruction of Proto-Altaic is so adaptable to whatever data you throw at it that I wouldn’t be surprised if Uto-Aztecan could be derived from it.

      However, that resonates well with an idea that’s been haunting me lately about how the nearest common ancestor of nuclear Altaic might be the ancestor of all M-T pronoun languages of Eurasia; Fortescue’s shared elements of Uralo-Siberian morphology, for example, correspond relatively well to Altaic or at least Mongolian.

      • David Marjanović says:

        so adaptable

        Perhaps, but, IIRC, Mudrak didn’t adapt it, he just took what he and his two coauthors had published in the EDAL a few years earlier.

        the nearest common ancestor of nuclear Altaic might be the ancestor of all M-T pronoun languages of Eurasia, Fortescue’s shared elements of Uralo-Siberian morphology for example correspond relatively well to Altaic or at least Mongolian.

        This kind of thing does need more attention. Lots of people seem to implicitly confuse “are these languages detectably related” with “are they more closely related to each other than to anything else”.

        I agree that Uralo-Siberian looks a lot like what you’d get if you’d try to work on Nostratic/Eurasiatic without looking at the language families Fortescue happened not to look at.

    • j. says:

      Maybe you can turn them into series?

      Yes, I’ve been doing that already, when feasible. It keeps you guys posted, but doesn’t do all that much with regard to schedule slip — since the limiting factor is not blogging time, it’s new complications arising that need to be researched in turn. Posts for which I have upcoming sequels of this sort hanging in my drafts include Problems in PIE vocalism; *ä-backing in Finnic; On the epistemology of sound change; Laterals and Palatals (going on part 3); *ś > *š in Mansi; and even Minor Mordvinic Mutations (from all the way back in early 2013!).

      I’d say Altaic is indeed at stage 4, and stage 5 has begun

      Altaic still suffers from the same phonological problem that many other tentative proto-languages do: the inventory remains a bit too close to a “trivial reconstruction” (= stage 2). The amount of proto-phonemes seems artificially elevated, and the later development of the daughter languages does not consist of much else than an inward collapse of this proto-system in various ways. It reminds me of a few early “hyperphonetic” reconstructions of Uralic, where e.g. the following velar stops were posited:

      • Permic /k/ ~ Hungarian k < *k
      • Permic /k/ ~ Hungarian h < *kʰ
      • Permic /g/ ~ Hungarian k, g < *g
      • Permic /g/ ~ Hungarian h < *gʰ

      Our current understanding, though, is to reconstruct only a single velar stop *k, which is regularly retracted and spirantized to h in Hungarian before original back vowels, versus semi-regularly voiced to /g/ in Permic before medial voiced stops/affricates (< *NT clusters). Crucially, this also allows discarding some lexical correspondences that do not abide by the regular rules (even if they might show sound correspondences that are regular if taken separately). It’s this kind of insight I expect to see from stage 4 reconstructions: judgements on what is reconstructible to the proto-language, and what isn’t.
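To illustrate how the four “hyperphonetic” velar sets for Uralic collapse into one proto-phoneme, here is a minimal sketch of the two conditioned developments of *k described above. It is a deliberate simplification. The roots are schematic and hypothetical, not real Uralic etyma; “NT” stands for any medial nasal + stop/affricate cluster; only *a *o *u are treated as back vowels; and the Permic voicing, which is only semi-regular, is modelled as fully regular.

```python
BACK_VOWELS = {"a", "o", "u"}  # simplified: vowels treated as "back" for the Hungarian rule


def hungarian_reflex_of_k(root: str) -> str:
    """*k > h before an original back vowel, otherwise stays k (simplified)."""
    following_vowel = root[root.index("k") + 1]
    return "h" if following_vowel in BACK_VOWELS else "k"


def permic_reflex_of_k(root: str) -> str:
    """*k > g when the root contains a medial *NT cluster, otherwise stays k (simplified)."""
    return "g" if "NT" in root else "k"


def is_regular(root: str, permic: str, hungarian: str) -> bool:
    """A comparison is regular only if both reflexes match what the conditions predict for this root."""
    return permic == permic_reflex_of_k(root) and hungarian == hungarian_reflex_of_k(root)


# Schematic, hypothetical proto-roots covering all four combinations of conditions.
HYPOTHETICAL_ROOTS = ["kata", "kaNTa", "keta", "keNTa"]

for root in HYPOTHETICAL_ROOTS:
    print(f"*{root}: Permic {permic_reflex_of_k(root)} ~ Hungarian {hungarian_reflex_of_k(root)}")
# *kata: Permic k ~ Hungarian h
# *kaNTa: Permic g ~ Hungarian h
# *keta: Permic k ~ Hungarian k
# *keNTa: Permic g ~ Hungarian k

# Permic k ~ Hungarian h is a regular correspondence in itself, but not before a front vowel:
print(is_regular("keta", "k", "h"))  # False
```

A single *k plus two conditions reproduces all four correspondence sets, and the same conditions flag a comparison like the last one as irregular even though each of its correspondences occurs elsewhere.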

      Moscow school Proto-Altaic, by contrast, is overengineered e.g. by a system of three stop phonations when none of the descendants have more than two (sometimes involving ad hoc reshuffling such as *tʃ *dʒ > Mongolic *d *dʒ), a four-way prosodic system when again none of the descendants have more than two, and an 8×5 fully unharmonic system of root vocalism, when all descendants attest something substantially simpler.

      • David Marjanović says:

        (To keep this out of moderation, I’m putting only one link in this post; links to other references will follow.)

        The new developments I was thinking of are:

        – An old anti-Altaic argument is that, while Turkic and Mongolic have words in common that are lacking in the other supposed Altaic branches, and Mongolic and Tungusic do as well, Turkic and Tungusic don’t. That’s the picture expected from contact, while from common ancestry we’d expect random losses leading to about equal amounts of all three kinds of matches. The “preface” of the EDAL has a section near the beginning that triumphantly presents a long list of Turkic-Tungusic matches that are absent from Mongolic. Turkic and Tungusic were never in direct contact before very recent times (some words have of course traveled between Yakut/Dolgan and Ewenki/Ewen), therefore Altaic. Well? There’s that Avar-associated golden bowl from Hungary that has Greek letters on it. All attempts to read them as Turkic are very awkward. Helimski showed they make sense if you read them as something close to Manchu; and Futaky identified Manchu-like words in Hungarian. If (some of) the Avars spoke Tungusic, then we should expect a thin veneer of Tungusic words from sea to shining sea, and Turkic is not likely to have escaped that.

        (But I don’t expect this to have a devastating effect. There seem to be examples of all mathematically possible matches, including Turkic-Japanese.)
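The statistical expectation behind the old anti-Altaic argument can be checked with a quick simulation. This is a minimal sketch under strong simplifying assumptions: a hypothetical stock of inherited words, and each of the three branches keeping each word independently with the same probability. Under common ancestry, words that survive in exactly two branches then fall roughly equally into all three pairs. Unequal retention rates would skew the three counts, but would not push any of them to zero.

```python
import random
from collections import Counter

BRANCHES = ("Turkic", "Mongolic", "Tungusic")


def exclusive_pair_counts(n_words: int, retention: float, seed: int = 0) -> Counter:
    """Simulate independent random loss of inherited words in each branch and count
    the words that survive in exactly two branches, keyed by the pair of branches."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_words):
        kept = tuple(branch for branch in BRANCHES if rng.random() < retention)
        if len(kept) == 2:
            counts[kept] += 1
    return counts


N_WORDS, RETENTION = 30_000, 0.5  # hypothetical parameters
counts = exclusive_pair_counts(N_WORDS, RETENTION)
expected = N_WORDS * RETENTION**2 * (1 - RETENTION)  # 3750 per pair

for pair in [("Turkic", "Mongolic"), ("Mongolic", "Tungusic"), ("Turkic", "Tungusic")]:
    print(pair, counts[pair], f"(expected about {expected:.0f})")
```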

        – Similarly, there’s that one Xiongnu sentence phonetically transcribed in Chinese characters and furnished with a Middle Chinese translation. All attempts to read it as Turkic are, again, very awkward. Then Vovin read it as Yeniseian, and suddenly it makes sense. If (some of) the Xiongnu spoke Yeniseian, we should expect Yeniseian words not just in Turkic and Mongolic, but probably also in Tungusic.

        – For Proto-Turkic, the EDAL reconstructed two vowels that have occasionally been postulated before but are not mainstream: *ạ (presumably [ʌ ~ ɤ]) for an /a/ – /ɯ/ correspondence which is otherwise irregular, and *ẹ ([e] as opposed to the normal [ɛ]) for a similar case that shows up as a rare /e/ phoneme in some of the modern languages. At least *ạ can instead be explained as an umlaut phenomenon. I should point out that eliminating these two proto-phonemes would actually make things easier: in the EDAL, the Proto-Turkic outcomes of Proto-Altaic vowels are very often “*a or *ạ” respectively “*e or *ẹ” seemingly at random.

        – The Proto-Altaic *i̯o is explicitly a reconstruction faute de mieux: there was a correspondence without an assigned proto-value, and *i̯a and *i̯u had already been reconstructed but *i̯o had not, so the authors put the two gaps together even though there’s apparently nothing to suggest this approximate sound value for that correspondence. (Not even, as it turns out, in external comparison.)

        – For a long time, Proto-Mongolic and Proto-Tungusic were reconstructed as having rather Turkic-like vowel systems (with front rounded vowels) and Turkic-like vowel harmony. Logically, then, Altaicists used to project the Turkic kind of vowel harmony all the way back to Proto-Altaic. This doesn’t work well, so the EDAL threw this out altogether, reconstructed PA as lacking any kind of vowel harmony, and explained the observed harmonies as the outcomes of umlaut phenomena. Maybe that’s actually going too far. Ko Seongyeon’s thesis on vowel harmony in Mongolic, Tungusic and Korean shows that none of these ever had a Turkic-like harmony, with the single late exception of Oirat/Kalmyk; instead, they all had – and many Mongolic and Tungusic languages still have – tongue-root harmony. The thesis goes on to speculate that the Turkic frontness harmony may have developed out of tongue-root harmony the same way the Oirat/Kalmyk version later did, and states that no trace of any harmony is found in Japonic. Japonic is where the EDAL postulated the greatest number of vowel assimilation phenomena, though.

        – Speaking of Japonic, the EDAL reconstructed a four-vowel system (*/a i u ə/) for Proto-Japonic. Ryukyuists have long said that it’s necessary to reconstruct a six-vowel system with additional */e o/, though it’s not clear (as the EDAL was quick to point out) whether different Ryukyuan languages actually point at the same words. This controversy has raged on, as has the one on whether there’s evidence for */e o/ in Japanese. That’ll have to be sorted out at some point.

        – Have you noticed that the tone systems of EDAL Proto-Korean and EDAL Proto-Japonic are mirror images of each other? Elizabeth M. Boer has a book on academia.edu that reviews the entire diachrony of Japonic tone (and of the research on it) and very convincingly concludes that everyone had been reading the Middle Japanese tone marks upside-down. That shatters the mirror – and means the EDAL Proto-Altaic system is upside-down, too, because the authors made the (consciously, explicitly arbitrary) decision to base their notation on the supposed Japonic rather than the Korean tones. The same book derives the Ryukyuan “word-tone” (pitch-accent?) systems from a southern Japanese one, strongly implying that Ryukyuan is not the sister-group of Japanese but actually nested in it – which makes plenty of historico-geographical sense (reviewed in the book).

        – More is known now than in 2003 about the “Para-Mongolic” languages and about the extinct languages of the Korean peninsula.

        – Altaistics urgently needs a reconstruction of Proto-Eskimo-Aleut.

        The amount of proto-phonemes seems artificially elevated, and the later development of the daughter languages does not consist of much else than an inward collapse of this proto-system in various ways. It reminds me of a few early “hyperphonetic” reconstructions of Uralic, where e.g. the following velar stops were posited:

        Yes and no. On the “no” side, the EDAL does postulate a number of conditional developments, including a Verner-like phenomenon where one of the tones voices one of the plosive series in Mongolic.

        On the “yes” side, yeah, that’s a common phenomenon in the Moscow School. Witness this paper, where on p. 315 it says: “An additional IE fricative should probably be reconstructed for the correspondence Hittite s / Luwian t / Narrow IE 0, as proposed in Ivanov 2001: 133; 2009: 5 and (independently) in Kassian & Yakubovich 2013: 22.” Actually, the correspondence is Hittite s / Luwian t / Narrow IE *s, parallel to Hittite / Luwian k / Narrow IE *h₃, and represents a slightly bizarre but fully regular dissimilation phenomenon in Luwian. The trick is that the example word given for “Hittite s / Luwian t / Narrow IE 0” is the cognate of see (*sekʷ-) in the first two, but the cognate of eye (*h₃okʷ-) in the last.

        Moscow school Proto-Altaic, by contrast, is overengineered e.g. by a system of three stop phonations when none of the descendants have more than two

        Well. Two phonations were reconstructed by all the early Altaicists; that didn’t work out, so they shifted to three. The recent attempt by Robbeets to return to just two didn’t work out either. I agree that it’s odd that the only reflex of the three-way distinction is supposed to be the Proto-Tungusic */g/-*/k/-*/x/ contrast, and most of the other supposed developments are rather odd for the reconstructed proto-values as voiced/plain/aspirated. But then, Altaic has just five basic branches. Suppose you tried to reconstruct PIE based just on Celtic, Balto-Slavic, Iranian, Tocharian and Anatolian evidence: you’d probably try to reconstruct two phonations, but you’d get various confusing hints at a third.

        ad hoc reshuffling such as *tʃ *dʒ > Mongolic *d *dʒ

        Whoa, that’s odd indeed. I must have overlooked it; I’ll try to read up on it.

        a four-way prosodic system when again none of the descendants have more than two

        Conversely, Tungusic has a length distinction in the second syllable, not just in the first one. The EDAL treatment of this can be paraphrased as “some cases are more or less obvious contractions, but in general we don’t understand this phenomenon, so we’re ignoring it for the moment”. An obvious opportunity for future improvement. I also wonder how vowel length may have interacted with consonants.

        and an 8×5 […] system of root vocalism, when all descendants attest something substantially simpler

        Probably that should be 5×5, with the three *i̯- “diphthongs” reinterpreted as *j-. That would neatly explain why *j- was reconstructed as absent word-initially but not word-medially. Also, accidental sampling bias could be a factor again: all Old Northwest Germanic languages already had very restricted vowel systems in unstressed syllables, except for Old High German which had almost no restrictions there at all.

        • Crom daba says:

          I agree with you about *ạ; this is Doerfer’s invention, I think (he writes it *ë). IIRC there’s some article where he gives the statistics of Chuvash/Yakut a~ï matches and it looks pretty bad for the theory, but he sticks to it. For all his anti-Altaic scepticism, he really likes over-engineering his PT reconstruction. Ramstedt and Poppe considered it secondary.

          I was more optimistic about *ẹ, but perhaps Chuvash a~i split could also be explained as an umlaut process of some sort.

          EDAL vocalism is completely irreparable: even with Japanese and Tungusic given full independence, and with their infamous semantic latitude in searching for cognates, the average number of possible first vowel matches for a Mongolic word given a Tungusic and Japanese ‘cognate’ exceeds 2, and furthermore there is no sensible pattern underlying these correspondences. The situation is similar for Turkic and Korean.
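One way to operationalize a figure like “the average number of possible first vowel matches” is sketched below. This is an assumption about the method, not a reconstruction of how the figure above was actually computed, and the vowel triples are invented toy data. For every combination of Tungusic and Japonic first vowels, the sketch counts how many distinct Mongolic first vowels occur with it across the proposed cognate sets, then averages over those combinations. A set of unconditioned, fully regular correspondences would give 1.0; higher values mean the Mongolic vowel cannot be predicted from the other two.

```python
from collections import defaultdict


def average_first_vowel_matches(cognate_sets):
    """cognate_sets holds (Mongolic, Tungusic, Japonic) first-vowel triples.
    Returns the average number of distinct Mongolic vowels per (Tungusic, Japonic) pair."""
    options = defaultdict(set)
    for mongolic, tungusic, japonic in cognate_sets:
        options[(tungusic, japonic)].add(mongolic)
    return sum(len(v) for v in options.values()) / len(options)


# Invented toy data, for illustration only.
TOY_SETS = [
    ("a", "a", "a"), ("o", "a", "a"), ("u", "a", "a"),
    ("e", "i", "i"), ("i", "i", "i"),
]
print(average_first_vowel_matches(TOY_SETS))  # 2.5
```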

          Another problem is that it ignores the [ATR] distinction in Tungusic high vowels, treating it as completely recessive.

          Even though I generally agree with the tongue-root harmony reconstruction of Proto-Mongolian (and consider only tongue-root reconstructions of Tungusic as valid) there are some problems with it:
          – *e is generally reflected as front except in Daur, Khamnigan and some Inner-Mongolian dialects (Chahar and Baarin at least), and the first two are heavily influenced by Tungusic.
          – *ü and *ö are written as ‘Ui’ in Uyghur script as in Uyghur, and generally Turkic loans have preserved ‘frontness’ and so do Mongolic loans in Turkic. There are some cases of Turkic *o being adapted as Mongolic *u, but I don’t know any cases of Turkic *u being borrowed as *ü.
          – Phags-pa script writes *ö and *ü as ‘éo’ and ‘éu’ where the same sequence seems to denote palatalization in Chinese (see http://www.babelstone.co.uk/Phags-pa/Description.html).
          – East Yugur also features front *ö and *ü, although this could possibly be West Yugur influence (however, WY has some atypical vowel rotation going on, which could perhaps be EY influence).

          • David Marjanović says:

            *ạ […] is Doerfer’s invention […] (he writes it *ë)

            Oh yes, thanks for reminding me.

            – *e is generally reflected as front except in Daur, Khamnigan and some Inner-Mongolian dialects (Chahar and Baarin at least), and the first two are heavily influenced by Tungusic.

            Ko’s thesis actually shows that [ɛ] and [ə] are each found in about half of the Mongolic varieties and argues that the former repeatedly evolved from the latter. First, we’d theoretically expect [ə] to be the -RTR counterpart to [ɑ]; second, all these vowel systems based on -RTR have a huge gap in the lower front quarter of the vowel chart, so nothing stops [ə] from drifting into it.

            Similarly, the -RTR /u/ and /o/ of modern Halha (at least the short ones) aren’t really back [u] and [o], they’re central [ʉ] and [ɵ]. The Oirat/Kalmyk development is then just the logical continuation of this drift. Also, as a native speaker of German, I hear [ʉ] and [ɵ] as closest to my own /ʏ/ and /œ/, so I’m not surprised if medieval Turks and Tibetans did the same (…and that contemporary Russians are completely confused as shown by their transcriptions).

            East Yugur also features front *ö and *ü

            Not according to Ko, IIRC; I’ll check.

            • Crom daba says:

              Ko’s thesis actually shows that [ɛ] and [ə] are each found in about half of the Mongolic varieties and argues that the former repeatedly evolved from the latter.

              Perhaps half of variants of Mongolian proper, and even then there are many cases of *e being merged into /i/ (in Baarin for example).
              Non-central languages show a different picture:
              – In East Yugur non-first syllable high vowels (*i, *u, *ü) become [ə] while *e is reflected as /e/.
              – Bonan centralizes *i while raising (and fronting?) *e to /i/.
              – Mongghul, Mangghuer and Santa split *e into [ə] and [ie] (generally [ə] after velars and [ie] elsewhere, but with many other complications) due to Mandarin influence, this to me suggests an originally palatal sound backing due to Mandarin phonotactic constraints.
              – Mogholi mostly has [e]/[ɛ]/[ei] with [ə] in absolute final position (and some assimilatory phenomena in non-first syllables)

              First, we’d theoretically expect [ə] to be the -RTR counterpart to [ɑ]; second, all these vowel systems based on -RTR have a huge gap in the lower front quarter of the vowel chart, so nothing stops [ə] from drifting into it.

              Yeah that’s the idea, although I think the change is pre-Proto-Mongolic (or maybe slightly later than Daur splitting off)

              Similarly, the -RTR /u/ and /o/ of modern Halha (at least the short ones) aren’t really back [u] and [o], they’re central [ʉ] and [ɵ]. The Oirat/Kalmyk development is then just the logical continuation of this drift.

              /u/ is actually only very slightly fronted and it’s mostly a short-vowel reduction thing, here’s a formant graph from Svantesson et al.

              Also, as a native speaker of German, I hear [ʉ] and [ɵ] as closest to my own /ʏ/ and /œ/, so I’m not surprised if medieval Turks and Tibetans did the same

              I view centralized rounded vowels the same as front rounded, as long as a language pairs these rounded vowels by F2 it’s clearly not a [RTR] harmony situation ([-RTR] ‘tense’ vowels are supposed to be less centralised).

              One problem is how to explain just how *ö got in between *u and *ü in Khalkha without at least some fronting. One idea I have is that *ö could have come about through a rounding of **ə (such rounding is common in later Mongolic history), starting the process of gradual fronting of an originally pure [RTR] vowel system (think Evenk or even Nivkh).

              (…and that contemporary Russians are completely confused as shown by their transcriptions).

              I remember reading that Russians originally parsed Tungus [-RTR] words as containing palatalised consonants

              Not according to Ko, IIRC; I’ll check.

              The reflex of short *ü is usually transcribed [ʉ] and long *ü as [y], *ö is described as “ø” (Yunast) “rounded, front, mid” (Todaeva), “ö” (Nugteren and Roos) with [o] as an allophone.
              In any case it’s not [u] as in Daur, Khamnigan, Buryat, Khalkha, Santa, Mogholi…

              P.S. How do I make proper quotes in wordpress?

              • j. says:

                P.S. How do I make proper quotes in wordpress?

                <blockquote> HTML tags work. (I’ve added some to your previous post.)

              • David Marjanović says:

                I had misremembered. Ko’s thesis (pp. 85–88) reviewed the literature from 1981 to 2005 and found little agreement in it on such things as the number of phonemes. There is agreement on the existence of “e” and front rounded vowels, but Ko conjectured that the latter could be the outcome of i-umlaut as found in several other Mongolic languages. On top of that, he reported that Junast (1981) found variation between [ø] and [e] in suffixes and stems.

                Thanks for the chart! I didn’t know we could post images here; most blogs don’t allow that.

                • David Marjanović says:

                  There’s a table on p. 237 “(modified from Svantesson et al. 2005, p. 180)” which, without comment, lines up Khalkha /a ɔ ʊ e o u i/ with EY /a ɔ ʊ e ø u ə/ – what, is there no /i/ in the language?

        • j. says:

          the EDAL reconstructed two vowels that have occasionally been postulated before but are not mainstream

          I recently came across, by chance, Vovin’s article On Accent in Chuvash (from 1994), where he cranks this up to 13, including not just these (he has *a versus *ɑ for EDAL’s *ạ versus *a), but also a Proto-Turkic *ə (for Common Turkic *ö ~ Chuvash /u/), Proto-Turkic *ɒ (CT *a ~ Chuv. *u) and Proto-Turkic *ɯ (CT *u ~ Chuv. /ɨ/). This is on top of a high/low tone distinction that triggers vowel reduction in Chuvash. All this is a prime example of a stage 2 “reconstruction” that probably should eventually be reducible to a bunch of conditional changes, and which would then also line up much better with the position of Chuvash as part of the Volga-Kama language area, where there has been extensive vowel rotation also in Mari, Permic, Bashkir-Tatar, and to a lesser extent Mordvinic. Moreover, in all of these it is vowel quality that triggers stress, not stress that triggers vowel reduction. (The same pattern extends also to Ugric and partly Eastern Iranian.)

          Two phonations were reconstructed by all the early Altaicists; that didn’t work out

          Well, yes: assuming that there was a Proto-Altaic, I would suppose that it had a more complex stop system than just the basic 2×4 (*p *b *t *d *tʃ *dʒ *k *g). But I would rather explore more complexity in places of articulation, e.g. *q for Tungusic /x/ ~ elsewhere /k/, or palatalized *tʲ *dʲ for dental/palatal interchanges (be they original, or secondary in some branches).

          Suppose you tried to reconstruct PIE based just on Celtic, Balto-Slavic, Iranian, Tocharian and Anatolian evidence

          To be slightly contrarian, I’m not convinced PIE had three stop phonation series: I think a reconstruction with two series plus a prosodic feature that triggers stuff like breathy voice or vowel shortening might be a good idea to explore (for just one thing, it would completely explain the attested root structure contrasts).

          Probably that should be 5×5, with the three *i̯- “diphthongs” reinterpreted as *j-.

          This sounds like a poor idea, considering that 1) this would mean *Cj clusters but no others word-initially, and 2) they trigger an extensive amount of vowel developments but almost no consonant developments. Whatever there is here, it clearly needs to be treated as a part of the vowel system.

          • David Marjanović says:

            I’m surprised Vovin did this, actually. Tones for Proto-Turkic, at the expense of making it look more like EDAL Proto-Altaic? :-)

            But I would rather explore more complexity in places of articulation, e.g. *q for Tungusic /x/ ~ elsewhere /k/, or palatalized *tʲ *dʲ for dental/palatal interchanges (be they original, or secondary in some branches).

            Good point.

            To be slightly contrarian, I’m not convinced PIE had three stop phonation series: I think a reconstruction with two series plus a prosodic feature that triggers stuff like breathy voice or vowel shortening might be a good idea to explore (for just one thing, it would completely explain the attested root structure contrasts).

            Concerning breathy voice, this has been proposed before, in several different ways (e.g. from a glottalist angle where implosives inhibit the spread of breathy voice…), but I’m not convinced it works well at least for synchronic PIE. Just recently I read a paper with a statistical argument that the constraint on *T-Dʰ roots doesn’t even exist – there are enough such roots in the LIV that either the constraint was no longer active in PIE or we should look for a constraint on syllables rather than roots. There’s definitely research to be done here.

            this would mean *Cj clusters but no others word-initially

            Many kinds of Chinese have Cj and Cw clusters but no others syllable-initially. And Proto-Altaic is reconstructed without a w.

            2) they trigger an extensive amount of vowel developments but almost no consonant developments.

            The former could be due to pressure to lose the only kind of word-initial consonant cluster. (I’m reminded of the non-initial development *-ja- > OHG -ē- in the absence of any other umlaut phenomena that worked backwards.) The latter… well, there are some, and perhaps some (or a lot) have been overlooked! Lots of fun for future research. ^_^

          • David Marjanović says:

            Oh, more on Altaic vowel systems that the EDAL didn’t take into account: several Tungusic languages and apparently Chahar distinguish -RTR /i/ from +RTR /ɪ/. Pages 178–179 of Ko’s thesis present evidence that the Middle Korean /i/ resulted from a merger of two such phonemes: some stems with /i/ take -RTR suffixes, some take +RTR suffixes, some are inconsistent. I wonder if all these */ɪ/ line up…

            • Crom daba says:

              There’s a table on p. 237 “(modified from Svantesson et al. 2005, p. 180)” which, without comment, lines up Khalkha /a ɔ ʊ e o u i/ with EY /a ɔ ʊ e ø u ə/ – what, is there no /i/ in the language?

              /ə/ surfaces as [i] when long or after palatal consonants (the same is true for /u/ and [y]), and the length distinction is unstable, which has made or will make /i/ phonemic.
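              The allophony just described can be stated as one small conditional rule. A minimal Python sketch, assuming a hypothetical (segment, is_long) input format and an illustrative, non-exhaustive set of palatal consonants; neither comes from Svantesson et al.:

```python
# Hypothetical sketch of the fronting rule described above:
# /ə/ -> [i] and /u/ -> [y] when the vowel is long or follows a palatal consonant.
PALATALS = {"j", "ɕ", "ʑ", "tɕ", "dʑ"}  # assumption: illustrative palatal set
FRONTED = {"ə": "i", "u": "y"}


def surface(segments):
    """Map phonemic (segment, is_long) pairs to a surface transcription."""
    out = []
    prev = None
    for seg, is_long in segments:
        if seg in FRONTED and (is_long or prev in PALATALS):
            realized = FRONTED[seg]
        else:
            realized = seg
        out.append(realized + ("ː" if is_long else ""))
        prev = seg
    return "".join(out)


print(surface([("t", False), ("ə", True)]))   # tiː (long vowel fronts)
print(surface([("j", False), ("ə", False)]))  # ji  (fronts after a palatal)
print(surface([("t", False), ("ə", False)]))  # tə  (no trigger)
```

              If the length contrast is then lost, [ti] from earlier /təː/ stands next to [tə] from /tə/ with no conditioning left, which is the sense in which an unstable length distinction makes /i/ phonemic.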

              Oh, more on Altaic vowel systems that the EDAL didn’t take into account: several Tungusic languages and apparently Chahar distinguish -RTR /i/ from +RTR /ɪ/. Pages 178–179 of Ko’s thesis present evidence that the Middle Korean /i/ resulted from a merger of two such phonemes: some stems with /i/ take -RTR suffixes, some take +RTR suffixes, some are inconsistent. I wonder if all these */ɪ/ line up…

              I mentioned the Tungusic RTR high vowels in some previous comments: it’s not just /i/ and /ɪ/, there’s also /ʊ/ and /u/, and there are even some cases of North /ɪ/ ~ South /ʊ/ implying a +RTR *ü, but that may be influence from neighbouring +RTR vowels (there are some words containing only /ʊ/ and /ɪ/, however).

              Chahar is a different case: before vowel loss, /i/ was recessively -RTR, and /ɪ/ appears in exactly those cases where /i/ was or is followed by a +RTR vowel, the same cases where Khalkha gave /ʲa/. In contrast with Tungusic, there are possibly no words that originally contained only /i/ and were +RTR.

              I seem to remember a Vovin article about pharyngeal harmony in Korean being contact-induced, though I can’t find it now. Other languages in the region also have vowel harmony:
              -Nivkh has /a/ /e/ /o/ vs. /ə/ /i/ /u/, which conditions a velar/uvular distinction.
              -Yukaghir has /a/ /o/ /i/ vs. /e/ /ö/ /i/ /u/ with the same guttural synharmony, which looks like an RTR system turning palatal.
              -Chukotko-Kamchatkan has /a/ /o/ /e/ vs. /æ/ /u/ /i/ with no guttural synharmony.
              -I remember someone (the Amaravati guy?) comparing Old Chinese type A/type B syllables with Altaic harmony.

        • j. says:

          Conversely, Tungusic has a length distinction in the second syllable, not just in the first one.

          On this topic: FWIW, we have a guy at Helsinki who finished his Master’s thesis on vowel length in Proto-Tungusic just last year (draft up on Academia.edu), in Russian, of course. I don’t recall him doing any Altaic comparison, but this will hopefully shed some light on that problem anyway.

          • Crom daba says:

            Nice paper, bookmarked his academia.edu page.
            It clears up why some sources reconstruct only a single */i/ for PTM: supposedly, all *Ci roots are -RTR and *CiCi roots are +RTR.
            This is also the first time I’ve heard of Orok ө; the materials I have don’t note it. I thought Even ө was secondary, but if Orok supports it, I guess it’s legit Proto-Tungusic.

  4. As Mary Haas once wrote, classification is a by-product of reconstruction (sorry, I can’t verify the exact quote now). Maybe I am mistaken, but it seems to me that the idea that one must somehow prove the relationship before one starts reconstructing is rather new, and was popularized mainly by Johanna Nichols. Her main argument is that there are contradictory reconstructions of, e.g., Proto-Afroasiatic, so reconstruction proves nothing. But in fact different reconstructions may have different degrees of certainty: compare Rédei’s Proto-Uralic with Janhunen’s Proto-Uralic. Of course, it is possible to compile a table of spurious phonetic correspondences, “confirmed” by a relatively large amount of poorly analyzed data (a recent hypothesis of a Basque-Indo-European relationship shows this quite clearly). What is more important, such a spurious reconstruction can be done even if the languages in question are indeed related (this is the case with at least one of the competing Proto-Afroasiatic reconstructions). So, a quasi-reconstruction can have a core of essentially correct etymologies and sound correspondences. But are we really unable to distinguish between 1) completely spurious “reconstructions”, 2) quasi-reconstructions with a core of correct etymologies, and 3) real reconstructions?

    • j. says:

      We (and also Nichols and I) may mean slightly different things by “reconstruction”; I am closer to the last distinction you draw.

      It’s just about trivial to assert proto-forms and connecting developments between any words whatsoever (Proto-Forexamplese *bʰara ‘skill, expertise’ > *pʰaa > *faː >> Mandarin 夫 ‘skill, effort’; *bʰara > *barə >> English bar ‘licence for a skilled occupation’). But to claim that this is a reconstruction is a stronger claim than the existence of a vague connectability, or even of some vague roundabout actual connection: it is a claim of historical reality. Since zillions of competing reconstructions can be asserted, of which only one can be historical, any assertion has to come with strong arguments for its correctness before we can say that it in fact is (even approximately) a reconstruction. “Reconstructions” of Proto-Afro-Asiatic have been asserted, but as long as none of them withstands methodological scrutiny, they would be better called something like “correspondence pools” or “pseudo-reconstructions”. (This comes close to my division of individual reconstructed items as being *asterisked reconstructions proper, #hashed pseudo-reconstructions and **double-asterisked falsified non-reconstructions.)

      A typical part of this argumentation involves the usual Neogrammarian framework of regularity of sound correspondences. Another, though, is the semantic and morphological reliability of the etymologies involved. Your 2015 article with Kassian and Starostin on Indo-Uralic makes this point quite clear, I think: any demonstration of a disputed relationship should be spearheaded by the strongest evidence, i.e. items whose occurrence in the families’ prospective proto-languages is inferrable already without external evidence, and whose semantic matches are impeccable.
      On Forni’s Basque-IE comparison, I would say it fulfills the requirement of regular sound correspondences reasonably well, but where he really cheeses the data is the morphological side, by cherry-picking secondary formations from IE daughter languages and projecting them to the PIE level.
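      As a rough illustration of the regularity requirement alone (it says nothing about semantic or morphological reliability), one can tally how often each segment correspondence recurs across aligned comparanda. A minimal Python sketch; the hand-alignment format, the "-" gap symbol and the toy data are hypothetical, and real work would also have to model conditioning environments:

```python
from collections import Counter


def tally_correspondences(aligned_pairs):
    """Count segment correspondences over hand-aligned cognate candidates.

    Each pair is two equal-length segment lists; "-" marks a zero reflex."""
    counts = Counter()
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError(f"misaligned pair: {a} / {b}")
        counts.update(zip(a, b))
    return counts


def weakly_supported(counts, min_support=2):
    """Correspondences attested fewer than min_support times: candidates for
    chance resemblance, loans, or conditioned variants still to be explained."""
    return sorted(c for c, n in counts.items() if n < min_support)


# Hypothetical toy data, not from any real language pair.
pairs = [
    (["p", "a", "t"], ["f", "a", "θ"]),
    (["p", "i", "t"], ["f", "i", "θ"]),
    (["t", "a", "k"], ["θ", "a", "h"]),
]
counts = tally_correspondences(pairs)
print(counts[("t", "θ")])        # 3
print(weakly_supported(counts))  # [('i', 'i'), ('k', 'h')]
```

      Recurrence is necessary but not sufficient: as the Basque case shows, a set of "everything disappears" correspondences can recur perfectly while constraining almost nothing.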

      • David Marjanović says:

        Just yesterday I read Kassian’s review of Forni’s paper. The morphology is indeed atrocious; and while the sound correspondences are regular, most of them are just “everything disappears in Basque, and the Basque /h/ comes out of nowhere”, so they don’t constrain the situation much.

        That said, I do wonder if Basque *hargi and PIE *h₂argʲ- (as in argentum) are really old loanwords one way or the other.

      • “This comes close to my division of individual reconstructed items as being *asterisked reconstructions proper, #hashed pseudo-reconstructions and **double-asterisked falsified non-reconstructions.”

        I find very useful Peiros’s fivefold typology of reconstructed protoforms (see his paper “Macro Families: Can a Mistake be Detected?”, pp. 274-275 http://starling.rinet.ru/Texts/mistake.pdf), especially his term “pre-reconstructions” (these are “not based on a proper set of phonological correspondences but only on the intuition of the linguist who introduced them”).
        As for our (in)ability to tell real reconstructions from various kinds of pseudo-reconstructions, especially if the latter are based on explicit phonological correspondences, I think that there are several additional criteria. You already mentioned semantic and morphological reliability. There are also some quite trivial considerations that are nevertheless often ignored in proposed long- (and short-)range reconstructions. For example, phonetic correspondences must form a system, i.e. compared items must contain several correspondences from the proposed set. This may seem evident, but consider the following protoforms: *=Vt ‘leave’, *cV ‘one’, *kV ‘hand, arm’, *…ƛ’ / *… ƛƛ’ ‘yoke’, *…ƛ / *… ƛƛ’ ‘wild boar; pig’, *…š… / *…s… ‘year’, *=šš ‘weave’, *D=Vk’ ‘burn’, *(=D=)Vc’ ‘know’ (D is not a proto-phoneme, but a symbol for gender affix) [J. Nichols “The Nakh-Daghestanian consonant correspondences.” // Current trends in Caucasian, East European and Inner Asian linguistics: papers in honor of Howard I. Aronson. Amsterdam, 2003, pp. 207-264].
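        The requirement that correspondences form a system can be made partly mechanical: count how many correspondences from the proposed set each compared item actually instantiates, and flag items resting on only one. A minimal Python sketch under hypothetical assumptions (hand-aligned segment lists, a threshold of two, invented data); monoconsonantal items like *cV ‘one’ above could never pass such a check on their consonants alone:

```python
def systemic_support(etymologies, proposed, min_hits=2):
    """Return the items that instantiate fewer than min_hits correspondences
    from the proposed set, counted by aligned position."""
    flagged = {}
    for gloss, (a, b) in etymologies.items():
        hits = sum(1 for pair in zip(a, b) if pair in proposed)
        if hits < min_hits:
            flagged[gloss] = hits
    return flagged


# Hypothetical consonant correspondences and aligned comparanda.
proposed = {("t", "d"), ("k", "g"), ("s", "h")}
etymologies = {
    "water": (["k", "a", "t"], ["g", "a", "d"]),
    "one": (["s", "a"], ["h", "a"]),
    "burn": (["a", "k"], ["a", "g"]),
}
print(systemic_support(etymologies, proposed))  # {'one': 1, 'burn': 1}
```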
        One more useful criterion is the presence of correspondences for morpheme structure, especially root structure. This means that 1) words from the compared languages are morphologically analysed, and 2) we can state, e.g., how the canonical root shape of language A corresponds to the canonical root shape of language B. For example, Forni’s Basque-Indo-European hypothesis is unable to explain how the well-known constraints on the shape of PIE roots are connected with the (also known) constraints on the shape of Proto-Basque roots, or where the latter come from.
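        Once roots have been morphologically isolated, a first step toward correspondences of root structure is to reduce each root to a C/V template and tabulate which templates correspond to which. A minimal Python sketch; the one-character-per-segment convention, the toy vowel inventory and the root lists are all hypothetical:

```python
from collections import Counter

VOWELS = set("aeiouə")  # assumption: toy inventory, one character per segment


def skeleton(root):
    """Reduce a root to its C/V template, e.g. 'sulk' -> 'CVCC'."""
    return "".join("V" if seg in VOWELS else "C" for seg in root)


def shape_profile(roots):
    """Tally the root templates attested in one language."""
    return Counter(skeleton(r) for r in roots)


def shape_correspondences(pairs):
    """Tally template-to-template matches over compared root pairs."""
    return Counter((skeleton(a), skeleton(b)) for a, b in pairs)


# Hypothetical, already segmented roots from two languages, in compared pairs.
lang_a = ["bar", "tek", "ma", "sulk"]
lang_b = ["bara", "teke", "ma", "sulke"]
print(shape_profile(lang_a))
print(shape_correspondences(zip(lang_a, lang_b)))
```

        Here every A-template CVC matches B-template CVCV, so a hypothesis relating the two would owe an account of where B's extra final vowel comes from, which is exactly the kind of question the Basque-Indo-European proposal leaves open.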

        • David Marjanović says:

          That’s a useful paper!

        • j. says:

          For example, phonetic correspondences must form a system, i.e. compared items must contain several correspondences from the proposed set. This may seem evident, but consider the following protoforms:

          I see your point, though I would think that this does not completely torpedo our ability to work with language families featuring monophonemic roots. In principle, in these situations we can take a step back and consider correspondences as acting on features rather than phonemes. A proto-phoneme like *ƛʼ can be decomposed into at least two features, lateral affricate and ejective (maybe also e.g. short or unpalatalized), for which we would hope to find consistent correspondences already on their own. We can then consider the co-occurrence of these feature-level correspondences to build up a system of sorts.

          This type of “rule symmetry” can be found often enough even in relatively small phonological subsystems (say, Eastern Finnish aa ää > oa eä), but the larger a phonological system is, the more this approach can improve the reliability of the assumed sound changes. If there is, for example, a 2×3×8 (labialization × phonation × POA) grid of stop/affricate consonants, we would hope to find 2+3+8 = 13 basic feature-level correspondences, plus maybe a handful of “correction factors”, instead of 2×3×8 = 48 completely independent correspondences, one per consonant.
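          To make the counting concrete, here is a minimal Python sketch of feature-level correspondences for a hypothetical 2×3×8 grid. The feature labels, the daughter-language outcomes and the single exception are invented for illustration; the point is only that 13 feature mappings plus a short list of unexplained cells can stand in for 48 independent correspondences.

```python
from itertools import product

# Hypothetical 2×3×8 grid; labels are illustrative, not from any real family.
LABIALIZATION = ["plain", "labialized"]
PHONATION = ["voiceless", "voiced", "ejective"]
PLACE = ["p", "t", "ts", "tɬ", "tʃ", "c", "k", "q"]

proto_grid = list(product(LABIALIZATION, PHONATION, PLACE))

# Feature-level correspondences: one mapping per feature (hypothetical daughter).
feature_maps = (
    {"plain": "plain", "labialized": "labialized"},
    {"voiceless": "voiceless", "voiced": "voiceless", "ejective": "ejective"},
    {**{p: p for p in PLACE}, "q": "k"},  # uvulars merge with velars
)


def predict(proto, maps):
    """Reflex predicted by applying each feature mapping independently."""
    return tuple(m[value] for m, value in zip(maps, proto))


def correction_factors(observed, maps):
    """Cells whose observed reflex the feature-level mappings fail to explain."""
    return {p: r for p, r in observed.items() if predict(p, maps) != r}


n_feature_params = sum(len(m) for m in feature_maps)
print(len(proto_grid), n_feature_params)  # 48 13

# Toy "observed" reflexes: feature-predictable except one conditioned exception.
observed = {p: predict(p, feature_maps) for p in proto_grid}
observed[("labialized", "ejective", "q")] = ("plain", "ejective", "q")
print(correction_factors(observed, feature_maps))
```

          A real application would also have to decide which feature combinations interact (the "correction factors"), rather than assuming full independence as this sketch does.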

          (In a sense, this does not strike me as being too different from the analysis of syllables into phonemes. A coda or a medial cannot occur in isolation any more than a phonation or a secondary articulation can…)

          — The Peiros paper is looking interesting, more comments once I’ve read it.

    • David Marjanović says:

      that the idea that one must somehow prove the relationship before one starts reconstructing

      It certainly doesn’t occur to me as a biologist. But then, the single origin of all known life is plain obvious: there are a lot of universal features that could be otherwise (some alternatives have even been made in the lab) but aren’t. We can take for granted that everything is discoverably related at some time depth.

      Also, only creationists and historical linguists even use the word “prove” anymore when talking about science. This black-or-white thinking doesn’t do any good.
