Some things rotten in the history of Tungusic

On a whim, I’ve started to investigate the lexicon of Proto-Tungusic, for which the Moscow school of Nostraticists maintains a handy database (as they do for pretty much all Eurasian language families).

I am currently about 10% in, having looked through (and transferred into a spreadsheet for further analysis) all roots beginning with *a, *ā, *b and maybe half of those in *č. Interestingly though, there are already a couple of clear signs that the analysis is not exactly reliable, even without me knowing anything about any Tungusic language in detail. In some respects, things even appear to be, quite simply, terribly wrong.

In particular, one obvious argument stands out against the Altaic hypothesis, at least in the strongest form as advanced by the authors: around 98% of the words in the database (so far, 235 out of 240) are traced in some form back to Proto-Altaic.

So, Tungusic, despite being a family bordering unrelated languages on several sides (Nivkh, Chukotko-Kamchatkan, Yukaghir, Sinitic), and distinct enough also from its supposed relatives that no generally accepted protolanguage has been so far reconstructed — is regardless supposed to contain less than 3% non-inherited material in its reconstructible vocabulary! All substrate loans, all proto-language era loans, all areally widespread loans, all coinages, all onomatopoeias, all words that have semantically diverged so far that their ancestry has become opaque: these categories are supposed to wholly fit among the five allegedly non-Altaic word roots I have down so far. I guess the people responsible for this project haven’t grasped the idea that there even is a typology of etymology that they are violating.

Now, sure, if “Proto-Altaic” were brought forward as a synchronic grab-bag of word roots and typological features that are just found in some shape across a wide area in central to northeastern Eurasia, all this would not necessarily be a problem. We’d just call it a work in progress, and hope for eventually sorting out which words indicate Mongolic loans in Tungusic, which ones Para-Tungusic loans in Korean, which a mutual substrate in Turkic and Tungusic, etc. Yet as far as I can tell, no self-professed Altaicist takes this stance.

Somehow, it gets worse yet from here. A preposterous amount of the time, words are reconstructed to Proto-Tungusic on the basis of only a single language, plus external parallels. The typical language in the family seems to retain about 40-60% of the original vocabulary (you may wish to compare this against the previous number). If the vocabulary had later only been subject to random loss, we’d expect words surfacing in only one language (out of ten, as per the database’s analysis: Evenki, Even, Negidal, Manchu, Ulcha, Orok, Nanai, Oroch, Udighe, Solon) to occur about 10 × 0.5^10 ≈ 1% of the time (0.5^10 ≈ 0.1% being the chance for any one particular language). Guess how many actual cases the current sample includes? 34, i.e. about 14%. An additional 17 roots (~7%) are then limited to a single sub-branch of the family, e.g. Northern Tungusic.
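As a rough sanity check on the random-loss estimate, here is a minimal Python sketch. It assumes, as the argument above does, that each of the ten languages keeps a given root independently and with the same probability; the function names are mine and purely illustrative. Note that 0.5^10 is the chance of a root surviving in one particular language only, while allowing any of the ten to be the sole survivor multiplies this by ten, to roughly 1%. Either way the expectation stays far below the observed share.

```python
from math import comb

N_LANGUAGES = 10          # languages distinguished in the database
OBSERVED_SINGLETONS = 34  # roots attested in only one language so far
SAMPLE_SIZE = 240         # roots examined so far


def p_exactly_k(n: int, k: int, p: float) -> float:
    """Binomial probability that a root survives in exactly k of n languages."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_singleton_share(n: int, p: float) -> float:
    """Share of still-attested roots expected to survive in exactly one language.

    Roots lost everywhere cannot show up in the database at all, so the
    probability is conditioned on survival in at least one language.
    """
    return p_exactly_k(n, 1, p) / (1 - p_exactly_k(n, 0, p))


observed_share = OBSERVED_SINGLETONS / SAMPLE_SIZE

# Try the whole 40-60% retention range mentioned in the text.
for retention in (0.4, 0.5, 0.6):
    expected = expected_singleton_share(N_LANGUAGES, retention)
    print(f"retention {retention:.0%}: expected {expected:.2%}, "
          f"observed {observed_share:.2%}")
```

Even at the low end of the retention range (40%), this simple model expects only about 4% of roots to be single-language survivors, so the mismatch with the observed ~14% does not hinge on the exact retention figure. Real vocabulary loss is of course not independent across closely related languages, so this is only an order-of-magnitude check.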

This kind of a discrepancy might still be excusable, if this were a Turkic-type situation — a family where one of the main branches is currently only represented by a single language. In such a case, any word that had been lost in the “main” branch could well have been still retained in the “minor” branch. But that won’t work here: the isolated vocabulary is scattered over several languages, including especially both far ends of the family (Evenki in the north, Manchu in the south), and occasional cases from most other languages as well.

Whether there are issues in the actual raw lexical data though, I couldn’t tell, but it’s cited from a decent variety of sources… so at least there should be no reason to suspect a systematic heterodox methodological bias.

Of course, knowing that there is a problem is not equivalent to knowing how it should be fixed. The latter will take a bit more work than a single blog post, I am sure. One path would be the traditional etymological approach: to just wade in and start noting comparisons that are phonetically or semantically dubious, and see how much that takes care of. But there are other options as well that might turn out more effective. E.g. zooming in on material that stands out phonologically (possible loanword phonemes and similar features) would perhaps lead to something. One step more quantitative yet, I moreover have in mind a relatively simple statistical check-up: correlating the internal Tungusic distribution of the word roots with the external distribution of their Altaic parallels. E.g. if a substantial number of loans to/from Mongolic have here been misinterpreted as inherited, I’d expect a language such as Manchu (neighboring Mongolia) to contain more of these than a language such as Negidal (by the Sea of Okhotsk coast). We’ll see. I will have to do a separate sweep of the Altaic database later to log this info, and I still have quite a while to go here as well.
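To make the proposed check-up a little more concrete, here is a minimal Python sketch of one way it could be set up. Everything in it is an assumption for illustration: the record layout, the function name, and above all the six toy records, which are invented placeholders and not data from the database. The idea is simply to compute, for each Tungusic language, what share of the roots attested in it have a parallel in a given external branch.

```python
from collections import defaultdict

# Hypothetical toy records (NOT real data): for each root, the Tungusic
# languages it is attested in and the external branches with a claimed parallel.
ROOTS = [
    {"root": "root_1", "tungusic": {"Manchu"}, "external": {"Mongolic"}},
    {"root": "root_2", "tungusic": {"Manchu", "Nanai"}, "external": {"Mongolic", "Turkic"}},
    {"root": "root_3", "tungusic": {"Manchu", "Evenki", "Negidal"}, "external": {"Turkic"}},
    {"root": "root_4", "tungusic": {"Evenki", "Negidal"}, "external": {"Korean"}},
    {"root": "root_5", "tungusic": {"Negidal", "Evenki", "Manchu"}, "external": {"Mongolic"}},
    {"root": "root_6", "tungusic": {"Evenki"}, "external": set()},
]


def external_share_by_language(roots, branch):
    """For each Tungusic language, the share of its attested roots
    that have a claimed parallel in the given external branch."""
    attested = defaultdict(int)
    with_parallel = defaultdict(int)
    for record in roots:
        for language in record["tungusic"]:
            attested[language] += 1
            if branch in record["external"]:
                with_parallel[language] += 1
    return {lang: with_parallel[lang] / attested[lang] for lang in attested}


shares = external_share_by_language(ROOTS, "Mongolic")
for language, share in sorted(shares.items()):
    print(f"{language}: {share:.0%} of attested roots have a Mongolic parallel")
```

With real data, a markedly higher Mongolic share in Manchu than in Negidal would point towards contact rather than inheritance, though a proper significance test and some control for how well each language is documented would be needed before reading anything into the difference.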

Posted in Commentary, Methodology
16 comments on “Some things rotten in the history of Tungusic”
  1. To begin with, the Tungusic database on the “Tower of Babel” site is simply an electronic version of the relevant part of “An Etymological Dictionary of Altaic Languages” (EDAL) by S. Starostin, A. Dybo and O. Mudrak. Evidently, it does not contain words that do not have Altaic etymology in EDAL. The only exception are words from the Swadesh 100-word list (or rather the modified Starostin 110-word list) that are systematically included in the database irrespective of their etymology for technical reasons.
    Suppose that the Altaic hypothesis is true (I personally do believe it is). How many roots going back to Proto-Altaic can be expected to be preserved in a daughter branch like Tungusic? The comparison with uncontroversial families suggests a figure of several hundred roots at best, perhaps around five hundred (Indo-European and Semitic may constitute an exception due to the abundance of early written records). Actually, the Altaic etymological database contains 2414 reconstructions with a reflex in Tungusic out of an overall number of 2805 Proto-Altaic roots (I use the offline version of the database, available for download on the “Tower of Babel” site). This is an unacceptably high number. So, if, say, six hundred of them really have Altaic pedigree, what about ≈ 1800 others? Are most of them borrowed from Mongolic? I would rather think that, although some of these words can be loans, most of them are just chance resemblances due to too permissive system of sound correspondences accepted in EDAL; the main problem seems to be multiple unconditioned splits of vowel reflexes (compare the situation with UEW). What is desperately needed is sifting of available Altaic etymologies (not only those found in EDAL) with a stricter approach both to phonetic, semantic and distributional side of comparisons.

    • j. says:

      Evidently, it does not contain words that do not have Altaic etymology in EDAL. The only exception are words from the Swadesh 100-word list (or rather the modified Starostin 110-word list) that are systematically included in the database irrespective of their etymology for technical reasons.

      I considered this explanation too, yes. The words without an Altaic etymology do not seem to be all Swadesh list entries, though: these include e.g. *apa- ‘to attack; to fight’, or *bāku-ńa ‘seal; young bear’. 2400 roots moreover sounds like a decent number already for the common Tungusic lexicon. I doubt that the Comparative Dictionary of Tungus-Manchu languages (the database’s main source) can contain many more roots than this, even with Pokorny-style maximal inclusivity. But I guess I’d have to check the actual dictionary to know if there are also Tungusic-only roots that have been omitted, and if so, how many.

      So, if, say, six hundred of them really have Altaic pedigree, what about ≈ 1800 others? Are most of them borrowed from Mongolic? I would rather think that, although some of these words can be loans, most of them are just chance resemblances due to too permissive system of sound correspondences accepted in EDAL

      Sure. But I am fairly sure that an inclusionist approach will also catch at least some loan strata or the like. The UEW has a nice precedent for this, in listing in its Finno-Permic section at least a few words that have by now been analyzed as or proposed to be Permic loanwords in Mari; Finnic loanwords in Permic; or adstrate loans across Finnic ~ Mordvinic ~ Mari ~ Permic.

  2. David Marjanović says:

    There’s also the “cultural” issue that the Moscow School follows Pokorny in consistently and knowingly erring on the side of inclusion, including everything in their etymological dictionaries that might conceivably go back to the protolanguage they’re trying to reconstruct, even if irregular developments have to be assumed.

    What is desperately needed is sifting of available Altaic etymologies (not only those found in EDAL) with a stricter approach both to phonetic, semantic and distributional side of comparisons.

    What do you think of Martine Robbeets’s work in that respect?

    • I am not an Altaicist, much less a specialist in Japanese, so my remarks below can be only of a general methodological character. While it seems at first sight that M. Robbeets’ book “Is Japanese Related to Korean, Tungusic, Mongolic and Turkic?” is just the kind of work I am advocating here, actually I have some doubts about whether it constitutes real progress in the field of Altaic comparison. There are many problems with Robbeets’ approach, but for lack of time I will single out the most serious one: the system of sound correspondences proposed in her book actually does not account for many well-known facts of historical phonology of Altaic languages. Thus, nothing in Robbeets’s “Proto-Altaic” accounts for Tungusic words with initial *ŋ-, and word-internal *-ŋ- is also left unexplained. Nothing is said about where vowel length in Turkic and Tungusic comes from. The triple opposition of initial Tungusic *x- vs. *k- vs. *g- (unlike the Moscow school, Robbeets reconstructs only *k- and *g- for Proto-Altaic) is accounted for by a hypothetic sporadic “lenition” *k- > *x- in Tungusic. The most glaring example is the way she dismisses out of hand the Turkic portion of the Ramstedt-Pelliot law, i.e. the shift of initial (aspirated) *p- to Proto-Turkic *h-, preserved only in Khalaj. *h is missing from the table of Proto-Turkic consonant phonemes on p. 75. Nevertheless, on the same page (!) we can read that “[w]ord initial *p- developed over a bilabial fricative into h- and finally disappeared in most of the contemporary Turkic languages”. However, in the final table of correspondences we see that both Proto-Altaic initial *p- and *b- yield *b- in Proto-Turkic. So, where does Khalaj (and Proto-Turkic) h- come from? And what to make of words where this *h- corresponds to Proto-Mongolic *h- (< *p-) and Proto-Tungusic *p-? Here I must add that the development of pre-Proto-Turkic *p- into Proto-Turkic *h- is accepted both by pro-Altaicists and anti-Altaicists. In fact, it was the eminent anti-Altaicist Gerhard Doerfer who discovered the Khalaj reflexation.

  3. David Marjanović says:

    is accounted for by a hypothetic sporadic “lenition” *k- > *x- in Tungusic

    Ew, that’s disgusting. :-)

    However, in the final table of correspondences we see that both Proto-Altaic initial *p- and *b- yield *b- in Proto-Turkic.

    That’s really strange, too!

    • j. says:

      The alleged treatment of the Proto-Altaic stop consonant system seems to be rather strange in general. I only really know what has been cited of the EDAL on e.g. Wikipedia, but their system with a three-grade distinction (*Pʰ : *P : *B) that survives systematically nowhere, and is instead merged into a two-way contrast with various conditioning, feels like an artificial solution.

      I have a recollection of seeing an article some time back that argued for a shift *q > *x in pre-Proto-Tungusic; this might be a slightly better explanation for PTg *x than assuming a separate aspirate series that merged with the tenues aside from *kʰ > *x. But I’d still want to see e.g. loanwords to or from Nivkh to further back this idea up.

      • David Marjanović says:

        a separate aspirate series that merged with the tenues aside from *kʰ > *x

        That’s not what the EDAL proposed; instead it says the series merged in different ways not only in different branches, but also in different environments, with Mongolic even having a tone-dependent development reminiscent of Verner’s law. I have a pdf of the whole 270-page “introduction”; drop me an e-mail if you’re interested.

        The PIE three-grade distinction doesn’t survive completely systematically anywhere either, except maybe P-Italic – OK, I don’t know enough about Armenian. Greek and Indic obviously come very close, but there’s Grassmann’s law and, in Indic, clusters with laryngeals becoming aspirates; Iranian has mergers in consonant clusters and again the issue with laryngeal clusters; Latin has a positional merger; Germanic has Verner’s law; Winter’s law is a trace in Balto-Slavic, but its outcome has merged with the outcomes of several other processes….

        • j. says:

          Well, yes. I mean that the non-velar aspirates, per Wikipedia, are allegedly reflected as tenues in Tungusic specifically. And by “systematically” I mean that Tungusic has in at least some positions the triple distinction *x : *k : *g, where each member is suggested to come by default from a different Altaic segment. But nothing similar exists at the other places of articulation, nor in Turkic, Mongolic, Japonic or Korean, all of which only have 1-2 stop series and no spirant series.

          By contrast, Grassmann’s Law, Verner’s Law and lenition in Latin and Armenian may all reshuffle the PIE series under some conditions, but all of the affected languages still maintain a systematic phonological distinction between three different obstruent series, found in the core lexicons of the languages and not explainable as later individual developments. This kind of thing is a clear sign that their ancestor once had an at least threefold set of obstruents as well.

          (I incidentally consider it less clear whether early PIE also had a three-series system, seeing how only the Core IE group has some of these threeway-distinction languages. Anatolian has been suggested to be mostly compatible with just two stop series, and Tocharian only has one series, although there are some conditional developments.)

          • nydwracu says:

            I’m a bit late here, but Tocharian had at least two series for long enough to have different outcomes for PIE *d than for *t *dh. It may also have had Grassmann’s Law.

            There’s also good reason to suspect that it split off later than is generally assumed — i.e. was part of the ‘northwestern group’. See here. The general picture that emerges is one of similarity to both Germanic (development of the syllabic resonants, for one, though there are some differences in the treatment of initial syllabic resonants and ṇ) and Greek. Hannes Fellner seems to have argued [in a talk that doesn’t appear to be online] against the nominal morphology-based case for an early Tocharian split. And then there are the Tarim tartans…

            Of course, if it was part of the northwestern group, then it was part of Core IE, so it’s not relevant to the series question.

            • j. says:

              Yes, I would assume that “Core IE” is not so much a genetic subgroup as much as a grouping of dialects that remained areally close-by after Anatolian and Tocharian had already moved south and east. Early dialect isoglosses could well have been in place within it.

              There has to be something in early PIE that the *t/*d/*dʰ contrast comes from, but if we allow the possibility that it was not simply a similar phonation contrast, it’s possible that in Tocharian it never developed into one, and it rather triggered the development “*d” > ∅/ś in some other fashion.

              I may need to look more into Tocharian altogether, though.

              • nydwracu says:

                Possibly relevant to the split question is that PIE *ih1 seems to have had a different outcome than *ih2 *ih3. How late were the laryngeals maintained in ‘core IE’? Of course, this could have been an early dialectal development, like nonrhoticity in some dialects of American English…

                (Also, it was *d > ts in non-palatalizing contexts and ś in palatalizing ones, with some possibly irregular shifts of *d > ∅ thrown in. It seems that an intermediate *dz is posited, though I don’t know if this is for any more reason than that it’s plausible. For all I know, the voiced plosives could have become voiceless aspirates, with *tʰ > ts as I’ve seen claimed for Danish, and then loss of aspiration. But I doubt it. Another curious thing is that *ty *dʰy became PToch *ts, whereas *t *dʰ were palatalized by vowels to *ś… though this could just be the result of two different waves of palatalization, with palatalization of dentals [but not PIE *d] occurring before vocalic palatalization. Are there any sequences of *tye *dʰye in PIE that show up in Tocharian? In case it isn’t obvious, I know very little about Indo-European.)

                • *t *dʰ were palatalized to *c; *ś is the result of palatalization of velars.

                • j. says:

                  How late were the laryngeals maintained in ‘core IE’?

                  Seems like an underresearched topic to me, but there are various indications that it was in part fairly late. There are a few loanwords in Finnic that still reflect laryngeals as /k-/ or /-h-/ despite having semantics or morphology that’s particular to Germanic, and there’s also Cowgill’s Law (*Rh₃w > *Rgʷ > *Rkʷ); people like Kortlandt claim that the Latvian broken tone directly continues *VH, as if implying that in Balto-Slavic long vowels merged into *VH rather than the other way around; there’s a regular shift *kh₂ > *x in Slavic; Indo-Iranian of course also does *TH > *tʰ, and moreover several other complications like *VHD > *VD. There might be some particulars in Greek/Latin/Celtic too that require a late maintenance of consonantal laryngeals, but I know less about those three.

        • j. says:

          I have a pdf of the whole 270-page “introduction”; drop me an e-mail if you’re interested.

          More interestingly yet, Anna Dybo seems to have just uploaded the whole EDAL (1500+ pages) on Academia.edu. I guess I will be looking into it over the coming months…

  4. “as if implying that in Balto-Slavic long vowels merged into *VH rather than the other way around”
    No, the Leiden school assumes that genuine PIE long vowels have Balto-Slavic circumflex and therefore Latvian falling tone.
