Five Shortcuts to Writing a Heavyweight Etymological Dictionary

Minor apologies for the clickbait-satire title (I do not actually enumerate any shortcuts in this post), but the arriving summer is making me jocular I guess. :)

My current stop on what seems to be turning into an unofficial world tour of etymological research is Dravidian. I was prompted to look into this after noticing a copy of Elli Marlow’s PhD thesis A Comparison of Uralic and Dravidian Etymological Vocabularies (1974, University of Texas) at the University of Helsinki library, and picking it up for examination. This proposed relationship is a curious one in that it always seems to have generated more interest on the Dravidologist than on the Uralicist side. Marlow, too, is a Dravidologist who seems to have little other experience in Uralic studies aside from having easy access to specialist literature by virtue of her Finnish roots.

Her work at least appears to dodge one of the worse pitfalls of comparison between language families, i.e. ignoring pre-existing etymological research and instead data-stripmining individual languages of a family for lookalikes. She relies strictly on what were, in her day, the most up-to-date sources on the old vocabulary of Uralic and Dravidian: Collinder’s Fenno-Ugric Vocabulary (1st edition 1955) and Comparative Grammar of the Uralic Languages (1960); Toivonen, Joki & Itkonen’s Suomen kielen etymologinen sanakirja (1st volume 1958, 4th 1969); and Burrow & Emeneau’s Dravidian Etymological Dictionary (1st edition 1961, followed by a “supplement” in 1968).

Regardless, the end result is that, working mainly from about 1000 etymologies in Collinder, she finds Dravidian parallels for no fewer than 786 of them. Most of them retain decent semantic and phonetic resemblance, too. I have considered doing a full count, but e.g. taking items #200-#205 at random, we have the following comparisons (with approximate Dravidian reconstructions by me, to save some space; the Uralic reconstructions are Collinder’s):

  • Uralic *ńärpä- ‘thin, sparse’ ~ Dravidian #ñēr- ‘(to grow) thin’
  • U *jäpće ‘roasting spit’ ~ D #cappū ‘thorn’
  • U *ńola- ‘to crawl’ ~ D #neḷi- ‘to crawl’ / #nur̤ai- ‘to creep’
  • U *ńoma- ‘to creep’ ~ D #nāmpu ‘climbing plant’
  • U *ńowŋa ‘a salmonid fish’ ~ D #naŋku ‘a fish species’
  • U *ńorɜ ‘type of moss’ ~ D #ñir ‘water’

The second of these I would say is entirely unconvincing, and the sixth seems like a stretch as well, but the other comparisons are tolerable at face value. I could further criticize several of them on inter-Uralic grounds, of course. But this altogether still seems to leave a situation where, if not quite in the ballpark of 75%, then still some 30-50% of the best-established Uralic material would have fair-looking parallels in the DED.

The plot twist, though, is that the DED and its addenda include a bit over 5500 etymologies: a rather bigger pool of data than Collinder’s roughly 1000, and it’s obvious that this significantly increases the odds of finding something that appears similar by pure chance.
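To put a rough shape on that intuition, here is a minimal sketch in Python. It assumes, purely for illustration, that each entry in the pool has the same small, independent probability p of resembling a given Uralic item by accident. The function name and the value of p are made up for this example and are not estimated from either dictionary.

```python
def chance_of_lookalike(pool_size: int, p: float) -> float:
    """Probability that at least one of pool_size entries matches by accident,
    assuming each entry matches independently with probability p."""
    return 1.0 - (1.0 - p) ** pool_size

# Hypothetical per-entry chance of an accidental resemblance (an assumption).
p = 0.0002

# Compare a pool of about 1000 entries with one of about 5500.
for pool in (1000, 5500):
    print(pool, round(chance_of_lookalike(pool, p), 3))
```

Whatever value of p one picks, the larger pool always comes out ahead, and for small p the odds grow roughly in proportion to the pool size.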

But instead of continuing to a detailed assessment of the Uralo-Dravidian hypothesis, I have a different question I’d like to ask. Namely: how does one manage to assemble that many Proto-Dravidian etymologies? The family is supposed to be relatively old and diverse, on the same order as Uralic (though clearly less so than Indo-European). Should I be afraid that the dictionary is an “anything goes” hodgepodge? Or that they are including younger Indo-Iranian loanwords all over the place?

Burrow & Emeneau’s work is currently available online, so this matter is relatively simple to investigate. Again, rather than going in for the full grind, I’ve taken a small sample: the etymologies implicitly reconstructed as beginning with word-initial short *a-, 338 of them altogether. And yes, it turns out that there are a few issues.

The foremost problem appears to be that Proto-Dravidian vocabulary and vocabulary of more limited distribution are mixed at will. Of the 338 words, only 199 appear in at least two main branches of Dravidian; 139 of them are restricted to a single sub-branch: in 118 cases South Dravidian, and even of these a significant portion are found nowhere else but in Tamil and its close relative Malayalam! It would be easy to get the impression that one is reading not a “Dravidian Etymological Dictionary” but an “Etymological Dictionary of the Native Tamil Lexicon”. And if that were the case, 5500 words starts sounding entirely normal.
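For those who prefer percentages, the counts above can be tallied in a few lines of Python. This is only a sketch: the variable names are mine, and the final extrapolation assumes that the *a- sample is representative of the whole DED, which is a loud assumption and not something I have checked.

```python
# Counts reported above for the word-initial *a- sample of the DED.
total = 338
multi_branch = 199    # attested in at least two main branches
single_branch = 139   # restricted to a single sub-branch
south_only = 118      # of the single-branch items, South Dravidian only

# Sanity check: the two groups should add up to the full sample.
assert multi_branch + single_branch == total

def share(part: int, whole: int) -> float:
    """Percentage of whole made up by part, rounded to one decimal."""
    return round(100 * part / whole, 1)

print(share(multi_branch, total))       # 58.9
print(share(single_branch, total))      # 41.1
print(share(south_only, single_branch)) # 84.9

# Naive extrapolation to the roughly 5500 entries of the full dictionary,
# ASSUMING the sample is representative (not verified).
ded_total = 5500
print(round(ded_total * multi_branch / total))  # 3238
```

So even on this generous reading, only some three fifths of the dictionary would be attested in more than one main branch.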

Or perhaps slightly more generally: “Tamil, Kannada and Telugu”. Within the wider-ranging comparative material, almost all of the data is represented in two or three of these old major literary languages of southern India. I moreover count 83 etymons that seem to feature Telugu as the only non-South Dravidian language to have a reflex, which makes me rather suspicious about whether they should all be assumed to descend from the common ancestor of SD and Telugu; especially when the only lesser non-SD language to reach a vocabulary retention rate of more than 20% is Gondi. Even Gondi is apparently a highly dialectally diverse grouping, as the DED and also e.g. Glottolog separate it into more than half a dozen sub-varieties.

Now, sure, we could attribute this terrible track record of all the unwritten central and northern Dravidian varieties simply to underdocumentation plus massive Indo-Aryan influence. But whenever a DED entry does turn up with a reflex in one of these, it usually appears in quite a few others as well. The distribution appears to be a bit too neatly divided between “poorly attested” and “well-attested” cases. So my suspicion would be that additionally quite a few of the DED entries represent old cultural loanwords that have diffused between the written Dravidian languages, and which do not go back to Proto-Dravidian (possibly not even Proto-South Dravidian)…

Yet another possible issue is that the known main Dravidian subgroups might still form some intermediate units. In doing some related research, I’ve run into some interesting claims from David W. McAlpin (perhaps best known for the Elamo-Dravidian hypothesis) along these lines: e.g. a suggestion that “Northern Dravidian” does not exist, and that instead Brahui (the northwesternmost Dravidian language, spoken in southern Pakistan) and Kurukh-Malto (the northeasternmost Dravidian languages, spoken in East India) are primary branches of Dravidian, with everything else forming a third branch. Geographically, this would certainly make sense; we know that the Indo-Aryan languages have entered Southern Asia from the north, and surely they must have in this process taken over at least some amount of territory that had formerly been Dravidian-speaking. McAlpin even has a paper where he suggests that Brahui might be more closely related to Elamite than to Dravidian proper; though I do not know if he has held on to this idea.

In any case, it looks like distilling some kind of core Dravidian lexicon from the DED would be advisable for any wider comparative purposes. Reaching all the way down to e.g. vocab found just in Tamil and its closest neighbors, or Telugu and its closest neighbors, is exactly the “data-stripmining” problem I mentioned Marlow having seemingly dodged. One of her main sources, apparently, did not.

2 comments on “Five Shortcuts to Writing a Heavyweight Etymological Dictionary”
  1. Warren Maguire says:

    Nice post, some interesting points. It’s best not to think of DED as an etymological dictionary of Proto-Dravidian though, but as an etymological dictionary of the Dravidian languages, which includes forms shared by any subset of the languages (at least in theory).

  2. David Marjanović says:

    I finally read McAlpin’s little paper. It’s quite interesting.

