Sometimes I feel I’d like to see an anti-etymological dictionary.
Given two or more different etymological dictionaries, especially for an entire group of languages, typically one of them (usually from the older end) is going to end up being less critical, while another one (usually from the newer end) is going to end up being more critical. If we want to know what is known so far about a word’s etymology (cognates, reconstruction, etc.), we’d look in the more modern dictionary, of course. But if we want to know what is not known about a word’s etymology (i.e. what research questions are still open), neither of these sources is really going to work. What’s needed for this is, at a pinch, the difference between them.
Sometimes older separate etymological groups get combined into a single one, and sometimes older single etymological groups will turn out to comprise unrelated words and will be disassembled into various different ones (maybe under different native roots, but maybe also as loans or derivatives). None of this is a major problem so far, especially if newer research bothers to mention that zoop was earlier considered cognate with foop, but per current understanding is actually cognate with doop.
But etymologies can also simply vanish from the literature record without comment, or with minimal comments along the lines of “strike this” (this latter type I’ve seen in errata or in “update notes” to new editions). This I find unsatisfying. Even when an explicit reason has been given (“the correspondence z ~ f is irregular”), if this merely leaves the compared words without an etymology, then we are back to square one on what the words’ origin actually is. Or, for that matter, on why the earlier observed similarity exists at all.
It is possible for similarity to exist for reasons other than proper common inheritance or pure random chance: loans between related languages, loans in parallel from a third source, common inherited morphology applied to different roots, contamination between semantically nearby words, universal onomatopoetic patterns… I’ve only seen traditional etymological dictionaries apply the last of these with any consistency. The first is usually invoked only in cases of obvious, long since established layers of loanwords (in a Uralic context e.g. Finnic → Samic, Komi → Ob-Ugric). The second thru the fourth are rarely explored at all.
So I would hope for truly thorough etymological dictionaries to also include a discard pile of words and comparisons from earlier literature that remain without an adequate explanation, something which would definitely make future etymologists’ work slightly easier.
I am currently doing some “antietymological” groundwork myself: charting how much content there is in Collinder’s Fenno-Ugric Vocabulary that is not also reproduced by later sources (mainly the UEW on one hand, Janhunen’s Samojedischer Wortschatz on the other). It is not a lot, and most of the omissions are clearly dregs, but some small part of the material remains interesting. It is even possible to find examples that have since reappeared: one is the comparison of Mari *lüðä- ‘to fear’ with Samoyedic *lër(ə)- ‘to be afraid’, rediscovered by Ante Aikio in his paper on new Mari etymologies from a few years back.
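Stripped down to its bare logic, this kind of charting is just a set difference: treat each source as a set of comparisons and keep whatever the older one has that none of the newer ones do. The following is only a minimal sketch of that idea. The function names `comparison` and `discard_pile` are hypothetical, and the entries are made-up placeholders (the zoop/foop/doop from above), not actual data from FUV, UEW or Janhunen.

```python
def comparison(*words):
    """A comparison is an unordered set of words claimed to be cognate."""
    return frozenset(words)


def discard_pile(older, *newer):
    """Return the comparisons in the older source found in none of the newer ones."""
    kept = set().union(*newer)
    return older - kept


# Placeholder entries only; these are not real dictionary data.
older_source = {comparison("zoop", "foop"), comparison("moop", "noop")}
newer_source_a = {comparison("zoop", "doop"), comparison("moop", "noop")}
newer_source_b = set()

dropped = discard_pile(older_source, newer_source_a, newer_source_b)
for entry in sorted(sorted(c) for c in dropped):
    print(" ~ ".join(entry))
```

Matching whole cognate sets like this is of course too crude for real material, since sets are often split or merged rather than dropped outright; a usable version would probably have to compare individual word pairs instead.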
A much bigger amount of work, however, would be needed to somehow bridge the still largely aligned FUV and UEW etymological corpora with the more heavily pruned ones in Janhunen 1981 and Sammallahti 1988. Most of the comparisons rejected by the latter two authors as insufficiently regular have been dropped quietly, without any arguments given at all. This may very well have allowed advances in historical phonology, but at the cost of what seems like a hefty step back in how much we can claim to know about Uralic etymology.
Even further observations could perhaps be made by taking a look at even earlier etymological compendia: Budenz’ Magyar–ugor összehasonlitó szótár (1873–1881), Donner’s Vergleichendes Wörterbuch der finnisch-ugrischen Sprachen (1874–1888), as well as the extensive material quoted in the major historical phonology overviews that followed in their wake, such as Paasonen’s “Beiträge zur finnischugrisch-samojedischen Lautgeschichte” (1913). I again know of some recently rediscovered etymologies that were first suggested around this time or even earlier. The first two especially are still bolder in their etymological comparisons than FUV and UEW, though (which were at least constrained by mainly compiling etymologies from already published literature), so the ratio of real forgotten goodies to junk would surely be lower still.
There’s also another sense in which “anti-etymologies” could be compiled from this period, however. This far back it is not difficult at all to find comparisons that have been rendered firmly obsolete by now, not just left in a limbo of “irregularity”. These might be illustrative in showing how etymological progress has been achieved over the last 100+ years. Have they been superseded by new native comparisons enabled by new data? By loanword etymologies? By new morphological analyses? Something else? The results of such a survey could perhaps then be used as a roadmap for future research as well, to work out what’s likely and what’s not likely to continue to provide new results.