The lacuna between phonology and etymology

If you spend any substantial time researching or reading up on etymology and historical phonology, you might notice that one topic in their common neighborhood tends to receive surprisingly little attention. At least I have.

This is what I could call etymological phonology: the analysis of which particular words any given sound change has actually occurred in. You’d think that this falls firmly within the scope of both etymology and historical phonology. And yes, well-written original articles proposing new sound changes usually do not shy away from the work of treating all relevant evidence available to the author.

The situation often gets murkier, though, the more time passes. A researcher who invents a sound law is usually far from the last one to contribute examples. But follow-up or review articles, handbooks, textbooks, etc. are typically content to enumerate only a couple of examples per sound change, even when proposing new instances of it, or proposing alternative explanations for words in which the change has been assumed. I also know of no type of standard linguistic publication or database dedicated to keeping track of the total evidential base for any given sound change, in the way etymological dictionaries exist for meticulously collecting lexical comparisons that have been proposed.

(This is also one reason why I like Hakulinen’s Suomen kielen rakenne ja kehitys quite a bit. While its section on historical phonology is relatively sparse and includes e.g. no reference to cognate forms in other Uralic languages, no reference to developments in the other Finnic languages, and rather few attempts at detailing overall chronology, all four of its main editions still provide, for most of the sound changes covered, a complete list of words where they are assumed to have occurred. This makes it easy to hunt down further details from etymological literature if required.)

For many sound changes this situation is probably not a problem. Any “canonical”, highly distinctive, unconditional or very clearly conditioned sound change (say, Grimm’s Law; *š > *h in Finnic; or *-a/-ä > *-ā > *-ē in Samic) will be evident clearly enough to any reader who’s up to speed on the basics of a language (group)’s history, as soon as it appears in an etymological comparison. Listing five or ten textbook examples of such a sound change, rather than all 68 or 373 known cases, is also definitely convincing enough on its own for showing that the change exists.

Things however get more problematic when we consider conditional sound changes.

The conditioning of a sound change is not directly observable, unlike its output (and occasionally even the input). It must be inferred from an analysis of the total set of examples. As etymological data is itself open to multiple analyses, and as a given sound correspondence might be attributable to more than one sound change, these issues are often subject to debate and competing analyses. It’s not difficult to find cases where the existence of a sound change has been long established, but its precise conditioning remains an open question. An example I’ve blogged about before is *o > õ in Southern Finnic. More widely-known cases from Indo-European studies, at variable points on the research curve, might include e.g. Winter’s law; Brugmann’s law; the resolution of syllabic resonants in Balto-Slavic as *iR / *uR; depalatalization of palatovelars in various Satem languages; or *o > a in Latin.

A direct corollary is also that gains in historical phonology (often with further implications for etymology) can still be made after all relevant data has already been gathered and all sound developments per se observed. Much of my own research in historical phonology has indeed been focused on reanalysing the conditioning of known changes, e.g. *Nś > *js, *Nć > *jc in Finnic or *ś > *š in Mansi.

However, if all relevant data is not available in a common location, and especially if it hasn’t been for a good while, even relatively obvious observations could end up going unmade. Worse yet, attempts to determine conditioning based on only a sample of the data (whether the people making them realize they’re doing this or not) can also lead to entirely wrong conclusions rising to popularity.

One larger if rare risk of this sort might also lie within seemingly trivial sound developments. An author who dutifully reports a long list of cases of e.g. Estonian õ going back to *o might still not pay any especial attention to the cases that have o < *o. [1] This is again not necessarily a problem: phonemes have their “inertia” after all, being generally preserved by default during language transmission. Of course, this only holds if cases of a development such as o < *o truly are retentions. But if no clear account of a sound change’s conditioning has been found, it can be suspected that the data contains complications such as back-developments, blocking conditions beside enabling conditions, or interdialectal loaning; or even that the wrong ancestral state has been assumed, and the “innovative” cases are rather retentions altogether and the “retained” cases innovations.

If a sound development truly has the shape “*X > X under condition C, else *X > Y”, in principle I suppose it is still possible to work out the conditioning just from a list of the cases with *X > Y. A simple mechanistic way to get started will be to enumerate all the non-C environments (which will also be a finite set, after all); and with a bit of introspection, it is also possible to reach the conclusion that defining C as a conditioning environment is actually more parsimonious than defining ¬C as a conditioning environment. But I wonder how many researchers will be willing to take one last step, and conclude that what has happened has instead been *Y > X under condition C (regardless of whether *Y still goes back to earlier *X). And I’m sure this possibility will be more easily realized by someone who also has a list of all the examples with *X > X, and directly observes that they all seem to show a similar environment C.
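The difference between working from the full data and working only from the *X > Y cases can be illustrated with a small sketch. Everything in it is a hypothetical placeholder: the word ids w1 to w6, the environment labels E1 to E3, and the reflexes stand for no real etymologies, and the tabulation is only one simple way of looking at such data, not an established method.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (word id, environment label, reflex of *X).
# None of these stand for real etymologies.
RECORDS = [
    ("w1", "E1", "X"),
    ("w2", "E1", "X"),
    ("w3", "E2", "Y"),
    ("w4", "E2", "Y"),
    ("w5", "E3", "Y"),
    ("w6", "E3", "X"),
]

def tabulate(records):
    """Count how often each reflex occurs in each environment."""
    table = defaultdict(Counter)
    for _word, env, reflex in records:
        table[env][reflex] += 1
    return table

def classify(table):
    """Label each environment with its single reflex, or 'mixed'."""
    result = {}
    for env, counts in table.items():
        if len(counts) == 1:
            result[env] = next(iter(counts))
        else:
            result[env] = "mixed"
    return result

# Full data: retentions and innovations side by side.
full_summary = classify(tabulate(RECORDS))

# Only the "innovative" cases, as in a list that omits retentions.
changed_only = [r for r in RECORDS if r[2] == "Y"]
partial_summary = classify(tabulate(changed_only))

print(full_summary)
print(partial_summary)
```

In the partial summary, environment E1 does not appear at all and E3 looks like a uniform *X > Y environment, while the full summary shows E1 as uniformly retaining X and E3 as mixed. The environment C that unites the retentions is only directly visible in the full table.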

This all furthermore ties in with one far-off wish of mine. There will be, I’m sure, eventually a paradigm shift in linguistics from etymological dictionaries (edited once and outdated after that, costly to acquire) to etymological databases (continuously updateable, easily accessed online) as the primary means of coordinating etymological research across a language family. At that point though, an additional bonus I’m hoping for would be to also incorporate tools for analyzing etymological phonology into the same platform(s). (And why not also other similar topics of interest — e.g. statistics on things like semantic change or root structure.) I hope my discussion above suffices to outline what kind of benefits this could likely bring about.
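As a very rough sketch of what the smallest version of such a tool might look like: each etymological entry could record, per sound change, whether the change is assumed to have applied or not, so that the total evidential base for a change can be pulled out in one query. The class name `Entry`, the field names, the status strings and the change labels below are all my own hypothetical choices, not the design of any existing database.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One etymological comparison; all field names are hypothetical."""
    entry_id: str
    # Maps a sound change label to "applied" or "not applied".
    changes: dict = field(default_factory=dict)

# Placeholder entries, standing for no real etymologies.
ENTRIES = [
    Entry("e1", {"*X>Y": "not applied"}),
    Entry("e2", {"*X>Y": "applied"}),
    Entry("e3", {"*X>Y": "applied", "*A>B": "applied"}),
]

def evidence_for(entries, change):
    """Group entry ids by recorded status for one sound change."""
    grouped = {}
    for entry in entries:
        status = entry.changes.get(change)
        if status is not None:
            grouped.setdefault(status, []).append(entry.entry_id)
    return grouped

print(evidence_for(ENTRIES, "*X>Y"))
```

Recording the "not applied" cases explicitly is the design choice that matters here: it keeps the apparent retentions in the same list as the innovations, instead of leaving them to be reconstructed from the rest of the lexicon.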

[1] This particular example is taken from Alo Raun (1971): Problems of the number and grouping of Proto-Finnic dialects, in: Essays in Finno-Ugric and Finnic Linguistics, Indiana University Publications: Uralic and Altaic Series 107.

