Last fall I blogged about a possible project charting the distribution of reconstructed Proto-Indo-European terms in the descendant languages. Some of the discussion here focused on the likely unreliability of the data, which for my initial survey came from a conveniently available but unreferenced Wiktionary appendix.

I chose it less out of ignorance than out of availability. To my knowledge, no public database of reasonably up-to-date Indo-European etymological data is currently available anywhere.

There is no reason, though, for us to resign ourselves to unequal access to information, where the free data that is easy to find is of poor quality and the “proper” data is locked away in exorbitantly expensive dead-tree publications. Data and theories per se are uncopyrightable, after all.

I am therefore happy to announce that I have digitized a list of PIE verb roots, as recorded in the LIV and its online Addenda und Corrigenda. [1] A basic version is available at the English Wiktionary. You may also be interested in taking a look at the fully tabulated data, in spreadsheet form. The notes in my master file on word derivation and distribution are sketchy at best, though, and will need further work. [2]

This file is probably public domain by necessity. Still, if anyone reading ends up using or referencing it somewhere, I would appreciate a shoutout or similar.

As for actual analysis, at this point the data mainly allows a look at root structure. I might as well use this post to note some basic facts that stand out.

For starters, the usual stop phonation constraints (against **D-D, **T-Dʰ, **Dʰ-T) surface reliably. A more interesting related pattern also emerges. I’ve sometimes seen it suggested that the unusual PIE cluster *wr- could come from earlier *br-, which would tie in with the lack of stem-initial *b-. (Not a lack altogether: at least in the preliminary data, *b still occurs often enough in stem-final position.) If we assumed this, however, we would end up with quite a large number of pre-PIE stems of the shape *b-D: 5 of the 12 roots with *wr- show a stem-final voiced stop, as in *wreg- ‘einer Spur folgen’ (‘to follow a trail’). So we would need either to assume that the reconstructible voicing constraints emerged only later, or to refine this hypothesis into some kind of chain shift such as *bʰ- > *w-, *b- > *bʰ-.
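To make the *b-D argument concrete, here is a minimal Python sketch of a phonation check. It is only an illustration under simplifying assumptions. The notation is assumed to be T = p, t, ḱ, k, kʷ; D = b, d, ǵ, g, gʷ; Dʰ = bʰ, dʰ, ǵʰ, gʰ, gʷʰ. Only the first and last stop of a root are compared. The function names are hypothetical. *breg- is a hypothetical pre-PIE form, not a reconstructed root.

```python
from __future__ import annotations

import unicodedata


def nfc_set(*items: str) -> set[str]:
    """Build a set of NFC-normalized segments so that ḱ, ǵ compare reliably."""
    return {unicodedata.normalize("NFC", s) for s in items}


# Simplified stop classes (an assumed notation, not LIV's own coding).
VOICELESS = nfc_set("p", "t", "ḱ", "k", "kʷ")        # T
VOICED = nfc_set("b", "d", "ǵ", "g", "gʷ")           # D
ASPIRATED = nfc_set("bʰ", "dʰ", "ǵʰ", "gʰ", "gʷʰ")   # Dʰ

# Multi-character segments, longest first, so "gʷʰ" is not read as "gʷ" + "ʰ".
MULTI = sorted(
    {s for s in VOICELESS | VOICED | ASPIRATED if len(s) > 1} | nfc_set("h₁", "h₂", "h₃"),
    key=len,
    reverse=True,
)

# Banned (first stop, last stop) pairings: **D-D, **T-Dʰ, **Dʰ-T.
BANNED = {("D", "D"), ("T", "Dʰ"), ("Dʰ", "T")}


def segments(root: str) -> list[str]:
    """Split a root written like '*gʷʰen-' into segments."""
    text = unicodedata.normalize("NFC", root).strip("*-")
    out, i = [], 0
    while i < len(text):
        for m in MULTI:
            if text.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:  # no multi-character segment starts here
            out.append(text[i])
            i += 1
    return out


def stop_class(seg: str) -> str | None:
    """Map a segment to 'T', 'D' or 'Dʰ'; anything else is not a stop."""
    if seg in VOICELESS:
        return "T"
    if seg in VOICED:
        return "D"
    if seg in ASPIRATED:
        return "Dʰ"
    return None


def phonation_violation(root: str) -> str | None:
    """Return the banned pattern (e.g. 'D-D') formed by the root's first and
    last stops, or None if it has fewer than two stops or the pair is licit."""
    stops = [c for c in map(stop_class, segments(root)) if c is not None]
    if len(stops) < 2:
        return None
    pair = (stops[0], stops[-1])
    return "-".join(pair) if pair in BANNED else None


# *wreg- itself passes; undoing a putative *br- > *wr- gives a D-D root.
print(phonation_violation("*wreg-"))  # None
print(phonation_violation("*breg-"))  # D-D
```

*wreg- has only one stop, so it passes. Swapping its *w back to *b yields a root that the check flags as **D-D, which is the problem described above.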

I would be content to abandon the idea, though, and instead assume that most cases of *wr- arose either through the reduction of the first syllable of earlier roots (in PIE-internal terms, roughly as zero-grade derivatives of some root shaped *(C)wer-, *Cewr-), or through some Schwebeablaut-like metathesis process.

There is more interesting stuff going on with resonants. I do not recall seeing this discussed in the context of PIE root structure anywhere before (which of course could be ignorance on my part), but several non-trivial constraints on their distribution are apparent. Here are some quick observations on this topic:

  1. No roots (or perhaps better: “sonorant cores”) of the shape **-R₁eR₁- occur. This is a fairly trivial application of the universal principle of Similar Place Avoidance, though.
  2. No cores of the shape **-ler-, **-rel- occur either. Again, this is easy to understand as avoidance of similar consonants.
  3. The core **-nel- is also absent. This is less expected, but it may have the same motivation as the above. It could also be an accidental gap, though, since onset *n- is relatively rare overall and *-len- is well attested. Perhaps it is rather the abundance of *-ney- and *-new- roots that should be questioned.
  4. *m in the onset does not quite appear to count as a sonorant. There are almost no roots beginning with a cluster *Tm-, where *T is a stop (the lone example is *dʰmeH- ‘blasen’, ‘to blow’). We do find *sm- and *Hm-, but then again, *sT- and *HT- are just as possible.
    This also lines up well with the fact that a few cases of *mR- do occur. Historically, they are likely to be mostly “zero-grade clusters” again. However, this etymological explanation does not account for the absence of other sonorant-sonorant clusters such as **nR-, **lR-.
  5. Sonorant cores of the shape *-yeR- are unexpectedly rare overall: no examples of **-yel-, **-yer-, **-yen- occur at all, and there is only a single example of *-yem-.
  6. Conversely, even among roots with stem-final obstruents only, onset *-y- is curiously common before a stem-final back consonant (velar, laryngeal or *w): 29 of 33 cases, or 88%, show this environment! I wonder if we could assume that such roots reflect some specific pre-PIE front vowel that was diphthongized to *ye before back consonants. This vowel would likely have to be separate from the source of PIE *-ey-, though, which shows no aversion to occurring before velars and laryngeals.
  7. Initial *h₂w- appears to be more common than all other laryngeal + glide clusters combined, and it is also quite common stem-finally (i.e. as *-h₂w-, not *-wh₂-!). I wonder whether this should be taken to represent an earlier single phoneme such as *[ħʷ], created even further back from the ancestor of *h₂ by the same processes that gave rise to the PIE labiovelar series.
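Observations 1, 2, 3 and 5 are all statements about which resonant pairs flank the root vowel. A tally of such “sonorant cores” could be sketched roughly as follows. This is a hypothetical helper, not how the figures above were produced. It assumes roots written with a single full-grade *e and resonants written as single characters (r, l, m, n, y, w). The example strings are illustrative only and are not drawn from the LIV file.

```python
from __future__ import annotations

from collections import Counter

RESONANTS = set("rlmnyw")


def sonorant_core(root: str) -> tuple[str, str] | None:
    """Return the (onset, coda) resonants around the root vowel *e, e.g.
    ('l', 'y') for '*ley-', or None if either neighbour is not a resonant."""
    text = root.strip("*-")
    i = text.find("e")  # assumes a single full-grade *e
    if i <= 0 or i + 1 >= len(text):
        return None
    onset, coda = text[i - 1], text[i + 1]
    if onset in RESONANTS and coda in RESONANTS:
        return onset, coda
    return None


def tally_cores(roots: list[str]) -> Counter:
    """Count how often each sonorant core occurs in a list of roots."""
    return Counter(core for core in map(sonorant_core, roots) if core is not None)


def identical_cores(counts: Counter) -> dict:
    """Cores of the shape -R₁eR₁- (observation 1 says none occur)."""
    return {core: n for core, n in counts.items() if core[0] == core[1]}


def gaps(counts: Counter) -> list[str]:
    """List resonant cores that never occur, written like '-yer-'."""
    return sorted(
        f"-{a}e{b}-" for a in RESONANTS for b in RESONANTS if (a, b) not in counts
    )


counts = tally_cores(["*ley-", "*mel-", "*wreg-", "*ley-"])
print(identical_cores(counts))  # {}
print("-yer-" in gaps(counts))  # True
```

If the observations above hold for the full list, `identical_cores` should come back empty. `gaps` would then list absent cores such as -ler-, -nel- and -yer-, but it would also list any accidental gaps. Telling the two kinds apart still needs the sort of reasoning given in the list.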

I could extend my discussion to onset and stem-final consonant clusters as well, but they do not seem to show anything especially interesting to raise just yet.

[1] Two corrections to the reconstructions remain mysterious to me. The first is the alleged removal of a root **meyH- ‘lang werden’, i.e. ‘to become long’ (the two roots I’ve recorded with this shape do not seem to have such a meaning). The second is the adjustment of a root *kelh₁- to *k¹elh₁- (no such root occurs in the original data, although another correction adjusts the root *kel- ‘antreiben’, i.e. ‘to drive on’, to *kelh₁-).
[2] At the moment I have no recollection of what the column labeled “st” signifies, but I am leaving it in for possible further elaboration.
edit: On re-checking the data, this column apparently gives the number of branches with verbal reflexes that LIV lists in the running text. However, the footnotes often list nominal derivatives. Closer checking also shows that some entries list a few additional uncertain verbal reflexes in the footnotes. So this is not quite an actual measure of the distribution of the reflexes, and perhaps I will remove it in later editions.

  1. David Marjanović says:

    Awesome. The compilation will be very useful, and I expect you to get at least one paper out of the patterns you’ve found. 🙂

    Not a lack altogether: at least in the preliminary data, *b still occurs often enough in stem-final position.

    I wonder how many of these cases will disappear once Kluge’s law goes mainstream.

