Similar Place Avoidance in language history

An interesting paper I found a couple of days ago: Pozdniakov, Konstantin & Segerer, Guillaume (2007). Similar Place Avoidance: A Statistical Universal. In: Linguistic Typology 11:2.

The main thesis is relatively simple: most languages of the world disfavor word roots where the word-initial and word-medial consonants have the same place of articulation; and, more generally, word roots combining two peripheral (labial, velar) or two “central” (dental, alveolar, postalveolar, retroflex, palatal) [1] consonants.
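
As a rough sketch of how the two versions of the thesis could be checked mechanically: the place classes below follow the paragraph above, but the small consonant inventory in `PLACE` and the function names are my own assumptions for illustration, not anything taken from P&S.

```python
# Place classes as named in the paragraph above.
PERIPHERAL = {"labial", "velar"}
CENTRAL = {"dental", "alveolar", "postalveolar", "retroflex", "palatal"}

# Hypothetical toy inventory (an assumption for illustration only);
# a real study needs a per-language consonant-to-place mapping.
PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar",
}


def place_class(consonant):
    """Map a consonant to 'peripheral' or 'central' via its place."""
    place = PLACE[consonant]
    if place in PERIPHERAL:
        return "peripheral"
    if place in CENTRAL:
        return "central"
    raise ValueError(f"unclassified place: {place}")


def spa_status(initial, medial):
    """Classify a root by its initial and medial consonants.

    'same_place' : both consonants share a place of articulation
    'same_class' : different places, but both peripheral or both central
    'ok'         : one peripheral and one central consonant
    """
    if PLACE[initial] == PLACE[medial]:
        return "same_place"
    if place_class(initial) == place_class(medial):
        return "same_class"
    return "ok"


# Consonant skeletons of some shapes mentioned in this post.
for root, skeleton in [("kag-", ("k", "g")), ("maba", ("m", "b")),
                       ("suba", ("s", "b"))]:
    print(root, spa_status(*skeleton))
```

Anything beyond two-consonant skeletons (clusters, longer roots) is deliberately left out of this sketch.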

I had also independently discovered this principle some time ago in my exploration of statistical properties of phonotactics in the Uralic languages. Unlike P&S, though, my first reaction was not to assume status as a defining characteristic of Uralic in general. Certainly its occurrence in well-separated branches of the family seems to require its occurrence in Proto-Uralic as well… but who knows how much further back it goes? I do not recall seeing very many word roots shaped anything like √kag- or √bomp- in almost any Eurasian language at all, really. I have had an impression they’d be slightly more common in some Niger-Congo languages, but apparently not. (Seeing what the results are for Japanese might also be interesting; the language seems to be quite rife with words like tatami, tsunami, kami, fuku, fugu. But I am not sure if Internet Japanese™ constitutes a representative sample.)

Some further observations on the topic:

The maintenance of SPA

A point that I did not see covered in the paper is that the maintenance of SPA in languages requires a degree of diachronic stability of consonant POA classes. Now indeed, as a first approximation, while fluctuations between different types of e.g. coronals (ts > tθ > θ > ð > d > l > …) or velars (k > x > ɣ > g > …) are commonplace sound changes, it’s much rarer to see consonant evolutions such as *p >> *d or *d >> *x.

But the boundaries are still not impermeable. Quite a few relatively general sound changes are known across the world [2] that convert consonants from peripherals to centrals, or vice versa:

  • Labial > palatal: e.g. *w > *j in Hebrew
  • Coronal > labial: e.g. *θ, *ð > /f/, /v/ in Latin and the other Italic languages; similarly *t > *θ > /f/ in Rotuman
  • Coronal > velar (or uvular): e.g. *š > *x in Finnic, Spanish, Pashto…; *t > *k in Oceanic languages such as Samoan, Hawai’ian; *r > ʀ/ʁ in continental Western Europe; *ɫ > ʁ in Armenian
  • Velar > palatal: *k, *g > *c, *ɟ > tɕ, dʑ — a frequent change: Satemic IE, Romance, etc.

This raises the question of how the strength of SPA evolves in languages. Changes of the above sort, applied to a language that follows SPA, will necessarily decrease its SPA-compliance. If *š frequently co-occurs with velars, and rarely co-occurs with coronals, then a change *š > *x will introduce a larger number of velar-velar roots than velar-coronal roots. It follows that there must also exist some mechanisms that increase the SPA-compliance of a language.
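
To make the arithmetic concrete, here is a minimal sketch with a made-up seven-root lexicon. The roots, the place table and the function names are all assumptions for illustration; only the logic (a merger *š > *x turning velar–postalveolar roots into velar–velar ones) comes from the argument above.

```python
# Hypothetical place table covering only the consonants used below.
PLACE = {"p": "labial", "t": "alveolar", "k": "velar",
         "š": "postalveolar", "x": "velar"}
PERIPHERAL = {"labial", "velar"}


def violates_spa(c1, c2):
    """True if both consonants are peripheral, or both are central."""
    return (PLACE[c1] in PERIPHERAL) == (PLACE[c2] in PERIPHERAL)


def count_violations(roots):
    """Number of roots whose two consonants fall in the same class."""
    return sum(violates_spa(c1, c2) for c1, c2 in roots)


def apply_shift(roots, old="š", new="x"):
    """Apply an unconditional sound change old > new to every root."""
    return [tuple(new if c == old else c for c in root) for root in roots]


# Made-up, mostly SPA-compliant lexicon: *š mostly co-occurs with
# peripherals (k, p) and only once with a coronal (t).
lexicon = [("k", "š"), ("k", "š"), ("p", "š"), ("š", "k"),
           ("k", "t"), ("t", "p"), ("t", "š")]

before = count_violations(lexicon)
after = count_violations(apply_shift(lexicon))
print(f"violations before *š > *x: {before} of {len(lexicon)}")
print(f"violations after  *š > *x: {after} of {len(lexicon)}")
```

In this toy case the single central–central root (t–š) becomes compliant, but four formerly compliant roots become peripheral–peripheral, so the net effect is a loss of compliance.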

A naive assumption that P&S summarily dispatch would be sound changes running in the opposite direction: place dissimilation to re-establish SPA, à la (? *kaša >) *kaxa > *taxa. Yet this is not a commonly attested type of change at all (the only example I can think of is *t > *k only when a 2nd *t follows; attested IIRC from one of the Oceanic languages [3]), and it clearly cannot be a relevant factor.

My hypothesis is that lexical loss is not random. Suppose a language had two synonyms /maba/ and /suba/ for expressing a given concept; then over time, as the language splits into descendants, SPA-violating /maba/ would be more likely to be lost than the SPA-compliant /suba/. A motivation for this could be that SPA-violating roots are generally found to be more “childish” or “non-serious” in sound, and that they’d be more likely to go “out of fashion”. (Pop quiz: which of the sets {boob, dude, google}; {duty, goop, boogie}; {duke, good, butte} do you find the funniest-sounding?)

This is, in principle, a testable proposition. Take for example the interdental > labial shift in Latin. I would predict that PIE roots that display the change *dʰ >> /f/ ~ /b/ are more likely to be lost or marginalized in Latin (both in early Latin and later on in Vulgar Latin) when there is an original labial or velar consonant in the root as well. Or, in the other direction: I would expect to be able to trace the ancestry of words like duke, on average, a longer way back than that of words like dude.
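
For a feel of how strong such a bias would need to be, here is a back-of-the-envelope sketch. Every number in it is hypothetical (starting counts, retention rates, number of periods), and it ignores new vocabulary entering the language; it only shows how a small difference in retention compounds over time.

```python
def violating_share(n_violating, n_compliant, keep_violating,
                    keep_compliant, periods):
    """Expected share of SPA-violating roots after `periods` rounds of loss.

    Each period, a violating root is retained with probability
    `keep_violating` and a compliant one with `keep_compliant`.
    All parameters are hypothetical; no new roots are added.
    """
    violating = n_violating * keep_violating ** periods
    compliant = n_compliant * keep_compliant ** periods
    return violating / (violating + compliant)


# Hypothetical starting point: 200 violating vs. 800 compliant roots,
# with violating roots slightly more likely to be lost per period.
for periods in (0, 5, 10, 20):
    share = violating_share(200, 800, 0.90, 0.95, periods)
    print(f"after {periods:2d} periods: {share:.1%} violating")
```

The actual test proposed above would of course have to work from etymological data rather than from assumed rates.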

Affricate co-occurrence

P&S further divide the SPA principle into a couple of statements of different strength. The “general” version is that peripherals avoid any other peripherals, and centrals any other centrals; while the “strict” version is that word roots with two consonants of the exact same POA are especially rare. They discover, however, one major divergence from even the latter: the Bantu languages apparently feature a high number of word roots with two palatal consonants. I’d guess this represents an assimilation development of some sort. Perhaps the palatal series represents the merger of former palatalized alveolar and palatalized velar series? This relatively frequent development would easily leverage the apparently universal abundance of TK and KT roots to produce instead an abundance of CC roots.

In Uralic we find no evidence for an especially strong co-occurrence of palatals. However, the postalveolar affricate *č has a strong tendency to “repeat”. There is a remarkable number of old Uralic roots (some of these more, some less secure) such as:

  • *čača- ‘to be born’
  • *čača- ‘to walk’
  • #čEnčä ‘back’ ~ ‘tail’?
  • *čänčä ‘goose, duck’ (from Baltic *džans- < PIE *ǵʰans-)
  • *čëčə ‘duck’ (perhaps also from the above PIE root somehow)
  • #čečə(kä) ‘moment’
  • #či(n)čä ‘little bird’
  • *čoča- ‘to sweep’
  • *čo(n)čə ‘netstring’
  • *čučkə ‘block of wood’

Perhaps a partial explanation would be some sort of consonant assimilation. At least the 3rd word seems to have involved an assimilation *č-s > *č-č. And a couple of these roots are reflected in Finnic and Samic as if coming from original *ć-č; yet not all of them are, as shown by Finnic *häntä ‘tail’, *hetki ‘moment’, Samic *cōccë ‘netstring’ (provided the Uralic etymologies for these are valid: they all involve some irregularities). And maybe the “dissimilating” roots should hence be similarly reconstructed as dissimilar to begin with.

We could also wonder if this should be taken as evidence for an origin of some cases of *č via palatalization from earlier velars.

…and other reduplications

P&S also find, though, that at least some languages can have a tendency to favor “reduplicated” roots (their example is Wolof), with the exact same consonant in the root-initial and root-medial positions. Obviously in a language with several consonants per POA, this effect will be overshadowed by the numerous other combinations possible — so /b-b/ could end up relatively frequent, but cases like /b-m/, /b-v/, /b-f/, /b-ɓ/ etc. will still remain rare.

From my initial observations, though, this does not hold up in Uralic, where classes like “labials” are frequently limited to only a single obstruent *p, the nasal *m and the glide *w or *v. The Proto-Sami lexicon [4], for example, contains fewer than two dozen PP roots, and most of them are either of the shape *m-v, *p-v; *v-m, *v-p; or *v-v. There is only one root of the shape *p-m; none of *p-p, *m-m or *m-p.

The occurring cases incidentally can be shown to be in large part secondary innovations. E.g. the 2nd class contains *vāpsē ‘blade of mitten’; *vipsë ‘skein’; *vēpsēs ‘wasp’; *vōmë ‘width’; *vōmē ‘woods’; *vōmā- ‘to notice’; *vōmtë ‘body cavity’; *vōmtē- ‘to sell’; *vōpējē ‘narrow bay’; *vōpērēs ‘three-year-old reindeer bull’; *vōppë ‘father-in-law’; *vōppō- ‘to pluck’; *vōpsë ‘mesh in a fishtrap’; *vōptë ‘hair’. Most of the roots here seem to have involved the PS development *a-, *o- >> *vō-. All the rest involve the cluster *-ps-, though I’m not sure what to make of that fact.

Cluster complications

Another question the paper does not address is how one should analyze heterorganic consonant clusters. Most languages of the world prefer a simple CVCV shape over CVCCV. The latter type is nonetheless fairly common in some languages. E.g. my index of the Proto-Sami lexicon contains about 920 roots with clusters, about 600 without. So are clusters to be counted as “medial consonant preceded by a coda”, or as “medial consonant followed by another medial consonant”? Is a word such as PS *tolkē ‘feather’ more or less SPA-compliant than PS *kōlkë ‘hair’? The second does have a neat alternating POA structure; but both the syllable onsets are velars. Which of these is more relevant?

From a preliminary look, it stands out that the relative frequencies of 1st members of clusters resemble quite closely the relative frequencies of single medial consonants, while the relative frequencies of 2nd members of clusters more closely resemble the relative frequencies of onset consonants. This would seem to suggest that we should indeed be comparing the first two consonants. But the details could fare differently.

Let’s take a sneak peek at velar/velar combinations for example:

  • *kVkV: severely underrepresented (predicted 18, attested 4)
  • *kVŋV: severely underrepresented (predicted 7, attested 1)
  • *kVkCV: underrepresented (predicted 19, attested 12)
  • *kVŋCV: severely underrepresented (predicted 6, attested 2)
  • *kVCkV: underrepresented (predicted 43, attested 31)
  • *kVCŋV: overrepresented (predicted 3, attested 5)
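
The figures above can be put on a common scale as attested/predicted ratios. The sketch below only restates the six pairs of numbers from the list; how the predicted counts were derived is not modeled here, and the variable and function names are my own.

```python
# (predicted, attested) counts copied from the list above.
velar_velar = {
    "*kVkV": (18, 4),
    "*kVŋV": (7, 1),
    "*kVkCV": (19, 12),
    "*kVŋCV": (6, 2),
    "*kVCkV": (43, 31),
    "*kVCŋV": (3, 5),
}


def oe_ratio(predicted, attested):
    """Attested/predicted ratio: below 1 is under-, above 1 overrepresented."""
    return attested / predicted


ratios = {shape: oe_ratio(*counts) for shape, counts in velar_velar.items()}

# Print shapes from most to least underrepresented.
for shape, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{shape:8s} {ratio:.2f}")
```

With counts this small (especially 3 vs. 5 for *kVCŋV), the ratios are only suggestive; a proper significance test would be needed before reading much into any single row.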

It does indeed seem to be the case here that at least word roots like *kōsŋë- ‘to touch’ pattern as POA-alternating (= not in violation of SPA). But the underrepresentation of *kVCkV does not fit this hypothesis. Then again, the data could also be confounded by one of the most frequent -Ck- clusters being the homorganic *ŋk. I’d need to crunch more numbers here to say for sure.

There’s clearly much to be made of this topic; I am only scratching the surface so far.

[1] They actually use the term “medial”, but I will not, as this seems likely to be confused with “word-medial”.
[2] That is, discounting cases of local assimilation such as np > mp, mt > nt.
[3] I recall Robert Blust covering this topic in his paper __. I seem to have displaced my copy of it, though.
[4] Again, as per Juhani Lehtiranta’s Yhteissaamelainen sanasto (1989/2001).

8 comments on “Similar Place Avoidance in language history”
  1. David Marjanović says:

    Pop quiz: which of the sets {boob, dude, google}; {duty, goop, boogie}; {duke, good, butte} do you find the funniest-sounding?

    Actually, because of the meanings of these words*, I find it hard to decide between the first and the second set…

    * I note you were careful to use butte rather than butt. But then, but would have worked just as well.

    • j. says:

      The main point against the inclusion of butt (or but) is of course not the semantics, it’s being pronounced /bʌt/ and not /bʊt/.

      On second thought though, I have no idea why I didn’t just use boot.

  2. Alex Fink says:

    Perhaps regular place dissimilation is too infrequent to make a difference in bringing about SPA. But why do you dismiss irregular place dissimilation as a contribution, individual instances accreting over the years?

    As another anecdatum, I think /but/ and /gut/ sound less silly than /dut/, but /dup/ and /duk/ sound just as silly as /dut/. The forms with /ju/ are dramatically less silly, though. (Perhaps because my native lect is yod-dropping, so the affect conveyed by e.g. /djuk/ is only the posh stereotypical-Brit-to-an-American one.)

    • j. says:

      Well, depends on what you mean by ‘dismiss’… Less regular cases that add up to dissimilation certainly exist, but they don’t seem to be frequent enough in the comparative data to be a particularly strong counter to regular changes that gradually homogenize consonant distribution. Also it is usually difficult to tell what the motivation behind an irregular change was exactly. For example, take the Finnic 3rd person singular pronoun, which irregularly shifted from *sän to *hän. The standard explanation is lenition originating in unstressed prosodic positions (since *-s- > *-h- is regular after unstressed syllables). But we could also choose to look at this as a shift from an alveolar-alveolar consonant skeleton to a glottal-alveolar one: i.e., dissimilation?

  3. Alex Fink says:

    I just ran into something that reminded me of your mechanism for SPA. Viz. Kuryłowicz’s hypothesis, regarding the limited (Gothic and?) Old Saxon textual evidence for Kluge’s law, that words with geminate voiceless stops might’ve been avoided in Christian texts ’cause they sounded too informal i.e. funny, this on account of their prevalence in nicknames.

    • j. says:

      I’ve seen that one as well, yes. Maybe similar effects could be found elsewhere too. E.g. as you may have heard, in Finnish short ö is coded as typical of affective vocabulary, so maybe the few neutral words where it appears (e.g. köhä ‘cough’, pölli ‘short log’, pöly ‘dust’, öljy ‘oil’) might also have been avoided in some “serious” literature.

      (At the least, Genesis and similar passages have Adam and the rest of humanity built out of the tomu of the earth, using a less common synonym. But it’d take a closer investigation than this.)

  4. crculver says:

    Does SSA provide an etymology for Fi. tomu? I think I can do something with this word but I unfortunately won’t have access to SSA for another 10 days.

    • j. says:

      SSA suggests onomatopoetic origin, from the same stem as the verb tomahtaa ‘to go *thump*; to hit and rouse up dust’ (in modern Finnish usually “descriptified” to tömähtää). Tomu itself is found in Finnish, Ingrian, Karelian, Ludic, Veps, and “some” dialects of Estonian. They cite some attempt to link this with words in Swedish (damb ‘dust’), Mari (lommuž ‘dust’) or Samoyedic (not mentioned).

