Consonant clusters growing, wilting and syllabic

From a Uralicist perspective, one thing that I find goes underappreciated in Indo-European studies is the extensive phonotactic complexity of most IE languages. Certain types of studies on PIE consonant clusters can be found these days in abundance, yes… but these mostly focus on the resolution of the most extreme things that the morphology of PIE, with its abundant zero-grade morphemes, can come up with: monstrosities like *HHR-, *CRH-, *RHC-, *-CHCR-. The fate of the more common, though still remarkable on a worldwide scale, consonant clusters like *bʰl-, *sp-, *tw-, *-zd-, *-ktj- appears to be considered basically trivial. (I am open to reading suggestions, though: IE studies is a big field and I expect I am still missing out on many specifics.)

Within Europe, the fate of simple two-consonant initial clusters really is at least mostly trivial, though. The Germanic and Balto-Slavic languages retain most PIE initial clusters fairly well, incidental changes in the individual consonants aside (as in *tw- > English thw-, Lithuanian tv-). Latin and Greek are not far behind, though they mostly get rid of *sR clusters (as in e.g. slime ~ līmus; snow ~ nix). We would have to look at Albanian and the more eastern languages (Armenian, modern Indo-Iranian) before seeing major cluster simplification or transformation trends. As for Celtic, Tocharian and Anatolian, I can’t say I have much of a handle on the big picture at all… which is one reason why a detailed overview of phonotactics issues in the IE languages, either from the perspective of particular classes of clusters or particular languages’ overall histories, would sound appealing to me.

To be fair, it’s not as if this kind of a thing has been done much in Uralic studies either. There have been a few phonotactic analyses of the cluster stock in various reconstructed proto-languages, though with naïvely synchronic methodology. From a more firmly diachronic angle, a few interesting topics that may require more detailed investigation could be

  • the nearly complete cluster simplification trends in Permic, Hungarian and Enets, transforming the inherited *(C)V(C)CV root structure into roughly √(C)V(C)(V). To a lesser extent similar things happen also in e.g. Mari and Proto-Samoyedic.
  • the rise of numerous complex clusters in Mordvinic, e.g. in initial position, Erzya kši ‘bread’, kšna ‘strap’, pśkiźems ‘to have diarrhea’, promo ‘gadfly’. This seems to run a bit too deep-set to be blamed just on late Russian influence: the first two are earlier Baltic or Balto-Slavic loanwords (~ Fi. kyrsä ‘loaf’, hihna ‘strap’), the last two native Uralic (~ Fi. paskoa ‘to shit’, paarma ‘gadfly’).
  • the slightly less daunting but still strong expansion of consonant cluster complexity in Finnic (as I’ve briefly covered before) and Samic, probably mainly due to Indo-European loanwords.

But back to IE, for a few scattered observations.

At least one of the initial consonant clusters reconstructed for Proto-Indo-European is an exception of sorts to any retention tendencies, even from a European perspective. This is *sr-: the cluster is alien to most European languages today, even ones that may otherwise allow sibilant+/r/: English shr-, German schr- from earlier *skr-. (The Slavic languages do have newly created examples though, generated after syncope; e.g. Polish srebro ‘silver’ < *sьrebro.) Given the wide palette of word-initial clusters of the type CR- and even sTR- tolerated in IE languages, this is a notable hole in the system.

In Greek *sr- is simplified the usual way, through *s-aspiration, yielding word-initial ῥ- /rʰ/. Elsewhere, however, special developments seem to kick in.

Germanic and Balto-Slavic share here a non-trivial isogloss: *sr (of any position) is resolved by epenthesis of *t, generating correspondences such as stream, Latvian straume, Polish strumień ~ Greek ῥεῦμα (< *srew-m-os, *srew-m-eh₂). The change has however not reached standard Lithuanian, which still has e.g. sraumuo; [1] therefore showing that this is a relatively late diffused sound change, not a data point in favor of a Germano-Balto-Slavic proto-dialect. Perhaps even one that has been innovated multiple times in parallel: homorganic stop epenthesis in clusters of continuant+glide is commonplace after all (æmyrge > *emrə > ember in English surely requires no especial connection with hominem > *homre > hombre in Spanish), and while the phonetic development is less trivial here, the prior existence of *str- has probably helped to motivate *t-epenthesis.
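As a rough illustration only, the epenthesis can be written down as a single rewrite rule. The sketch below uses Python; the function name `t_epenthesis` and the flattened spelling of the input forms (no hyphens, no asterisks) are assumptions made for the example, and it ignores any further conditioning the change may have had in the individual branches.

```python
import re

def t_epenthesis(form: str) -> str:
    """Rewrite every s+r sequence as s+t+r (schematic *sr > *str)."""
    # The lookahead leaves the r in place. An existing "str" is untouched,
    # because its s is followed by t rather than r.
    return re.sub(r"s(?=r)", "st", form)

# Schematic pre-forms based on *srew-m-os, *srew-m-eh₂.
for preform in ["srewmos", "srewmeh₂"]:
    print(preform, ">", t_epenthesis(preform))
```

The outputs are only the mechanical result of the rule, not attested or reconstructed intermediate forms; the point is that inherited *str- passes through unchanged, so the new and old clusters merge.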

This sound change likely also accounts for the intrusive -t- in ‘sister’ in Germanic (sister etc.) and the relevant parts of Balto-Slavic (OCS сестра, Old Prussian swestro, but again, Lithuanian sesuo; and as I’m looking these up, I am also learning that Latvian has apparently lost this word entirely!). This was probably generalized from the genitive, *swesrés or *susrés. Some degree of analogical support from the mother, father, brother, daughter group surely has played a part as well, but I would think the fact that this only occurs in languages that also show *sr > *str as a general sound change is not a coincidence.

This development also seems to have interesting interaction with the PIE syllabic consonants. Some time ago I ran across a small article by Krzysztof Witczak (1991), “Indo-European *sr̥C in Germanic”, which proposes that this epenthesis also took place before syllabic *r̥. The evidence is scarce but looks believable. Interestingly, this then demonstrates that at some point an actual syllabic [r̩] must have indeed occurred in Germanic (contra some of my earlier suspicions that some kind of an epenthetic schwa might have been hanging around all along in here).

Also, returning to ‘sister’: while I have no ready means to see if this checks out in the other older Germanic languages, Wiktionary actually gives a PGmc genitive *swesturz > Gothic swistrs, which looks more like pre-Gmc *swesr̥s.

Even more interestingly, there seems to be some evidence for similar business also in Baltic.

The word for ‘roe deer’ in Latv. and Lith. is stirna, corresponding to Slavic *sьrna. These look like derivatives from the ‘horn’ root, *ḱer(h₂)-, or in particular the derivative *ḱr̥(h₂)nos, as reflected also in e.g. Germanic horn. Derksen’s etymological dictionary of Baltic (2015) has no comment other than that “the anlaut is problematic”… I suspect however that the Baltic words could be explained by a development *šr̥ > *str̥, taking place before the breaking *r̥ > *ir. [2] This all will also have to be later than *ḱ > *š, but this is already assured to be quite early by the evidence of loanwords in Finnic.
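The relative chronology suggested here can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the stem `ḱr̥n-` is a simplified stand-in for the pre-form (laryngeal and ending left out), the rule tuples and the helper `apply_in_order` are hypothetical names, and each sound change is reduced to a plain string replacement.

```python
import unicodedata

SYLLABIC_R = "r\u0325"  # r + combining ring below

K_TO_SH = ("\u1e31", "\u0161")                            # *ḱ > *š
EPENTHESIS = ("\u0161" + SYLLABIC_R, "st" + SYLLABIC_R)   # *šr̥ > *str̥
BREAKING = (SYLLABIC_R, "ir")                             # *r̥ > *ir

def apply_in_order(form, rules):
    """Apply (old, new) replacements one after another, in the given order."""
    # Normalise so that precomposed and decomposed spellings of ḱ, š match.
    form = unicodedata.normalize("NFC", form)
    for old, new in rules:
        form = form.replace(old, new)
    return form

STEM = "\u1e31" + SYLLABIC_R + "n"  # schematic *ḱr̥n-

epenthesis_first = apply_in_order(STEM, [K_TO_SH, EPENTHESIS, BREAKING])
breaking_first = apply_in_order(STEM, [K_TO_SH, BREAKING, EPENTHESIS])
print(epenthesis_first)
print(breaking_first)
```

With epenthesis ordered before breaking the stem comes out as stirn-; with the opposite order the epenthesis rule no longer finds its input and the result is širn-. Only the first ordering gives the st- onset of stirna.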

On the other hand, there are more than enough other words, even derivatives from this same root, that show no such epenthesis, e.g. Old Prussian sirwis ‘roe deer’ < *šr̥wis (whence also Fi. hirvi ‘elk’); Latvian sirsenis, Lithuanian širšė ‘hornet’ < *šr̥Hšō (whence also Fi. herhiläinen). To get around this issue, we would probably need to assume dialect mixture of some kind, as will already be required to explain why we have *t-epenthesis now showing up in Lithuanian also. An irregular shift from *šr̥nos to *sr̥nos might also work. (Or as long as I’m fucking around with relative chronology, even the regular shift of *š to *s in Latvian?)

This is moreover complicated by how all these words must be, to some degree, analogical anyway. The reason for this is “Weise’s Law”: [3] the neutralization of *Ḱr- and *Kʷr- as *Kr-, common to all Satem languages. We would again not expect this to distinguish between syllabic *r̥ and non-syllabic *r, and apparently the Sanskrit data indeed confirms this. Thus Balto-Slavic *šr̥nas and other such derivatives (including, from Sanskrit, śiraḥ ‘top’ < *ćr̥Has) would have to be assumed to get their palatal onset by analogy with the abundant other derivatives of *ḱer(h₂)-. So… another possibility is then that stirna is the earliest word where *ḱ > *š was restored in this way, followed by epenthesis, followed by the remaining cases of analogical *š-restoration.

Or maybe this is all barking down the wrong root entirely. Something that also looks worth further investigation is if the Baltic words for ‘roe deer’ might be actually rather cognate with German Stirn?


A different angle on getting rid of *sr- is exhibited in Italo-Celtic: > *θr- > fr-, reflected at least in Brythonic (e.g. Welsh ffrwd ‘stream’) and in Latin (the best examples seem to be word-medial and have an expected further development to -br-, e.g. crābrō < *kr̥Hsrō ‘hornet’). Irish has what looks like retained sr- (e.g. sruth ‘stream’). Schrijver proposes that this is a reversal from the *θr stage, [4] but given the situation in Baltic, I would not bet on it. Note that reversal in Lithuanian is clearly not possible, since inherited *str- remains. Again, it seems plausible that the first stages of the Goidelic/Brythonic split go far back enough that the latter could have still participated in common developments with Italic.

Irish also seems to have a general shift *st- > s- (ser ‘star’, sab ‘staff’, etc.), so actually even an earlier development of the Germanic-Balto-Slavic flavor is theoretically possible.

A quick scan-over of IE etymological sources at my disposal reveals no special developments of *sr̥- in Celtic or Latin. LIV has two Latin examples that seem to have retained s-: sariō ‘I hoe’ < *sr̥h₃yé-, sarciō ‘I mend’ < *sr̥kyé-. Witczak’s article gives Latin fariō ‘salmon trout’, compared with the Germanic sturgeon word family and derived from *sr̥Hyón-; but this also seems to come from Old Latin sariō, thus aligning with the previous group. That these all have -ar- rather than the usual -or- as the reflex of *r̥ however probably indicates a relatively early epenthesis of *ə > *a. Schrijver reconstructs a rule *CCCC- > *CaCCC- as already common Italo-Celtic (argued in full in The Reflexes of the Proto-Indo-European Laryngeals in Latin).


At any rate, the moral is that simplifications or epentheses in consonant clusters of the shape *CR might make a more general opening for investigating the history of the PIE syllabic sonorants.

I’ve another example as well, though probably less illustrative. Sticking still to the European languages, there is perhaps something to be made of PIE *Tl-. Word-initially this was a rare cluster, but one established example is *dl̥h₁gʰos ‘long’ (> e.g. Slavic *dьlgъ, Greek δολιχός, Sanskrit dīrgha-). Now, the Baltic languages are known to have word-medially eliminated *-tl-, *-dl- by dissimilation to *-kl-, *-gl-. So would we find a similar initial development here?

We do not; but we do find something unusual: wholesale loss of the initial consonant, resulting in Lith. ilgas, Latv. ilgs! Perhaps this could be again explained by assuming word-initial *Tl-, *Tl̥- > *l-, *l̥-, already before *l̥ > *il? A previously known case with non-syllabic *Tl- is Lith. lokys, Latv. lācis ‘bear’ ~ Old Prussian clokis ‘bear’ (which would then show that this simplification is Eastern Baltic specifically). Unfortunately, there are again also several counterexamples with *Tl̥- > *Til-, e.g. Lith. tiltas, Latv. tilts ‘bridge’ < *tl̥h₂tós. Go figure…
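To make the problem concrete, the hypothesised initial simplification can be sketched in Python and run against both the supporting and the conflicting example. Everything named here is an assumption for illustration: the functions `initial_stop_loss`, `l_breaking` and `derive`, and the schematic stems `dl̥g-`, `tlāk-`, `tl̥t-` (laryngeals and endings left out).

```python
import re

SYLLABIC_L = "l\u0325"  # l + combining ring below

def initial_stop_loss(form: str) -> str:
    """Hypothetical rule: word-initial t/d is dropped before l or syllabic l̥."""
    return re.sub(r"^[td](?=l)", "", form)

def l_breaking(form: str) -> str:
    """*l̥ > *il."""
    return form.replace(SYLLABIC_L, "il")

def derive(form: str) -> str:
    """Initial stop loss ordered before breaking, as suggested in the text."""
    return l_breaking(initial_stop_loss(form))

for stem in ["d" + SYLLABIC_L + "g", "tl\u0101k", "t" + SYLLABIC_L + "t"]:
    print(stem, ">", derive(stem))
```

The first two stems come out as ilg- and lāk-, in line with ilgas and lokys. The third comes out as ilt-, whereas the attested tiltas would need breaking alone (`l_breaking` without the stop loss gives tilt-), which is exactly the counterexample noted above.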

[0] This post has been prompted by me resuming work for a little while on constructing a reference table on the fate of PIE consonant clusters on Wikipedia.
[1] Jānis Endzelīns (1973), Comparative Phonology and Morphology of the Baltic Languages: 73 informs that other dialects of Lithuanian, however, do have this change, and so we can also rule out this as a datapoint in favor of a Latvian-Slavic grouping (as has sometimes been suggested). Interestingly even Old Prussian has this epenthesis, so this all could instead testify for the Latvian-Lithuanian split, maybe even some of the inter-Lithuanian dialect splits, going quite a while back. — Most evidence I’ve seen in favor of the East Baltic group in fact looks quite easy to reinterpret as more or less areal: e.g. the sound change bundle *ai > *ei > *ē > ie is basically trivial, and has parallels in most neighboring languages (the first in Slavic, Scandinavian and core Finnic; the second in Swedish and Livonian, as well as Slavic in a different form; the last in Western Slavic and in most of Finnic).
[2] I’m not going to start probing the issue, but a sound change or two along the lines of *št > *st might also help in explaining the famously inconsistent application of RUKI in Baltic; e.g. Lith. pisti (not ˣpišti) ‘fucks’ ← PIE √peis- ‘to crush, push’.
— It also just now occurs to me that western Uralic *pisə- ‘to put, stick (in)’ (Samic, Finnic, Mordvinic, Mari) is probably derived from this last-mentioned IE root. This contrasts with widespread native Uralic counterparts: #pënə- ‘to put’ (absent only from Samic and Hungarian), #texə- (maybe *tejwä-??) ‘to push’ (F, P, Hu, Ms, Kh), *puskə- ‘to poke’ (S, F, Ms, Kh), which is usually a good indication for an innovation of some sort.
[3] An old idea, but only recently named and reviewed by Kloekhorst. — I would suggest though that his group of six counterexamples involving derivatives of the type *CeḰ-ro- should not be accounted for by “phonetically regular analogy”: they might rather indicate Weise’s Law applying only to syllable-initial palatovelars (*Ḱr-, *-Ḱr̥-) but not to syllable-final ones (*-Ḱ.r-). This would also cover his three counterexamples of the shape *CeḰ-ru-, in which case there is then no need to date the law as any older than common Satemic.
[4] Schrijver, Peter (2015): “Pruners and trainers of the Celtic family tree”.

Posted in Reconstruction
46 comments on “Consonant clusters growing, wilting and syllabic”
  1. Kathryn Spence says:

    On Gothic swistrs*, the ending can be an inner-Gothic innovation due to analogy with the consonant stems (cf. gen.sg. alhs*, dat.sg. alh*, etc.), while the t* can simply be levelled in from the other cases (already in PGmc, though)

  2. David Marjanović says:

    Germanic and Balto-Slavic share here a non-trivial isogloss: *sr (of any position) is resolved by epenthesis of *t,

    Promptly reapplied in Czech, where “silver” is střebro and “middle” is středa.

    In Germanic, however, this is only the outcome under Grimm conditions; the Verner outcome is just *r.

    More later.

    • David Marjanović says:

      I keep forgetting to link to the source for the Grimm and Verner outcomes of *-sr-; also, the *z disappeared with compensatory lengthening of a preceding short vowel.

  3. Daniel N. says:

    Modern Albanian, apparently no sr-; Romanian, no sr-. But South Slavic (Slovene, Croatian/Serbian etc.) are happy with sr- (e.g. sretati ‘meet’, sreća ‘happiness’, sretan/srećan/srečen ‘happy’, sredina ‘middle’…), but an earlier word, struja ‘stream’, has no sr-.

    How come? Apparently, sr- was not strange to some people in this area. But Albanians and Romanians don’t have it…

    • j. says:

      The same way as the Polish example I gave above: on one hand through the Fall of the Yers (the weak vowels *ь, *ъ) around the 12th century I believe; and on the other, through the breaking of “liquid diphthongs” (syllables ending in *r or *l) slightly earlier in late Common Slavic, around the 9th century. (Derksen’s etymological dictionary of the Slavic inherited vocabulary gives for Proto-Slavic *serdà ‘middle’, *sъrěsti ‘to encounter’.) This is likely long after the original epenthesis *sr > *str.

      So to be clear, when I say “the Slavic languages have newly created examples”, I refer here to the Slavic languages as a whole, not just Polish.

  4. David Marjanović says:

    CR- and even sTR-

    That may be the same thing: presigmatized stops have been postulated for (P)IE.

    Compare (starting on p. 138) the prelateralized stops in a Tibetan language spoken in Ladakh, and likely also in an ancestor of Central Tibetan.

    • j. says:

      Yep, I know the proposal, though I’d like to see at least a response from the people who instead analyze these by proposing that IE languages allow extrasyllabic /s/ at the beginning or the end of a word (foremost Byrd re PIE, but also e.g. Kobayashi re Sanskrit, and dozens of phonologists working with modern European languages). That seems to have some advantages; e.g. it also covers the existence of word-final -ξ -ψ but no other clusters in Greek. I could add that if things like /ˢt/ were to be truly monophonemes, why are there no PIE roots of the shape *CeˢT?

      • David Marjanović says:

        The exception for the beginning of the word wouldn’t be needed under that proposal, while that at the end is already needed for other things, and not just for /s/. (But of course I’d like to see a response, too.)

        Some Cretan inscriptions have verbs in -ns. Other than /s/, which consonants could end up behind another consonant at the end of a word in the first place in Greek?

        I guess *CeˢT- would be prone to reanalysis as *Ces- followed by a suffix with *t-… and isn’t ˢT supposed to be restricted to initial position anyway, where it’s an emergency solution to the sonority hierarchy?

      • David Marjanović says:

        A sort of response is here (open access).

  5. Blasius B. Blasebalg says:

    “At least one of the initial consonant clusters reconstructed for Proto-Indo-European is an exception of sorts to any retention tendencies, even from an European perspective. This is *sr-: the cluster is alien to most European languages today, even ones that may otherwise allow sibilant+/r/: … Given the wide palette of word-initial clusters of the type CR- and even sTR- tolerated in IE languages, this is a notable hole in the system.”

    A very modest observation: /sr/ seems to be harder to pronounce than either (stop) + /r/, /sw/, /sv/, /sl/, /sm/ or /sn/, simply because of the similarity/proximity between the two consonants. I don’t really understand why, but /sr/ also seems intrinsically (independently from phonotactic acquaintance) harder than /rs/. So among combinations of two consonants, it makes sense to ditch /sr/ as one of the first.
    (Of course, it can work out differently: Greek -σϑ- seems at least as hard as -sr-, for a very similar reason, and also in the same way harder than the converse order as found in English ‘booths’.)

    It is not very clear how that relates to keeping old or constructing new monstrosities. But I think that this “proximity complexity” is a rather separate motivation for cluster reform than sheer length of clusters.

    • j. says:

      Oh yes, I agree it’s not random. There’s something oddly phonetically tricky in going from a sibilant to a rhotic. FWIW Finnish has an optional allophone [ɹ] of /r/ in this cluster, even though it only occurs in compound words and in the proper name Israel.

      It is not very clear how that relates to keeping old or constructing new monstrosities.

      Probably not a whole lot, I just suspect longer clusters are more likely to attract research.

  6. M. says:

    *θr- > fr-, reflected at least in Brythonic (e.g. Welsh ffrwd ‘stream’) […] Irish has what looks like retained sr- (e.g. sruth ‘stream’)

    It might be relevant that *sp– has almost the same range of outcomes in Celtic as *sr-: s- in Irish, ff- in British. (Cf. Irish seir “heel”, Welsh ffer “ankle”, cognate with e.g. English spur.)

    If this reflects the development *sp– > Celtic *sɸ- > Irish *sᶲ- / British ɸɸ-, then perhaps the development of *sr- in Celtic (prior to breakup) was actually *sɸr-?

    On the other hand, there is some evidence of at least sporadic *sr > *str-/*sθr- in British: cf. Irish srón “nose” vs. Welsh trwyn “nose”, alongside ffroen “nostril, muzzle”.

    Another issue is that Irish retains traces of the older labial articulation in *sp-: the s- of e.g. seir becomes f- when lenited. By contrast, there is no evidence (that I know of) of the s- in sruth etc. being lenited to f-; instead, it lenites to h-, like simple s- with no adjacent consonant.

    —-

    On another topic:

    PIE √peis- ‘to crush, push’.
    — It also just now occurs to me that western Uralic *pisə- ‘to put, stick (in)’ (Samic, Finnic, Mordvinic, Mari) is probably derived from this last-mentioned IE root.

    Crushing/pushing is clearly a different action from sticking, and the number of matching phonemes involved here (three) is close to triviality. No disrespect, but how does this allow us to jump from “maybe” to “probably”?

    (You may be able to find a handful of cross-linguistic examples of the semantic change “push” -> “put”, but that doesn’t make the case for better-than-neutral probability.)

    This contrasts with widespread native Uralic counterparts: #pënə- ‘to put’ (absent only from Samic and Hungarian), #texə- (maybe *tejwä-??) ‘to push’ (F, P, Hu, Ms, Kh), *puskə- ‘to poke’ (S, F, Ms, Kh), which is usually a good indication for an innovation of some sort.

    Not the first time I’ve said this, but: if only about 400 solidly-reconstructible roots have been discovered for Uralic, then how does the slightly restricted distribution of a root (in this case, *pis-) give any strong indication that it is non-Uralic?

    Your description implies that *pis- is found in all the main branches of West Uralic except Permic, which is no sparser a representation (by the metric you seem to be using) than that of the ”poke”-root you mention alongside it.

    • j. says:

      Thanks, very interesting comments about Celtic. I really should look more into those languages at some point.

      Crushing/pushing is clearly a different action from sticking, and the number of matching phonemes involved here (three) is close to triviality.

      Yes, the bare PIE/PU match doesn’t look too convincing. Glossing the root as ‘to crush’ seems to be based more on the reflexes in the “classical branches” (Latin, Greek, Indo-Iranian) though. Slavic *pьxa- means mainly ‘to push, shove’ (some reflexes also ‘to prick’), and the Lithuanian would probably require an even more specific earlier meaning ‘to push in, put in’. The Balto-Slavic forms also seem to be consistently in zero grade. At this point we have a just about exact match with Uralic.

      if only about 400 solidly-reconstructible roots have been discovered for Uralic, then how does the slightly restricted distribution of a root (in this case, *pis-) give any strong indication that it is non-Uralic?

      It doesn’t, no. The point is instead that we have quite a few unrelated Uralic word groups crowding in this semantic area, so it would be good news if one of them turns to be a loan instead of inherited. I think distribution is relevant mainly in that *pisə- is conveniently localized in exactly the groups where we know of numerous Balto-Slavic loanwords. If it had reflexes in some Siberian branch, that would surely make the loan etymology more dubious.

      • Ante Aikio says:

        The root *pisi- appears to have quite regular cognates in Ob-Ugric and Samoyed, too: Proto-Khanty *päl- ‘stick, sting, stab’ (VVj pel-, etc.), *pälǝt- / *pältǝ- (Irt pettǝ- ‘stick (with something)’, O pelǝt- ‘strike (flint)’); Proto-Mansi *pätt- ‘shoot; kill’; Proto-Samoyed *pǝtǝl- (Tundra Nenets pǝdǝl- ‘stick upright; put up (a tent)’), *pǝtmä (Tundra Nenets pǝʔḿa ‘sharp’, etc.).

        Previously Khanty *päl- has been compared to Mansi *piil- (and also Mordvin pel’ems, North Saami beađđa-), but this etymology is not phonologically regular. So *pisi- is clearly a Proto-Uralic root. Moreover, Khanty *pälǝt- / *pältǝ- and Mansi *pätt- correspond regularly to Finnish pistä- ‘sting’ and Saami basti- ‘be sharp’, so a Proto-Uralic derivative *pis-tä- can also be reconstructed.

        • j. says:

          Very interesting. That would be quite a bit of almost perfectly regular reflexes. (And perhaps Mansi *peel- can then be analyzed as a loan from Khanty.) Checking now also Hungarian, we could also consider adding fejel ‘to push, headbutt’ (Fi. ‘puskea, pukata’), if < *fe.el- < *fɪhəl- < *pisə-lə-, and so = Samoyedic *pətəl-? — But if this has /ɛ/ and not /e/, which I cannot check right now, it’s then probably better derived from fej ‘head’.

          Maybe this also actually tips the scales right over, and we should analyze the Balto-Slavic words as loans from Uralic? The semantic shift PIE ‘crush’ > Balto-Slavic ‘push’ seems possible enough, but not trivial. Also, in this direction the absence of RUKI in Uralic will be clear enough, while going in the other direction, it would require very early loaning.

    • David Marjanović says:

      British ɸɸ-

      The Welsh spelling system uses f for /v/ and ff for /f/ (…much like off and of in English, though that’s a coincidence, I think). There is no consonant length.

      PIE √peis-

      “Stick a pestle into a tall, narrow mortar”…?

  7. M. says:

    The Welsh spelling system uses f for /v/ and ff for /f/ (…much like off and of in English, though that’s a coincidence, I think). There is no consonant length.

    I know. The gemination was meant to express Celtic *sɸ- assimilating to the second consonant of the cluster (in British). This would probably have been clearer if I had added another step: *sɸ- > *ɸɸ- > *ɸ-.

    On the other hand, when I wrote “Irish *sɸ-”, the ɸ was supposed to be in superscript, but when I wrote “sup” between brackets (just as I do with bold and italic, successfully), the page ignored this instruction completely. I’m not sure if this is a general WordPress issue, or if the blog owner has turned this feature off.

    • j. says:

      No idea: I cannot see any settings for fiddling with what HTML is allowed in comments and what is not, but allowed HTML tags should by my understanding include <sup> (superscript) and <sub> (subscript).

      • David Marjanović says:

        I think these are the blog owner’s special privileges. Test: superscript subscript

        • David Marjanović says:

          Yeah. Most blog software doesn’t allow commenters to use super- and subscript. Whoever came up with this just didn’t think anyone would need them.

          • M says:

            Bizarre. Lots of blogs use mathematical/linguistic/etc. notation; why wouldn’t commenters to these blogs want to use these features as well?

            • David Marjanović says:

              WordPress (and Google/Blogger, and others) seem to think that blogs are just superficial fun, and that comments are only for saying “keep up the good work”. You should experience the comment system of Scientific American for a while – it’s horrible!

              • j. says:

                A reasonable hypothesis might be that only HTML tags that have equivalents in WordPress’ Markdown formatting are allowed. (I could enable that; it probably wouldn’t make it any easier for passing-by readers to format comments, but it might stop the bug where comments containing > and < signs have chunks of them discarded as “unknown html tags”.)

  8. David Marjanović says:

    A 2013 paper by Adam I. “We.” Cooper showing at length that the syllabic sonorants of PIE as currently reconstructed are not bizarre in global comparison.

    • j. says:

      I’m not sure if they have ever been considered outright “bizarre” (at least by typologically informed scholars). I could add to his examples Northern Mansi, where there is extensive alternation in /CVCC/ nominal stems being realized as [CVC.C̩ ~ CV.CəC] in the unmarked nominative singular, versus [CVC.CV…] in most inflected forms. Though it is unfortunate that Cooper spends very little time exploring the fact that standard PIE allows also stressed syllabics, which is probably the oddest feature of the system. Of course though this would also have precedents (famously e.g. in Czech).

      On the other hand, this seems to be still firmly in the tradition of trying to apply methods of synchronic typology to reconstructions, which tends to provide very little evidence on if some other reconstruction might be preferable. Confirmation bias, in a sense: we assume reconstruction X, show that it is not totally outlandish, and think that this is somehow an argument in favor of this reconstruction specifically — when essentially all proposed reconstructions are, in fact, not totally outlandish. (It also completely ignores typology of language change.)

      But, it is a start, I suppose…

      • David Marjanović says:

        Good points.

        The question of stressed syllabic consonants is, in the PIE context, part of a different question: they’re all in zero-grade, and if zero-grade is vowel reduction in unstressed syllables, why are there any stressed zero-grades at all? That includes stressed *u and *i, which aren’t even worth mention in the typology of phonetics.

        The trouble with typological arguments from absence is that nobody knows enough languages to do it right. :-) I’ll present a fairly beautiful example later, it’s too late tonight.

        • David Marjanović says:

          This paper from last year argues, and argues very well, that the cuneiform Anatolian contrast of single-spelled and double-spelled consonants reflects a phonemic length contrast that must, moreover, be projected all the way back to Proto-Anatolian. Then it argues that this contrast was present in word-initial position, too, where the spelling can’t show it clearly, and argues that in this position the contrast was phonetically realized as one of voice. No mention is made of Swiss German, where there are “geminates all over the word” (2001) which are articulated as such even when they can’t be heard (2008) and all obstruents are voiceless in all environments. (They aren’t ever aspirated, glottalized or whatever either.)

          Then, in and around footnote 23, the paper briefly argues against word-final voicing in PIE, quoting a typological description of word-final voicing of plosives as “a rarity, perhaps even unparalleled”. It certainly is rare, but it’s not unparalleled: check out “Part Three: Crazy Rules” about halfway down this presentation from 2013. Specifically, never mind Old Latin: Lezgian has it today. On top of that, word-final lenition without voicing isn’t that rare. Navajo collapses plain, aspirated and ejective consonants into plain ones word-finally; the Central Bavarian dialects (like mine) have turned /t/ and /tː/ into /d/ word-finally (with a few restrictions), and this is even carried over into Austrian Standard German; although all obstruents lack voice/aspiration/glottalization/etc., there is still a fortis-lenis contrast for plosives in all positions. Thus, if at some point in pre-PIE previously voiceless lenes were voiced in all positions, word-final voicing would emerge as a side effect.

          A counterexample to final voicing is proposed: “The word šeppitt- ‘grain’, which is commonly derived from a PIE preform *sépit- with a stem-final *t, shows in the oldest stages of Hittite a gen.sg. form spelled še-ep-pí-da-aš, pointing to a phonological form /sépːitas/, with a stem-final lenis /t/.” This shortening is regular, but was soon undone, presumably by analogy: “This leaves only one possibility, namely that the stem-final fortis /tː/ was restored from the nom.-acc.sg. form šeppit.” All other forms would have been subject to the regular shortening just like the gen. sg.. “This requires, however, that this form was phonologically /sépːitː/[…], with a word-final fortis /tː/ that directly reflects PIE *t.” As far as I can tell, analogy is not so limited. “Shoe” in my dialect is [ʃʊɐ̯xː] in both singular and plural. Etymologically, you’d expect a short /x/ in both, and indeed there’s a phonemic contrast between /x/ and /xː/ in the position in question. However, all word-final long consonants were shortened between two rounds of apocope around the end of the Middle High German period; the second round of apocope left us with a pattern where many words end in a short consonant in the singular and a long one in the plural. This way of plural marking was apparently extended (I think I’ve encountered examples, but I can’t recall them right now) to words that never had a long consonant. Perhaps “shoe” was one of them – and then, in another round of analogy, the length was copied to the singular from the more common plural, on the pattern of the nouns that are identical in singular and plural.

          Of the long argument for the Glottalic Theory, I’ll pick “the presence of pre-glottalization, pre-aspiration and tonal features in several Germanic languages”. I don’t know what the tonal features are – the pitch-accent phenomena all have well understood origins that are much more recent –, but the whole glottalization stuff is absent in High German, evidently because it’s a byproduct of aspiration: in positions where phonemically aspirated consonants are unaspirated in English, the aspiration isn’t merely left off, but actively taken away by holding your breath. I was explicitly taught aspiration at the start of my 5th year of school, but I wasn’t taught glottalization, so I still don’t glottalize enough when I speak English. Does Dutch glottalize?

          Given the fact that Anatolian and Rest-IE are most easily understood as sister-groups, the question emerges whether the Proto-Anatolian plosive system with its length contrast or the Rest-IE system with its voice contrast is older. The latter option is deemed too implausible to consider (“typologically weak”), because “spontaneous, unconditioned lengthening/gemination of original short stops is a development that, as far as I know, is cross-linguistically unattested”, followed by a literature review that takes up the rest of the page. Missing from this review is any mention of Swiss German, where the MHG short lenes have remained short lenes, while all fortes, both long and short, have become long lenes. (To some extent this is still productive: the nickname for chocolate is famously Schoggi.) The fact that this happened in all positions is how the abovementioned word-initial long consonants came into being.

          I don’t mean to single out this paper (let alone its author, Alwin Kloekhorst); I just happened to read it yesterday.

          • j. says:

            I’ve seen the Kloekhorst paper too — although I was expecting you to point out a different typological blunder, which I find more pressing: lots of languages, including e.g. Old Church Slavic which I’d expect Kloekhorst to be aware of as an Indo-Europeanist, have a constraint against voiced affricates specifically. (Some others include: Bench, Rioplatense Spanish, Xwarshi…; for specifically *dz > /z/, e.g. Kalami, most dialects of Pashto.) Hence an affrication rule of /ti di/ → [tsi si] does not establish that the contrast would have to have originally been *tti *ti: it can be just as well derived through *tsi *dzi > *tsi *zi.
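To make the rule ordering concrete, here is a minimal Python sketch of that alternative derivation. Everything in it is an illustrative assumption rather than anything taken from the paper: the plain-string notation, the rule labels and the helper `derive` are hypothetical, and the final step from *zi to the attested [si] is not modelled.

```python
# Hypothetical ordered sound-change rules; notation and names are illustrative only.
RULES = [
    # Step 1: both stops affricate before i.
    ("affrication", [("ti", "tsi"), ("di", "dzi")]),
    # Step 2: a ban on voiced affricates turns *dz into *z; *ts is unaffected.
    ("no voiced affricates", [("dz", "z")]),
]


def derive(form):
    """Apply each rule in order and return the list of stages, input included."""
    stages = [form]
    for _name, replacements in RULES:
        for old, new in replacements:
            form = form.replace(old, new)
        stages.append(form)
    return stages


for start in ("ti", "di"):
    print(" > ".join("*" + stage for stage in derive(start)))
```

The point of the sketch is only that a short/voiced input and a long/voiceless input are not needed to get an asymmetric outcome: a plain voicing contrast plus a constraint against voiced affricates gives the same split.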

            “Pre-glottalization, pre-aspiration and tonal features in several Germanic languages” probably refers to Kortlandt’s argument on traces of glottalization in Germanic, which relies the most on Nordic. Earlier on, he even used to include a very odd-looking analysis of the High German affrication as having been *ʔp > *ʔf > *pf etc. but, sensibly enough, seems to have abandoned this in (what I think is) his latest defense of the concept.

            • David Marjanović says:

              OCS did have /dz/, and Polish, Slovak and Macedonian still do. (Macedonian, like OCS, spells it using the Cyrillic letter Ѕ ѕ, a remarkable false friend derived from lowercase ζ.) */dʒ/ became /ʒ/ because there was no preexisting /ʒ/ to confuse it with, not because voiced affricates fell out of fashion. */tː t/ becoming /ts s/ is imaginable because they did become /ts sː/ in the High German consonant shift.

              probably refers to Kortlandt’s argument on traces of glottalization in Germanic, which relies the most on Nordic.

              Sure, I’m just surprised tonal features are mentioned.

              *ʔp > *ʔf > *pf

              Whoa. Far out, man.

              Thanks for the new paper, I’ll read it…

        • j. says:

          My hunch on that would be that ablaut, as currently understood, is analyzed overly phonologically, and that many unvarying cases of *u and *i (a la *muHs ‘mouse’, *ni ‘down’) are not “zero grades” in any meaningful sense. The high frequency of *u ~ *ou ~ *eu and *i ~ *ei ~ *oi in verb roots probably does indicate that 1) there was a pre-PIE breaking of *i and *u to *Vi and *Vu, and 2) that after vowel reduction these could develop back to *i and *u, but it does not follow that this process would have been unconditional. Roots with retained *i and *u could have always existed alongside (*bʰuH- ‘to grow’ is one example where positing an original full grade seems to be recognized as unwarranted by now), and many others could have developed analogical full grade forms only after the initial phonological genesis of ablaut.

        • Blasius B. Blasebalg says:

          While zero grade can produce syllabic consonants, I don’t see why a word with a syllabic consonant is necessarily part of an ablaut series. In particular, ablaut implies either inflection or derivation. In the case of the numeral “seven”, it is hard to conceive how that word would be derived from anything else.

          Moreover, it is natural to ask about the origin of ablaut. However, is there really a theory that promises to consistently solve that riddle? Even if zero grade originates from stress absence, wouldn’t we be talking about a much older stage of Pre-Proto-IE, with a potentially very different stress distribution?

          Finally, for *septm(t), there seems to be a good reason for stress on the second syllable: It just might be a composite. The beginning *sep- already looks temptingly like a loan from Semitic; for the second component, if it stems from an original IE root, several areas of meaning would be possible (“seven”, “number/count”, “pieces/things”, “complete”, “unit”, even “fingers”, “heads” etc.). This is, of course, just speculation, but it doesn’t seem unlikely, and it is compatible with the results.
          (As an aside, the Hurrian numeral “seven”, šinti, also roughly aligns with the IE structure.
          Perhaps both are based on the same extension by nasal+t of the Semitic root?
          Anyway, the alliteration in s- or š- for the numbers 6 and 7 in Semitic, IE, and Hurrian does suggest some dependence, which is very much plausible for the words in question.)

          • Blasius B. Blasebalg says:

            Right after posting I realized that Proto-IE *tem- “cut” would fit the option “pieces”.
            Wouldn’t *tmt´a be a passive participle (“rags, pieces”)? Loss of the final (stressed!) syllable *t´a isn’t a very systematic behavior, but it is plausible in words as frequent as numerals (I’m often amazed how modern languages omit stressed syllables to get practical short forms, starting with French “manif”).
            I may just be indulging in pointless speculation, but this reconstruction has the advantage of also yielding the second *t suggested by Kluge to motivate the Proto-Germanic form through the series *septmt > *sepmt > *sepm > *sebun.

            Of course, in that case, Hurrian has its very own nasal (since dependence on the IE form would be extremely unlikely).

            • David Marjanović says:

              Interesting idea.

              The development of *-tó- into a marker of passive participles is thought to be late, though. Its older use is to derive a different kind of adjective, like the famous example of “perish” > “imperishable”. The whole Anatolian branch formed passive participles with the *-nt- suffix that makes active ones in the rest of IE.

              I’ve seen an attempt to derive *septḿ̩ from the Proto-Semitic feminine nominative *sabʕatum. That fails to explain why [bʕ] would be borrowed as [p], though, as opposed to [b] (*b, *bh₃ = *ph₃) or perhaps [bʁ] (*bh₃ = *ph₃ again)…

              Back to the topic! If we can find a rule that removes the *h₃ from the word-final cluster bh₃tm, we’re saved: either *b existed but was devoiced by the following *t, or it didn’t exist as a phoneme and was interpreted as *p which was devoiced as soon as the immediately following voiced *h₃ was gone. PIE syllable-level phonotactics are not obvious to me, my native language having accumulated overlong syllables for 2000 years straight… I know what I have to reread. :-)

              • Blasius B. Blasebalg says:

                In fact, how do we know that *septḿ̩ was stressed on the second syllable?

                What about the stressing in Semitic – wouldn’t stress fall on the second syllable of *sabʕatum (i.e. the one that got cancelled)?

              • David Marjanović says:

                If we can find a rule that removes the *h₃ from the word-final cluster bh₃tm, we’re saved

                I can’t find any. The syllabic *m ought to be treated like a vowel, and fricative + *t is a perfectly acceptable syllable onset; there should be devoicing to *-ph₂t-, but no deletion, and we should get things like Sanskrit **sapʰitá, which we don’t.

                In fact, how do we know that *septḿ̩ was stressed on the second syllable?

                Odd as it is, it’s agreed upon by Sanskrit (saptá), Greek (heptá) and Germanic (Verner’s law struck the *p… though don’t ask me why the *t disappeared altogether). Does anyone understand what Balto-Slavic accentuation has to say about this?

                I have no idea about Semitic stress, except that it’s not phonemic.

                • sansdomino says:

                  there should be devoicing to *-ph₂t-,

                  Is *h₃ > *h₂ through devoicing actually attested, or are you just positing this due to assuming sound values as /x/ and /ɣ/ or the like?

                • David Marjanović says:

                  I don’t think it’s attested, but I also think nobody has actually looked for it.

                  For what that’s worth, I’m assuming [χ] and [ʁ], but the assumption that *h₃ was voiced – indeed the only phonemically voiced obstruent in the system, which is a bit strange – is mainstream, accounting for zero-grade *ph₃- surfacing as b among a bunch of rarely mentioned other examples. Regressive voice assimilation seems safe to reconstruct, except at the beginnings of words, where *s apparently devoiced a following aspirate.

                • Blasius B. Blasebalg says:

                  There is sufficient reason to keep the *sabʕatum hypothesis:

                  a) I don’t think there is enough evidence for concrete values of the laryngeals.
                  In particular, it isn’t clear at all whether there was any Proto-IE sound equivalent to Semitic ʕ.
                  b) We also can’t be too sure about the value of Proto-Semitic ʕ either.
                  Most importantly, Proto-IE may only have had contact with a fringe of the Semitic world.
                  I am not sure about the timing of Semitic split-up relative to IE, and we certainly
                  shouldn’t rely on such estimates too much, but it is quite possible that the
                  (Proto-)Semitic dialect with Proto-IE contact didn’t have ʕ any more.
                  Semitic languages tend to this omission independently (e.g. East-Semitic, Punic),
                  and dialectal differences are likely even at a Proto-Semitic stage.
                  So the Proto-IE word could either come from Proto-East-Semitic, or from a ʕ-less
                  dialect without continuation until today.

                  c) Even if it doesn’t (b) and if Proto-IE had the exact sound (a), it is not necessarily
                  retained in such a loanword. For instance, note that German knows
                  medial ʔ as in Be-amter. So it would be possible to use that in Arabic loans
                  for ʔ and ʕ. You may, however, notice that Qurʕan is not pronounced Kor-An in
                  German.

                  Of course, this example has two shortcomings: The laryngeal is not identical, and the
                  loan went through (at least) one intermediate language. But both effects might as well
                  have taken place here between Proto-IE and Proto-Semitic.

                  d) You spend a lot of thought on the lost laryngeal. But why don’t you cry one tear for the
                  two lost vowels? Vowels should be more conspicuous, in particular if their omission
                  reduces the number of syllables.

                  Now it is plausible (considering modern Arabic, say) that Proto-Semitic short vowels
                  were appreciably shorter than Proto-IE short vowels.
                  But this situation corresponds to item b), no identical sound available.
                  Why shouldn’t that apply for ʕ as well?

                  That doesn’t imply that the hypothesis is established, but there is no reason to nitpick about the ʕ.

                • “the assumption that *h₃ was voiced – indeed the only phonemically voiced obstruent in the system, which is a bit strange – is mainstream, accounting for zero-grade *ph₃- surfacing as b among a bunch of rarely mentioned other examples”
                  The statement about “the only phonemically voiced obstruent in the system” makes sense only if one accepts some version of glottalic theory, where *b is a voiceless ejective stop (or some other theory claiming that there were no phonemically voiced stops in PIE). But then one cannot use the supposed shift of *ph₃ to *b as an argument in favour of [ʁ] as a correct phonological value of *h₃.

                • David Marjanović says:

                  Oh, sorry! Thanks for catching this – I only meant “the only phonemically voiced fricative in the system”.

                  (Or perhaps I got confused by the fact that there are a few scattered hints that *h₁ may have been a glottal stop rather than a fricative at some point.)

  9. Blasius B. Blasebalg says:

    Just curious: What would be the Sanskrit reflex of **seph_1t´m ?
    Would -ph- also surface?

    While the sound values of Proto-Semitic *b and *ʕ very likely coincide with IPA /b/ and /ʕ/,
    we should remember that Proto-IE stops are not so clear.
    For instance, if Proto-IE *b was actually an ejective /p’/, then borrowing it as *p makes sense (of course, the problem with the laryngeal in the middle remains, and I would hate to resort to glottalic theory only to make this one example work).

    Again, I think there are many possible reasons for ʕ not surfacing after the loan at all.
    In fact, the explanation from *sabʕatum works almost creepily well, given how old the loan must be. Its weakness, in my eyes, lies in the disappearance of the second vowel as well as the final stress (where we set off for this discussion).
    Perhaps we need to look for other derivations of “seven” in Proto-Semitic than just the plain female form.

  10. David Marjanović says:

    (Taking this out of nesting, it’s getting pretty narrow in there.)

    I’m still reading Andrew Byrd’s thesis on PIE syllabification. It turns out that laryngeals may not trigger voice assimilation: the “daughter” word clearly had *gh₂, not *kh₂. (This is strange, because it’s pretty obvious that *s did trigger regressive devoicing.) So, the only regular way to turn the expected *b into *p would be to get it right next to the *t, where indeed we find it somehow.

    Now I wonder if the *pt cluster is the result of analogy to the *kʲt in “eight”. That could even explain why the second vowel isn’t there.

    it isn’t clear at all whether there was any Proto-IE sound equivalent to Semitic ʕ.

    You’re right! Substitution of foreign /ʕ/ by /ʁ/ is regular in Tatar, but that’s very much the exception. Of course many people (of those who have dared to make a pronouncement on the matter) think that *h₃ was in fact [ʕ] (or a rounded version thereof), but I think that’s unlikely. The obvious alternative is that the (pre-)PIE speakers didn’t hear the Semitic *ʕ at all; pharyngeal and epiglottal approximants aren’t easy to notice if you’re not used to them.

    Qur[ʔ]an is not pronounced Kor-An in German

    Two reasons: 1) it wasn’t borrowed by ear, it has a spelling pronunciation; 2) [ʔ] is not a phoneme in any kind of German, it’s inserted by a rule that doesn’t apply here.

    (In the south, [ʔ] is inserted in front of utterances that would otherwise begin with a vowel. In the north and center, it’s additionally inserted in front of stressed syllables that would otherwise begin with a vowel. The obvious way to syllabify something like /koran/ is /ko.ran/, so the stressed syllable already has an onset and can’t get a [ʔ] anymore.)
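The insertion rule can be written out as a minimal Python sketch. The function name, the simplified orthographic "syllables" and the two-way split into "south" and "north" (standing in for north and center) are assumptions made for illustration, not a full phonological analysis.

```python
VOWELS = set("aeiouäöü")


def insert_glottal_stops(syllables, stressed, variety):
    """Return the syllables of one utterance with [ʔ] inserted.

    syllables: list of syllable strings (hypothetical simplified transcription)
    stressed:  set of indices of stressed syllables
    variety:   "south" or "north" ("north" stands in for north and center)
    """
    out = []
    for i, syl in enumerate(syllables):
        vowel_initial = syl[0] in VOWELS
        utterance_initial = i == 0
        # Only the northern/central rule also targets stressed onsetless syllables.
        stressed_trigger = variety == "north" and i in stressed
        if vowel_initial and (utterance_initial or stressed_trigger):
            syl = "ʔ" + syl
        out.append(syl)
    return out


# Be-amter: stress on the second, vowel-initial syllable.
print(insert_glottal_stops(["be", "am", "ter"], {1}, "north"))
print(insert_glottal_stops(["be", "am", "ter"], {1}, "south"))
# Koran syllabified as ko.ran: the stressed syllable already has an onset.
print(insert_glottal_stops(["ko", "ran"], {1}, "north"))
```

Under these assumptions, only the northern setting puts a [ʔ] into Be-amter, and neither setting can put one into ko.ran.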

    You spend a lot of thought on the lost laryngeal. But why don’t you cry one tear for the two lost vowels?

    Indeed!

    The first could have been lost by analogy as mentioned above; or it could have been put into zero-grade, at least if it was interpreted as *e, which is possible especially if the Semitic *ʕ was epiglottal ([ʢ], as in e.g. most kinds of Arabic today, dragging vowels toward [æ]; the actual pharyngeal [ʕ] drags vowels toward [ɑ] instead) – but, as far as I can see, that completely fails to explain why it’s *sept- and not **spet-. Hm.

    The second… maybe the word was borrowed before PIE developed syllabic consonants. But even so, I don’t know why the end wasn’t borrowed as **-om. Our esteemed host has promised an investigation into the pronunciation of *o, so maybe that’ll explain it, or not. I would expect *-tum to be promptly resyllabified as *twm̩, so we’d have to explain how the *w disappeared… but maybe I’m wrong about that: there are those Sanskrit infinitives that end in -tum, not in **-tva. I’ll keep reading Byrd’s thesis.

    What would be the Sanskrit reflex of **seph_1t´m ?
    Would -ph- also surface?

    Judging from duhitá- “daughter”, the outcome should be **sapitá or **sapʰitá, depending on whether *h₁ caused aspiration in Indo-Iranian (Kümmel’s recent research says yes).

    if Proto-IE *b was actually an ejective /p’/, then borrowing it as *p makes sense

    Rather, borrowing as *bʰ would make sense; all the different kinds of glottalists seem to agree that that would be the plain voiced [b]. However, *bʰt would become pt anyway, so we “only” need to get rid of that vowel in the middle…

    Disclaimer: the whole *sabʕatum business is hearsay on my part. I have not consulted primary or secondary literature on comparative Semitic, and I have no idea if that’s how the form is currently reconstructed or what the rest of the paradigm could have been – perhaps the feminine nominative is just the wrong place to look.

    • Blasius B. Blasebalg says:

      Now I wonder if the *pt cluster is the result of analogy to the *kʲt in “eight”. That could even explain why the second vowel isn’t there.

      Analogy between basic numbers should usually be a strong motivation.
      However, *septm_ and *o´kto are still quite different, aren’t they?

      Again, I wouldn’t rule out that the closest match for a short vowel in Proto-Semitic might have been no vowel in Proto-IE (if modern Arabic and modern Tigrinya are any indication).

      • j. says:

        Also, does the Semitic numeral have any native etymology? Egyptian sfḥw looks similar enough to be probably related, but not similar enough to be obviously cognate (e.g. last I heard, the expected old pharyngeal correspondences are ḥ ~ ḥ, ʕ ~ ʕ). It is not obvious that we are dealing with loaning into IE specifically from Semitic at all. Some third source could be involved, already since PIE itself was likely not in direct contact with ((pre-)Proto-)Semitic, but especially if the numeral is not inherited in Semitic.

      • David Marjanović says:

        Good points all.
