“All swans are underlyingly white”

An allegory that I started writing for something else, but which upon reflection should probably stand on its own.

Once upon a time, in a world much like our own, a biologist postulates a generalization: “All swans are white”. The hypothesis performs admirably for a time and appears to predict quite strongly the coloration of swans.

Small updates to the emerging theory are gradually accepted: for example, the impact of diseases, oil spills or Diogenean paint-set-wielding jokers on the coloration of swans is admitted via an adjustment “All swans are naturally white”. Likewise, objections drawn from the study of the beaks, flesh, skin, etc. of swans are admitted via a further adjustment “All swans’ mature plumage is naturally white”. Regardless, the original formulation remains in circulation, even if by now generally understood to be shorthand for the more nuanced version; and all remains well in the field of Generalizative Biology.

One day, however, a serious disaster appears to strike, as reports of black swans arrive from the far-away land of Australia. Some initial ways of explaining away this pesky conflicting evidence are explored: perhaps the birds in question have a peculiar habit of taking dust baths in coal beds; perhaps the birds possess not true plumage but a neotenous extension of the downy (and non-white) body covering of young swans; or perhaps their blackness indeed disqualifies them from being considered “swans”. However, detailed study of their behavior, genetics, etc. eventually proves these approaches untenable. The birds in question are to all appearances just another species of swan, except for its atypical plumage. Generalizative Biologists remain irked by this blemish upon their valued theoretical stance that all swans are white; thus far one of their most strongly established results in the field of animal coloration. It is proposed by some naïve outsiders that the theory has simply been falsified, but such busybodies universally fail to propose any new, better theory in its stead. How else would one explain the repeated and well-replicated observation that swans everywhere else appear white? A weakening to a mere “some swans are white” would have no predictive power, and it would clearly be a great failure of parsimony to posit that dozens of swan species are individually white, but with nothing common between these individual facts.

In the minor field of historical biology, it is pointed out that black and white swans alike likely descend from a white ancestor. This does not generate much disagreement as such (by definition the proto-swan has been a swan, and hence is clearly predicted to have been white), but at the same time is agreed to not be an answer to the problem of black swans. Swans must be describable as synchronic biological systems! Their identity as swans cannot hinge on evolutionary theory. After all, has not the concept of swans — even including the problem of black swans — been already defined long before anyone had heard of evolutionary biology?

Finally a new promising theory is found. Informed by close study of the developmental history of swans, it appears that black swans’ plumage only gains its coloration by a pigment. It is shown in careful experiments that, without this pigment, the plumage would rather turn out white! This immediately gives a new, seemingly paradoxical result. Even black swans are white after all: that is to say, white in (what comes to be called) their underlying anatomy, and only seemingly black due to (what comes to be called) their individual bodily realization. All biology, of course, has for long recognized that members of a species often differ in their individual bodies: taller, shorter, missing digits or having additional ones, indeed sometimes albino; and that this should not be seen as invalidating their identity as members of a species with a certain typical height, number of digits, or coloration. But the recognition that such individual processes can be highly widespread across a species proves revolutionary.

The theory of underlying anatomy quickly finds applications to several seemingly unrelated problems. The hooded crow, for example, turns out to be not a true counterexample of the old but contentious theory that “all crows are black”: it can be treated as a completely typical, underlyingly black crow (instead of e.g. the popular earlier theory to identify it as being actually a remarkably large and crow-like magpie). Likewise, the old suggestion that “all birds have wings” proves to be underlyingly true and only superficially contradicted by species such as the kiwi or, by some views, the ostrich. Even the older hedge about swans being only naturally white proves to be but a trivial special case of the new theory: e.g. any swan spraypainted green remains, of course, still underlyingly white. All this is widely seen as strong evidence for the validity of the concept; and thus, the problem of black swans has, in the end, only made Generalizative Biology stronger.

Some still occasionally express confusion about the nature of underlying anatomy, mostly people without proper training in theoretical anatomy (lamentably including even several experimental anatomists). It is admitted, of course, that experimental correlates of underlying anatomy remain difficult to identify. However, even if one for some reason sees fit to completely ignore e.g. the original pigmentation studies among the black swans, what of other case studies such as the embryological evidence for humans as an underlyingly tailed species? the fossil and written evidence for lions as an underlyingly European species? or any other number of such demonstrations? No, all accumulated evidence must surely be taken in favor of underlying anatomy as the main cause behind organisms’ observable biology. And the theory clearly advances by the day — why, just last year it was argued that even horses, too, share the important mammalian universal of five underlying digits.

Posted in Methodology

Sami ruoŧŧa ‘Swedish’, ruošˈša ‘Russian’

The ethnonym and state name Russia(n) traces its origin back to older Rus’ (Русь). As the current standard etymology goes, this is thought to then derive, via the Varangian ruling class of pre-Slavic Russia, from Finnic *roocci ‘Sweden’, in derivatives ‘Swedish’; which is itself considered a loan ultimately from Germanic *rōþaz ‘rower’, in most versions via the name of the area of Roslagen on the eastern coast of Sweden, or perhaps various compound names in *rōþs- for its inhabitants. There are various details in this chain of hypotheses that are not exactly straightforward, and currently a lively session is ongoing at academia.edu, around a recent discussion paper by Viacheslav Kuleshov. [1]

Most discussion has focused only on the Scandinavian – Finnic – Slavic main chain. There are old offshoots also in other Uralic languages though, and on closer consideration I find interesting the existence of two separate groups of reflexes in Sami.

In this blog post’s title I’ve given the standard Northern Sami forms. The first of them is etymologically no trouble at all: it is simply a transparent loan from Old Finnish †Ruodzi /ruoθθi/ that probably does not need to be projected deeper back in Sami than the 17th century (around the time when the Swedish state itself, i.e. not just Swedish-affiliated Finnish peasants, begins to have a stable presence in the Northern Sami areas). Inari Sami ruátálâš ‘Swede’, Ruotâ ‘Sweden’ and Skolt Sami Ruõtˈt ~ Ruõcˈc ‘Sweden’ are transparently newer loans still. The first two come from spoken northern Finnish ruottalaine(n) ‘Swede’, Ruotti ‘Sweden’, [2] the last from standard Finnish Ruotsi. Lagercrantz in Lappischer Wortschatz documents newer-looking loan variants from Northern Sami dialects too, e.g. (transposed to modern orthography) ruoha from Talma in Kiruna, ruohta from Gratangen and Nesseby, ruoha ~ ruohta from Parkalompolo in Pajala. I would take this variation similarly as evidence that there was no name of Sweden known even in Common Northern Sami (which very well might be older than a unified Kingdom of Sweden at all; putative Proto-NS seems to be closer to Proto-Samic than to modern NS dialects). Northern, Inari and (since WW2) Skolt Sami are of course also the three most Finnish-influenced Sami varieties. From Lule Sami on south I would presume ‘Sweden’, ‘Swede / Swedish’ to be instead direct loans from Swedish (Sverige, svensk). I have not checked primary lexicographic sources beyond drawing a blank for cognates of Ruoŧŧi in the Álgu database, but a quick look at Wikipedia incubators seems to confirm this, attesting Lule Sami gen.sg. Svieriga (nom.sg. Svierihka?), Pite Sami Sverji, Southern Sami Sveerje and Svïenske.

Explaining ruošˈša from Finnic is however harder. Already the sound correspondence *šš ~ *cc looks mysterious to me: all Sami languages would have the voiceless affricates c, č available (both plain and geminate, add preaspiration to taste), and substituting later Finnish *θθ as a palato-alveolar sibilant wouldn’t really make sense either. I have not found any real explanation or reference to one in the handbooks of Korhonen or Sammallahti. There seems to be at least one parallel though: Fi. viitsiä ~ NS višˈšat ‘to bother’. I could imagine this reflecting an intermediate early Finnish stage with a lamino-dental sibilant *s̪s̪, which is phonetically expected and perhaps directly ancestral to some of the small Finnish dialect areas with *cc > ss. There would be a better match if we assumed that the palatalization of Karelian čč was earlier found across Finnish as well (perhaps even as a retention: PF *cc after all comes from palatalized *ćć), and that assibilation took place already before any fronting, so that “very old Finnish” had a reflex *śś that could be adopted intact in Sami. But for this there is zero evidence across the Finnish dialect reflexes.

Kuleshov however has, in an earlier paper on the same topic, a promising suggestion that is new to me: borrowing not from Finnic but Slavic. This for starters fits the meaning better. Nowhere in Finnic does *Roocci mean ‘Russian’, and even the possible loan etymology into Slavic would seem to involve the semantic shift ‘Swedish’ > ‘Russian’ coming about within Slavic speakers. As an intermediate stage I would assume the word referring to Slavs affiliated with or ruled by still-Scandinavian Rus’. There is also common Permic *Roć ‘Russian’, usually analyzed as a loan from Finnic. I guess Permians should be predicted to follow roughly the same semantic trajectory as in Slavic however, once Scandinavian Rus’es cease to exist and are replaced by or assimilated into Slavs.

(Actually this also gets me wondering about Norwegians running trade connections to Old Perm via a northwestern oceanic route. They must have been known by some Permic name, and I wonder what… This is not required to have been the same as that for the inland Rus’, of course.)

Phonologically, NS uo from Slavic u does not immediately look good. Kuleshov’s suggestion is loaning already very early, before the Middle Slavic raising †ō > u. In this particular word this stage actually happens to be clearly attested even, in a Byzantine Greek transliteration Ῥῶς. On some thinking I have developed a different idea though. Any contacts with (pre-)Russians must have started at the eastern end of the Samic dialect continuum. And if we look at the other Sami varieties’ cognates of ruošˈša, we find not only identical Inari ruošˈša and roughly the same Skolt ruõšˈš, but also Kildin Sami rūšš. The correspondence “mainline” uo-a ~ Kildin ū is regular, probably regular enough that ū could be etymologically nativized back to uo, if the loan was transmitted thru Kildin or a similar Kola Sami variety (just as Rūʰt̀s = Rūʰcc for ‘Sweden’, mentioned by Kuleshov, must be etymologically nativized, either straight from Fi. Ruotsi or more likely from Skolt Ruõcˈc). This gives more flexibility in absolute chronology, which would be handy: the Middle Slavic era is usually dated somewhere in the mid 1st millennium, while the Russian Pomors arrive on the coasts of Kola peninsula a fair bit later, in the first centuries of the 2nd millennium. I do not think occasional reports of people further southeast by more trade-minded Norse or Karelians would be sufficient to establish a Sami name for the people who would eventually become Russians.

A different issue appears in the consonantism, but this too seems to work out by the assumption of borrowing initially into Kildin Sami. Samic palatoalveolar š from Slavic palatalized s’ is as expected; but why overlong *šš? In discussion Kuleshov has pointed out as a parallel the substitution of Russian medial с, ш as mostly geminate ss, šš even in recent loans into Finnic, a correspondence that is upon checking known to be fairly systematic but which I had not realized before. But so indeed: jorssi ‘ruff’, kassa ‘hair’, kassara ‘billhook’, kasseli ‘backpack’, kiisseli… Thinking a bit more, while this substitution looks unnecessary, it would be not so in varieties like Ludian or southern Karelian, where medial *-s- has been voiced to -z- or -ž- and the only native voiceless sibilants are therefore -ss-, -šš-. Browsing SSA, there are even cases with a geminate in the eastern languages but a singleton attested in Finnish, e.g. EFi. †kaasa (19th c.) ~ Krl. koašša etc. ‘porridge’; EFi. kosinkka ~ Krl. kossinkka ‘scarf’; EFi. ko(s)suli ~ Ingrian kossula ‘type of plough’; EFi. ku(s)sakka etc. ~ Krl. kuššakka ‘woven belt’. (However this pattern is weakened by forms kosinkka, kušakka even in southern Karelian, with singleton voiceless sibilants apparently re-established in loans.) Makes of course also a nice parallel to the long-running sound substitution strategy that Indo-European -p-, -t-, -k- are borrowed as Finnic -pp-, -tt-, -kk-, but -b-, -d-, -g- as -p-, -t-, -k-. This is by now regardless a strategy, a kind of a cousin of etymological nativization, and not a mechanical phonetic substitution. [3] I think this is also what allows the gemination pattern to turn up even in Russian loans into eastern Finnish, despite the availability of unvoiced -s-. Further in western and standard Finnish such loans have been of course mainly mediated by the eastern dialects.
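As a toy illustration of the stop-substitution strategy mentioned above (Indo-European -p-, -t-, -k- taken over as Finnic -pp-, -tt-, -kk-, but -b-, -d-, -g- as -p-, -t-, -k-), the mapping can be written out as a lookup table. The helper name `adapt_medial_stop` is hypothetical, and treating the strategy as a context-free substitution on a single medial consonant is a simplifying assumption of this sketch, not a claim about how actual loan adaptation works.

```python
# Toy model of the Finnic loan-adaptation strategy for medial stops:
# source voiceless stops are adopted as geminates, source voiced stops
# as singleton voiceless stops. Hypothetical helper, illustration only.
STOP_ADAPTATION = {
    "p": "pp", "t": "tt", "k": "kk",  # voiceless -> geminate
    "b": "p", "d": "t", "g": "k",     # voiced -> singleton voiceless
}


def adapt_medial_stop(consonant: str) -> str:
    """Return the Finnic substitute for a source-language medial stop.

    Consonants outside the table are returned unchanged.
    """
    return STOP_ADAPTATION.get(consonant, consonant)


for source in "ptkbdg":
    print(f"-{source}- -> -{adapt_medial_stop(source)}-")
```

The point of the table is only that the voicing contrast of the source language is preserved as a length contrast in the borrowing language; the sibilant pattern discussed above would be an analogous second table.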

Now what has this to do with a loanword into Sami? Directly nothing, I think: the Sami languages are not bound to Finnic and should be free to develop their own patterns of loanword adaptation. But in eastern Sami, from Skolt on, we also find medial voicing of sibilants, in the weak grade that is (the strong grade is a regular / “short” geminate, as everywhere in consonant-gradating Samic). This would have created the opportunity to innovate the same loanword nativization strategy: Russian -з- gets taken over as the paradigmatically voiced -ss- : -z-, versus Russian -с- as the consistently voiceless -sˈs- : -ss-. I have no data easily on hand on if this actually happens in Russian loanwords into Kildin Sami, but a draft paper by my colleague Markus Juutinen, on Russian loanwords into Skolt Sami, comes helpful. Some geminates from Russian -с- indeed turn up there: bie´sˈs ‘devil’ ← бес, [ki̮s̄sa̮] ‘bag made of sealskin’ ← киса, pleäsˈsjed ‘to dance’ ← плясать. All three must be fairly recent (note retained b-, pl- and lack of *ki > ǩi), but this is regardless evidence that the same adaptation-by-gemination strategy has been innovated in eastern Sami. Within a younger chronology, where rūšš is first loaned from Pomors in maybe the 12th century and perhaps reinforced as geminate during ongoing contacts, it does not seem outlandish to assume that medial voicing and also vowel raising to ū (IMO a part of an already Proto-Kola Sami chainshift) could have been in place already.

I do not know what to think of the absence of a known Ter Sami cognate of rūšš (we would predict rī̮šš = /rɨːɕː/). Is this merely a documentation gap? The main word for ‘Russia(n)’ in Ter Sami is however instead Tārra, cognate to the words for ‘Norwegian, Scandinavian’ elsewhere in Samic (NS dárru, etc.). While this is a neat parallel to the presumed Rus’ > Russian shift, it does raise questions. Were Pomors first interested in areas further north and west before showing much attention to the Ter Sami? Was there earlier an inland / coastal split within Kola Sami instead of the current west / east one?

A further interesting aspect of what this etymology adds up to is, I think, that although ruošˈša and ruoŧŧa still are likely doublets from a common ultimate source, their identical vocalism would be kind of accidental: the latter gets its uo straight from Finnish < Proto-Finnic *oo, the former develops thru Middle Slavic ō > (Old) Russian u → Kildin ū, which is only incidentally also < Common Samic uo. Under my current hypothesis, probably not even the shifts *ō > *ū in Slavic and Kola Sami can be considered connected: they merely reflect the universal tendency towards long vowel raising (besides, one is conditional, the other unconditional). Thus can typology of sound change conspire to create similar sound patterns via different routes. This adds a kind of second dimension to my earlier typology of parallel loanwords as being somewhere between “diverging” (adopted in forms more different than they should be natively) or “converging” (adopted in forms more similar than they should be natively): they can also show correspondences “regular due to internal development” versus “regular due to unrelated developments”.

[1] Itself in response to a recent proposal that tries to sketch a novel Balto-Slavic origin for *Roocci; which I find so far still worse phonologically, semantically and sociolinguistically, so enough about that for this blogpost.
[2] Note also the Finnish second-syllable alternation between -a- in the ethnonym and -i- in the toponym, copied also in Inari Sami -á- ~ -â- (the latter with usual etymological nativization of stem vowels in F/S loans). This phenomenon remains without a clear known origin but is paralleled by Suomi ‘Finland’ : suomalainen ‘Finn’ and Lappi ‘Lapland’ : lappalainen ‘Lapp’ (≈ Sami). Presumably some two of these are analogous to the third, but I have no solid idea which way around. Slightly different is Häme ‘Tavastia’ : hämäläinen ‘Tavastian’ where the first seems to be a simple derivative in -e (within just Finnish we cannot tell very well if from *-eh or *-ek) from an earlier *Hämä.
[3] It has likely been originally phonetic though, way back when Proto-Norse and Proto-Germanic *-p-, *-t-, *-k- were still preaspirated or preglottalized. At least one example could show a similar development in an old loanword from Baltic: PF *rattas ‘wheel’ ← *ratHas < PIE *HrótHos.

Posted in Commentary, Etymology

Phonology squib: *ë in Kamassian

Another word of previously notably unknown etymology recently has a new lead for it: Finnic *sana ‘word’, suggested by one Otso A. Bjartalíð (in a draft that was briefly posted on Academia.edu but seems to be currently down) to have cognates in Kamassic: Kamassian tenü, Koibal tano ‘word’. Looks possible enough, though I have some lingering doubts about a comparison with just one branch of Samoyedic.

Otso holds that the Kamassian form /tʰenü/ (thênü (C), tʰenʉ (D)) would point to PSmy *tänü, slightly off from the back-vocalic forms in Koibal and Finnic, which he takes to suggest Proto-Uralic *sana. Actually, this does not seem to be necessarily a problem: the correspondence Kamassian e ~ Koibal a can also continue PSmy *ë, especially before nasals. Compare:

  • PSmy *cën (? *tën) > Km. then (C), tʰeǹ (D) ~ Kb. танъ (Sp.) ‘sinew’
  • PSmy *jëpsə > Km. ťepsü (C), ťepsʉ (D) ~ Kb. джяпсы (Sp.) ‘cradle’
  • PSmy *këm > Km. khem (C), kʰə̑m̀, kʰɛm̀ (D) ~ Kb. камъ (Sp.) ‘blood’
  • PSmy *pën- > Km. phelľim (C), pʰel̆ľem (D) ~ Kb. паллямъ (Sp.) ‘I put’ (the presumable stem pʰen- does not seem to be attested)
  • PSmy *wën > Km. men (C), mɛǹ (D) ~ Kb. мянъ (Sp.) ‘dog’

(Usual primary source abbreviations: C = Castrén, D = Donner, Sp. = Spasski.)
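The "especially before nasals" observation can be checked mechanically against the five examples listed above. The following is only a minimal sketch: the data rows are copied from the list (Castrén's Kamassian form, Spasski's Koibal form), while the helper `before_nasal` and the choice of which symbols count as nasals are assumptions made for the illustration.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so that precomposed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text)


# Assumed nasal inventory for this sketch.
NASALS = set(nfc("mnńŋ"))

# (PSmy reconstruction, Kamassian (C), Koibal (Sp.), gloss), from the list above.
COGNATES = [
    ("cën", "then", "танъ", "sinew"),
    ("jëpsə", "ťepsü", "джяпсы", "cradle"),
    ("këm", "khem", "камъ", "blood"),
    ("pën-", "phelľim", "паллямъ", "I put"),
    ("wën", "men", "мянъ", "dog"),
]


def before_nasal(psmy: str) -> bool:
    """True if the first *ë in the reconstruction is directly followed by a nasal."""
    form = nfc(psmy)
    position = form.index(nfc("ë"))
    return form[position + 1:position + 2] in NASALS


pre_nasal = [row for row in COGNATES if before_nasal(row[0])]
for psmy, kamassian, koibal, gloss in COGNATES:
    context = "pre-nasal" if before_nasal(psmy) else "other"
    print(f"*{psmy:6} Km. {kamassian:8} Kb. {koibal:8} '{gloss}' [{context}]")
print(f"{len(pre_nasal)} of {len(COGNATES)} examples have *ë before a nasal")
```

With so few items this is a sanity check rather than evidence of conditioning; the same table could be extended with further etymologies to test the condition more seriously.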

As PSmy *ë regularly corresponds with Finnic *a, both coming from PU *ë, a better first-pass PU reconstruction for ‘word’ would therefore seem to be *sëna, [1] especially given that PU *a most typically gives PSmy *å > Kamassic o. (A number of cases of PSmy *å > a are still known though; even a number of > u.)

The stem vocalism and overall morphology could call for more thinking too, since PU *sëna would be actually expected to give PSmy **tïnå (regardless of if one thinks *ë > *ï is conditioned by syllable closure or by an *a-stem). Maybe the root was instead a verb *sën(ə)- ‘to say’? Despite the semantic identity, both reflexes could then turn out to be parallel deverbal nouns: in Finnic *sën-ma > *sëna (with *nm > *n as also in the 1PS oblique possessive suffix -ni), in Kamassic *sënə- > *tën- → *tën-u (or *tën-o).

We have already in the data above still some variation within Kamassic though. For one, Spasski’s ‹я› could indicate fronting in ‘cradle’, perhaps *ďapsə > *[dʑæpsə]. The same appears in PSmy *jalä > Kb. джяла ‘light, day, sun’ and PSmy *japsə- > Kb. джяпсьы- ‘to roast’, which retain back /a/ in Km. ťala (C), dʑåɫ̀å (D) and ťapse- (C). Donner’s variant with ə̑ could be just unclear articulation, as he has many variant forms with this.

PSmy *ë also does not always result in these reflexes. Firstly there are also cases of retained /ë/ (= FUT e̮), though this is rarer and is only found in a few of Donner’s records. Castrén’s materials as published do not even recognize this phone, [2] and for Kamassian we seem to find in them ä for the short version, ö̂ for the long version.

  • PSmy *ëlə- > Km. è̮lɛgɛn (D) ‘wet snow’
  • PSmy *ëptə > Km. äbde (C), e̮ʔʙᴅᴉ, eʔʙtə, ɛʔʙte (D) ~ Kb. абде (Sp.) ‘hair’
  • PSmy *kë ‘year’ > Km. khä (C), kʰɛ̮, kʰe̮ (D) ~ Kb. ка (Sp.), kôa (Klaproth) ‘winter’
  • PSmy *ëməl- ‘to forget’ > Km. nö̂mel- (C), nē̮məl- (D) ~ Kb. нумил- (Sp.) ‘to forget’ (with unclear initial /n-/ and Koibal /u/).
  • perhaps also PSmy *këpə > Km. khö̂b (C), –ʉʔʙ (D) ~ Kb. копъ (Sp.) ‘wasp’.

This representation could suggest that Kamassian short /ë/ was phonetically mostly an open-mid [ʌ] or [ɜ]. Spasski’s ‹a› here and above could also actually stand for such a sound; but this is unknowable in the absence of phonetically exact records of Koibal.

On the other hand, I do not suppose Kamassian e to also stand for a mistranscribed back /ë/, as frontness is corroborated by the 2nd-syllable vocalism in thênü, ťepsü and phelľim. Kamassian seems to have had stronger vowel harmony than Proto-Samoyedic or even Proto-Kamassic had, reasserting harmony in words like *jalä > ťala ‘light, day, sun’ as mentioned above, or also e.g. *kålä > Km. kola (C), kʰōɫă (D) ~ Kb. кола (Sp.) ‘fish’; *səjmä > Km. sima (C D) ~ Kb. сима (Sp.) ‘eye’. We see this reharmonization also in some words that show PSmy *ï > Km. /i/, e.g. (*pujå >) *pïjå > Km. phîjä (C), pʰʉj̀e (D) ‘nose’; *sïrå > Km. sirä (C), se (D) ‘snow’ (back vocalism retained in Koibal?: сыра (Sp.)). This development is likely to be due to Turkic influence.

Annoyingly however, I cannot assign any particular conditions to this apparent *ë > /e/. As mentioned, to some extent it looks like the typical development before nasals (tʰenü, tʰen, kʰem, men; also lem (D) ‘bird cherry’ << PU *ďëmə, though l- is unexpected), but this is contradicted by ‘to forget’. Other examples still occur too. In three cases we find a front vowel also in Koibal:

  • PSmy *ërö > Km. ere (C), ɛre (D) ~ Kb. ирэ (Sp.) ‘autumn’
  • PSmy *lë > Km. le (C D) ~ Kb. ле (Sp.) ‘bone’
  • PSmy *ńëj > Km. neä, njä (C), ńȧ` (D) ~ Kb. не ‘arrow’

while some show Castrén’s ä corresponding to Donner’s e:

  • PSmy *ëtå- > Km. ädeʔb- (C), edəʔ- (D) ‘to wait’
  • PSmy *këcɜ- > Km. khä̂demgä (C), kʰēdəmgə (D) ~ Kb. кадума (Sp.) ‘ant’
  • PSmy *këptu ‘currant’ > Km. khäʔbde (C), kʰɛʔʙᴅɛ (D) ‘berry’

Given furthermore vacillation between e̮, e, ɛ in Donner’s records of ‘hair’ (and with a 2nd syllable front vowel!), perhaps we should rather assume *ë > /e/ in general in Kamassian (Castrén’s ä might also be generally accurate in this case), with some secondary idiolectal backness variation. Turkic loanwords also show occasional e̮ for front ö in the original, e.g. ȫɫʉm, ē̮ɫə̑m ‘death’, qē̮bərʉʔ ‘bridge’ (from ölüm, köbrük). [3] The two examples of Koibal e might represent independent fronting (in CV stems? though not in ‘winter’).

Some sporadic examples also have still different reflexes of PSmy *ë in Kamassic: /a/ in e.g. *ńërkå > Km. narga (C), nå̆rga, nə̑rʁa (D) ‘willow’; /o/ in e.g. *më- > Km. mo- (D) ‘to take’ (potentially regular after /m/); maybe /i/ in *sër > Km. siri (C), sɪrɛ (D) ~ Kb. сыры (Sp.) ‘white’ (probably influenced by, if not outright rather derived from *sïrå ‘ice’)…

More strange however are cases where we find correspondences similar to the above, but resulting from a few other PSmy vowels. For starters, the following also show Donner’s e̮ ~ Castrén’s ö̂:

  • PSmy *ńamɜ- > Km. nö̂mür (C), nē̮mə̑r, nȫmə̑r (D) ~ Kb. нёморъ (Sp.) ‘soft’
  • PSmy *kät- > Km. šö̂- (C), še̮ʔ-, šə̑ʔ- (D) ~ Kb. сод- (Sp.) ‘to sew’ (clearly with an original front vowel given *k- > š-)

The correspondence Kamassian e ~ Koibal a likewise sometimes seems to result from other PSmy illabial vowels. There is (at least) one case of *ï (usually > Km. i, Kb. и, ы), two of probably *ə (usually > Km. ə̑, a ~ Kb. а, о) and four of *ä (usually > Km. e, Kb. е, э):

  • PSmy *ïtå- > Km. ä̂de- (C), ɛdə- (D) ~ Kb. ад- ‘to hang up’
  • PSmy *ət³kä > Km. eši (C), ɛɕĭ, eɕĭ (D) ~ Kb. асе (Sp.) ‘child’
  • PSmy #kəå > Km. kee (Adelung), ket (Pallas) ~ Kb. ка (Sp.) ‘moose’
  • PSmy *kärɜ- > Km. šêr- (C), šērə- (D) ~ Kb. сар- (Sp.) ‘to be ashamed’ (contrast Km. šêr-, šēr- ~ Kb. сер- ‘to dress up’)
  • PSmy *mäktə > Km. bäkte (C), mɛkté (D) ~ Kb. бакты (Sp.) ‘tussock’
  • PSmy *wäto > Km. bêdü (C), bedʉ (D) ~ Kb. бадё (Sp.) ‘intestine’
  • PSmy *tättə > Km. thêʔde (C), tʰēʔd̀ə (D) ~ Kb. таде (Sp.), tätde (Pallas) ‘4’

This correspondence also appears in negators: Km. ɛm (D) ‘I don’t’, ei (D) ‘not, doesn’t’ ~ Kb. абы (Sp.) ‘not’, though this is probably morphological rather than phonological: the rest of Samoyedic also reflects a variety of vowels in different negators, e.g. Selkup aša ‘not’, 2PS imp. ïkə ‘don’t!’ (indeed also elsewhere in Uralic, e.g. Finnish en ‘I don’t’ but älä ‘don’t!’ [4])

The relationship of these correspondences and *ë > e ~ a could be understood in at least three ways:

  1. there was a common Kamassic change such as *ë > *e, merging with primary *e from at least *ä, and later backing to ‹а› in Koibal;
  2. there was a change such as (*ä >) *e > *ë with later fronting to /e/ in Kamassian;
  3. backing and/or lowering of vowels in Koibal and fronting of *ë in Kamassian are independent processes that coincidentally result in similar-looking correspondences.

My considerations above on vowel harmony would maybe suggest option 1; cf. also асе, бадё = ? /ase/, /badö/ where disharmony might be secondary from *eše, *bedö; however, the two cases of secondary /ë/ suggest that option 2 can have happened too. Option 3 would have the most flexibility of these, allowing for *ë > e ~ а, *ä > e ~ а etc. to each occur under different conditions. Indeed, while the former looks like it might be conditioned by a preceding nasal, the latter shows no such evidence, and counterexamples can be found: e.g. *ken > Kb. сэнъ ‘knife sheath’, *tänä- > Kb. тен- ‘to think’.

So far I have no really firm conclusions on this wider topic, but at least it is quite evident just how much about the historical vocalism of Kamassic is still up in the air. Substantial progress will probably require various lines of attack, such as revisiting the primary data (esp. Castrén’s original records, cf. footnote 2); seeing what processes could be corroborated by loanwords from Turkic; or double-checking with evidence from Selkup and Mator if any of the “Proto-Samoyedic” reconstructions I’ve given (most from Janhunen) might be really only valid for Northern Samoyedic.

[1] Intriguingly we still find a non-open vowel also in Estonian & Votic sõna, Livonian sõnā, and South Estonian syna (with regular pre-nasal mid vowel raising). It is not inconceivable that this is some kind of a retention, though clearly this is not really regular either.
[2] However, this is probably due to the posthumous editing by Schiefner. The preface to Castrén’s only recently fully edited Selkup materials (Manuscripta Castreniana Ostiak-Samoiedica) already has Castrén using ê and î for the nonfront unrounded vowels that we today transcribe e̮, i̮ or /ë, ï/.
[3] Incidentally I only now realized when writing this post, cross-checking Donner’s dictionary with Joki’s treatment in Die Lehnwörter des Sajansamojedischen, that Donner distinguishes ẹ (‹e› with dot = close-mid /e/) from e̥ (‹e› with ring = labial /ø/), likewise their reduced equivalents ə̣ and ə̥. Atrocious practice in transcription, if you ask me, and this has been copied also in Janhunen’s SW, whose printing quality seems to be too bad for me to even distinguish the two most of the time.
[4] Negation in early stages of Uralic is still in need of a good reconstruction though. The negative verb maybe most likely had the stem *e- in the indicative and *ä- in the imperative (at minimum Finnic, Mari and Khanty all suggest something like this), but there are also forms pointing to a stem *a-; and e.g. Permic moreover shows a split between *o- (< *e-) as the present stem and *e- (probably < *ä-) as the preterite stem.

Posted in Etymology, Reconstruction

Some new work on the Agricultural Substrate

Back in 2009, a very interesting paper was put out by Jaakko Häkkinen, then an early-stage PhD student: [1] “Kantauralin ajoitus ja paikannus: perustelut puntarissa”. While no longer especially up to date (I will probably follow up on this claim in another post soon-ish, once one major paper in the works has come out in a future issue of Diachronica), this still remains a notable work that has turned out to be an impetus for quite a lot of discussion over the 10s and ongoing, on our basic assumptions about the early history of the Uralic languages. One of Häkkinen’s suggestions is to attribute some of the shared Finnic–Mordvinic vocabulary to a common southwestern substrate language. He outlines this on the basis of just six words that can be suspected to be of substratal origin per their semantics: three deciduous trees with a southern distribution (the word families of Finnish tammi ‘oak’, vaahtera ‘maple’, pähkinä ‘nut’ < *’hazelnut’ [2]), two species of high importance to agricultural societies (Fi. vehnä ‘wheat’, lehmä ‘cow’), and one innovative numeral (Fi. kymmen(en) ‘10’), and which all also show novel phonotactic features: the word-medial consonant clusters *-mm-, *-kšt-, *-šk-, *-šn-, *-šm-, per him not attested in the Uralic comparative data reaching into the Ugric or Samoyedic languages. Häkkinen mentions also some more narrowly distributed substrate loan candidates with similar phonotactic features (e.g. with geminate nasals: Fi. konna ‘toad’, nummi ‘heath’; Northern Sami lidnu ‘eagle owl’, dápmot ‘trout’) that had been identified already in still earlier studies probing the possibly substratal vocabulary of Finnic or Samic in particular. But as far as I can tell, the idea of a common substrate vocabulary layer extending also further east to Mordvinic, partly even Mari and Permic, was a new key innovation.

Increasing phonotactic complexity towards the (south)western end of Uralic is quite apparent really as soon as you pay attention to the topic. Already in one of my earliest posts on Freelance Reconstruction in 2013 I outlined the branch-level distribution of the clusters *šk, *kš, *kšk and *kšt across the Uralic comparative material. A heavy concentration in Finnic, Mordvinic and Mari, but not in the northwestern Samic, is immediately evident. So there probably should be quite a lot of material that might be attributable to this “Agricultural Substrate” if we went looking for it in detail. 2014-ish I started collecting some additional data on this, taking particular semantic fields as my starting point. Before this reached sufficient completion though, a few other publications already ended up paying more attention to the same vocabulary stratum. I first saw Ante Aikio’s take, in a preprint version of his article “The Finnic ‘secondary e-stems’ and Proto-Uralic vocalism”. This singles out the consonant *š already by itself as a marker of vocabulary of possibly substratal origin (with 25 examples given; about 10 of them not otherwise phonotactically suspect) as well as proposes 9 other cases more on the basis of general phonological irregularity. As he had worked already earlier extensively on the Samic substrate in Northern Finnic and the pre-Uralic substrate of Samic, perhaps some of this was discovered independently though… Aikio only refers to Häkkinen’s paper passingly, not as a main inspiration.

Before Aikio’s paper officially came out in 2016 [3], another version still was also outlined by Mikhail Zhivlov in a small conference paper “Неиндоевропейский субстрат в финно-волжских языках”, which identifies 20 items, likewise on the grounds of phonotactic novelties, the general presence of *š and some phonological irregularities; with substantial overlap with Aikio’s list. Taken together, these were already about as much as I had assembled too, and I haven’t done much more on my draft since. Not much else seems to have happened on this topic in the late 10s either.

Last fall, however, Carlos Quiles, an archeology/genetics/linguistics blogger at Indo-European.eu, put together a somewhat more substantial review of this and also some other data relevant for Uralic linguistic archeology, in a series of about ten blog posts. This is nominally aimed more at locating the Proto-Uralic homeland, though it is easy to notice that Quiles relies mostly on secondary sources so far, and seems to miss a decent amount of relevant basic data in his chapters working more towards this goal. E.g. already the section on fishing technology is missing at least *sopśə ‘net needle’ and *tulkV ‘dragnet’; perhaps because these are traditionally identified as “Proto-Finno-Ugric” (only found up to Khanty in the east) and thus absent from earlier sources attempting to apply linguistic archeology to Proto-Uralic specifically. I also wonder about some geographic claims like Udmurt supposedly being spoken within the range of the Siberian pine. That is probably true today if we count migrant dialects further east and/or planted Siberian pines, but to my knowledge the tree is certainly not native to Udmurtia (not even to most of the Komi Republic).

A full review of this whole topic would be a more involved question than I want to go into on the blog though, and anyway I am also not highly impressed by the overall precision of linguistic archeology as a method. It works just fine for ruling out places like the Circum-Baltic, the Arctic coast or the Caucasus as the Proto-Uralic homeland, but finer details like the long-standing debate on Volga-Kama versus Western Siberian homelands don’t seem like they can be easily resolved. At least two reasons conspire to make further progress difficult. One, if a language family starts off as (a part of) an only slowly expanding or even in situ diversifying dialect continuum, we might have trouble distinguishing “common Family” vocabulary from true proto-Family vocabulary. If any newly incoming vocabulary avoids hitting all the earliest isoglosses within the family, or is etymologically nativized across them, it may end up gaining a wide distribution and an appearance indistinguishable from native. Cases like the common Algonquian calque ‘firewater’ for ‘whisky’ that can be identified as much too recent on cultural grounds are just the tip of the iceberg here. Others could include cases like Proto-Finnic *lohi ~ Proto-Samic *lōsë ‘salmon’, which happen to fall into the outlines of Uralic comparative phonology just fine and would point to a common proto-form *lošə. Both are probably instead more recent loans from Baltic, either independently or in Samic thru Finnic; and this would remain so even if they did really go back to this form in both lineages. From some language pairs like North Estonian ~ South Estonian (last common ancestor ca. 500 BCE), or indeed dialect pairs like Western Finnish ~ Eastern Finnish (LCA ca. 500 CE), with heavily parallel and mutually reinforcing trajectories of historical development up to today, we could probably find examples of this type by the thousands.
(I call this phenomenon “convergent parallel loaning” and hope to one day treat it in more detail than just the one presentation in Finnish from 2016 so far. Cf. also Häkkinen’s spin on this under the name “invisible convergence”.)

I also consider it probable that our efforts on Uralic reconstruction so far on many points stop at the common Uralic stage, maybe especially in vocalism, not quite yet reaching Proto-Uralic proper. This is evident when attempting to reconstruct the proto-forms of several core vocabulary items, e.g. ‘heart’. West Uralic (Samic, Finnic, Mordvinic) suggests *ćüdäm(ə); Udmurt /śulem/ suggests *śedämV; Komi /śëlëm/ suggests *śädämV; Ugric suggests *śiďVmV or even *śijVmV; Samoyedic *säjä suggests *śäďä or *śäjä. We have no especially good way to explain most of this kind of “proto-variation” or to decide which, if any, of these variants might be the most original (of course at least the vowel difference between Udmurt and Komi is likely to be recent). The suggestion first made by Zhivlov that traditional PU *ś comes from an earlier *ć that was preserved in Samic, but replaced in areal vocabulary by a new *ć in Permic and the three Ugric branches, is probably right at least though. “*ś” is then basically a Common Nonwestern Uralic (maybe even just Nonsamic Uralic?) but not the proper Proto-Uralic reconstruction. (On structural grounds the same proposal has been made earlier also by at least Janhunen and Abondolo.)

Two, linguistic archeology cannot even in principle pinpoint an origin outside of a family’s current or historical range. Under the basic assumptions behind linguistic archeology, any terminology for e.g. natural realia exclusive to an “external homeland” would have to be either lost or repurposed in all descendants. This would even hold if one of the daughter lineages ended up re-entering the original territory. (Northern Sami speakers moving to Helsinki are not going to magically recover the lost but presumably once extant Proto-Samic words for things like ‘maple’ or ‘eel’.) Suppose for the sake of the argument that Uralic first expanded in a northward fan from someplace around the southern end of the Urals, near Orenburg or Magnitogorsk; southeast of the current range of Permic and Mari, well south(west) of the current range of Mansi. What kind of vocabulary evidence would we even expect this to leave, as distinct from an already originally more northern homeland?

But I believe that’s enough said for now on attempts to locate Proto-Uralic (again, watch for the upcoming issues of Diachronica for news on this). Going back to the Agricultural Substrate, Quiles identifies four semantic areas which would show prominent influence from this:

  1. tree names and related botanic terms;
  2. apiculture;
  3. agriculture;
  4. metallurgy.

In terminology related to animal husbandry and textileworking he gets together a few possible examples too, but contrasted with a more substantial number of loanwords from Indo-European.

I agree with most of these assessments as well. The one exception is apiculture, as the words actually comprising this layer (*mekšə ‘bee’, *metə ‘honey’, *śišta ‘wax’; unreconstructible #käras ‘honeycomb’ [4]) all have good Indo-European / pre-Indo-Iranian etymologies, unlike the vast majority of the others, and the cases of *š appearing in these can be well derived by RUKI. Even if *š might be often a marker of the Agricultural Substrate, this does not imply that all cases have to be so, and in particular this does not provide reason to abandon well-established loanword etymologies coming from actually attested language families. By a similar argument, I am likewise unconvinced by attempts to reinterpret words like *šiŋərə ‘mouse’ (with regular reflexes in all three of Hungarian, Mansi and Khanty) as having anything to do with the Agricultural Substrate. The key motivation for setting this hypothesis up in the first place has after all been the highly limited distribution of words of certain semantic categories or with certain phonetic features. If we start including occasional etymologies that reach also Ugric or Samoyedic, we can no longer maintain the original explanation for why other words of this layer do not do the same (i.e. that the Agricultural Substrate was never in contact with these branches of Uralic). This indeed would come close to abandoning any reason for treating this layer as non-native in Uralic in the first place!

An additional issue that I seem to notice at this point is that, out of the possibly substratal cases of *š, quite a few also occur in RUKI environments. The cluster *kš is particularly prominent: *makša ~ *mäkšä ‘rotten wood’, *päkšnä ‘linden’, *wakštVra ‘maple’, maybe *päkškV ‘hazelnut’ and *tekškä ‘ear of corn’ (surfacing as *šk ~ *kš vacillation). There is also a phonologically similar though clearly non-IE *š after *ŋ in *jaŋša- ‘to grind’, maybe also behind *riŋəšə ‘threshing ground’. Examples of *ks or *ŋs also do not seem to occur. I suspect that this points to the Agricultural Substrate actually coming to Uralic second-hand, and that it was instead first adopted into an extinct para-Balto-Slavic and/or para-Indo-Iranian language that, as expected per general Indo-European dialectology, regularly retracted *s to *š at least after velars; including in words that it had earlier adopted from the Agricultural Substrate proper. This hypothesis gives us also some more wiggle space in identifying the substrate in the archeological record: even archeological cultures that were probably Indo-European-speaking could be considered as the source.
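The observation about what precedes *š can be checked mechanically. The following is only a minimal sketch: the forms are the ones cited in the paragraph above, while the helper names and the decision to treat just *k and *ŋ as "velars" are my own illustrative assumptions, not an established procedure.

```python
import unicodedata

# Reconstructed forms cited above (asterisks omitted).
FORMS = ["makša", "mäkšä", "päkšnä", "wakštVra",
         "päkškV", "tekškä", "jaŋša-", "riŋəšə"]

# Assumption for this sketch: only *k and *ŋ count as velars.
VELARS = {"k", "ŋ"}


def segment_before_sh(form: str) -> str:
    """Return the segment immediately preceding the first š in a form."""
    form = unicodedata.normalize("NFC", form)  # keep š, ä etc. as single characters
    idx = form.index("š")
    return form[idx - 1] if idx > 0 else ""


def is_after_velar(form: str) -> bool:
    """True if the first š of the form directly follows a velar."""
    return segment_before_sh(form) in VELARS


for form in FORMS:
    print(f"*{form}: š follows {segment_before_sh(form)!r}")

velar_count = sum(is_after_velar(f) for f in FORMS)
print(f"{velar_count} of {len(FORMS)} forms have š directly after a velar")
```

Only *riŋəšə falls outside the post-velar pattern in this small sample, which matches the hedged "maybe also behind" above. A real survey would of course need the full etymological corpus rather than eight hand-picked forms.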

Speaking of the ultimate identity of the substrate, Quiles has an interesting new suggestion on this, too: he seems to have found parallels for a number of the involved words in the West Caucasian language family, and attempts to sketch ways it could have been in contact with Uralic. This I think would be worth further exploring. Some more data to this effect might be also findable from Bernát Munkácsi’s 1901 monograph Árja és kaukázusi elemek a finn-magyar nyelvekben. While Uralic–Indo-European loanword studies have been an extensive and productive field for a long time, on the topic of Uralic–Caucasian comparison of almost any flavor this remains just about the most recent even halfway serious overview. Directionality, however, is not obvious to me. As Quiles notes, the WC ~ Uralic parallels center on technology and metalworking terminology. It seems to me they could be well explainable, besides pure accidental resemblance, also as a set of recent Wanderwörter, or parallel loanwords from a lost common source. There is thus barely any evidence yet to speak of a West Caucasian substrate language specifically.

By now I would have also more detailed comments on numerous individual etymologies proposed to belong in the Agricultural Substrate by one researcher or the other. This task will be best left for another time however, in many cases maybe also for another context entirely, and I might return to the topic only after having gotten more of these forthcoming etymological etc. observations out to print individually. Substrate languages are a fascinating topic, but they really are not highly feasible to tackle head-on: they emerge only from the dark corners of linguistic reconstructions, generally identifiable more by what is absent than by what is present.

[1] While Häkkinen continues to be active in our field and has a lot to say especially on the topic of the relative and absolute chronology of Uralic languages (recently e.g. coauthoring an article on Southern Sami with Minerva Piha in the latest Sananjalka), his PhD though unfortunately still remains unfinished.
[2] Part of the Finnish / Swedish grouping jalopuut, jalot lehtipuut / ädellövträd ‘noble (broadleaf) trees’. Other generally agreed members include the elm, ash, linden, beech and hornbeam. This might be convenient to calque into English too. Delimiting it in a context wider than just the Nordics has some difficulties though… would we only accept species whose distribution overlaps with the taiga zone at least within gardens, ruling out the likes of plane trees; and would we follow the main practical motivation of the term and rule out softwood broadleaf trees like the poplar?
[3] Nominally regardless claiming to be in the 2015 issue of Suomalais-Ugrilaisen Seuran Aikakauskirja. I wonder how often these kinds of delays, between when a periodical is dated and when it actually comes out, are due to printing queues and how often due to actual editing issues.
[4] Mordvinic *käŕas, Mari *käräš, Udmurt /karas/; none of these can be native as such. The Mordvinic and Udm. words show a ⁽*⁾front vowel in the first syllable plus a ⁽*⁾back vowel in the second (PU unstressed *-ä- > Udm. /e/), and such disharmonic vowel combinations always result from either recent derivation or recent borrowing. The Proto-Mari vowel *ä then is non-native entirely. Probably mostly likewise for those cases of pre-Permic *ä that end up retracted to /a/.


A Century Late on Proto-Finnic sibilants

There are broadly two commonly seen ways of thinking about progress in science. The first is the “naive” Science Marches On narrative where we have ever-increasing aggregation of solid Results; the archetype is mathematics, where results indeed stay around as long as they’ve been established once, but a good part of the natural sciences today follow this as their main narrative as well (for no lack of reason, I feel). The second is the Kuhnian succession-of-paradigms narrative where most of the time scientists can go around aggregating results, but every once in a while some basic assumption is declared to have been wrong, quite a lot of stuff ends up discarded and work is started over. Hence even the results we continue to accept should still not be thought of as unchanging truths, but rather as more temporal, provisional even. The archetypes for this seem to come from the humanities, where theories of how to understand even the main forces of history or literature or psychology still seem to be in quite a bit of flux and views are often split between battling schools.

In historical linguistics, as really in most even vaguely empirical sciences, we clearly have aspects of both around. Etymology and reconstruction generally turn up ever more results as time passes, though some individual results occasionally turn out to have been built on sand. We are lucky to have avoided drastic paradigm shifts though: there clearly do not exist any examples of things like language families that were first set up in detail and later abandoned entirely. [1]

These two attitudes have also similarities, not just differences. Above all, both are forward-looking: they hold that science is something that continues to be done and will have something new to say ten, a hundred, probably a thousand years from now (no matter if built on top of or beside the things it says today). Another alternative yet exists as well though: the “golden age” narrative, according to which knowledge is not created (anymore?): it is or has been already out there, and what we can accomplish amounts to either preserving or rediscovering it. “Nothing new under the sun” & its restatements in various forms (probably this sentiment is itself ages-old too).

In fields like Uralistics, with a “long-and-thin” history, occasionally this also rings true. To quote here my colleague Niklas Metsäranta in the foreword of his recent PhD thesis (English translation mine):

“The best aspects of etymological research are doubtlessly those fleeting moments, when, while reading dictionaries, the stars align and one notices or at least thinks of having noticed a new connection between words, that no one has noticed before. Occasionally the initial buzz turns to disappointment though, when upon more careful browsing on etymological references one realizes to not have found anything new, but to only have brushed up an old dusty comparison advanced already by E.N. Setälä or Yrjö Wichmann.” [2]

Metsäranta’s work in Mari and Permic etymology has indeed a lot of preliminaries and predecessors around for it in the late 19th and early 20th century. Most progress in Uralic etymology in the second half of the 20th century has not come from extending the corpus of comparisons, but rather from trimming it down, trying to find which parts of it are actually reliable and which of them might have other, better explanations, e.g. as Indo-European loanwords. This issue has been particularly obvious during the work that led to my recent paper on a sound change *i > *i̮ in Permic, which consists almost entirely of the rehabilitation of old etymological comparisons, most of them rejected later on for one reason or the other (but generally without any detailed critique). Only time will tell if this idea will lead to any all-new etymologies, too. Probably yes, however, if the numerous 21st century works that again seek also entirely novel Uralic etymologies are anything to go by; I already cite one applicable new etymology from a preprint by Aikio, after all.

The early pioneers of Uralic of course did not just work on etymology. The development of general Uralic historical phonology shows also a similar broad outline: a “brainstorming” phase pre-WW1 eventually turning into a “consolidation” phase post-WW2. Here the situation seems also much more precarious in the details, really. There are several major etymological dictionaries out there by now, all household names to the historical Uralicist (FUV, SKES, KESK, DEWOS, TESz, UEW, YSS, SSA…). Most early etymologies worth consideration have been caught by at least one of them, even if not necessarily concluding in their favor. [3] By contrast studies of historical phonology have remained more data-driven / less literature-driven. Small details can be often re-derived as needed as long as their underlying etymologies remain known, sidelining credit from their first discoverers; or, also, they may end up forgotten entirely.

Getting finally to the topic of my post’s title, about five years ago I sketched an observation about a distinct reflex of Proto-Finnic *c in Karelian. This is quite noteworthy in that *c is the first new phoneme to be added to Setälä’s 1890s reconstruction of Proto-Finnic that seems likely to stick, first properly consolidated as recently as by Kallio in 2007. Here we would then have evidence that this has not been retained only in the previously marginal South Estonian (its proper importance to Finnic reconstruction was not realized before at least the 70s) but also in the long-researched Karelian. There is quite a bit of noise in the Karelian data though, e.g. due to secondary affective affrication and some evident dialect mixing in the complex reflexes of *s. I wouldn’t blame earlier generations for not catching this idea.

But caught it has been. Earlier this year I noticed yet another old journal relevant to Uralic studies to be available online by now: Le Monde Oriental, published in Uppsala from 1906 to 1947, still turning up regularly in bibliographies thanks to several contributions from K. B. Wiklund. The early issues are by now in the public domain and can be found at least in part in the archive.org collections. I usually follow up these kinds of finds by taking a brief look over the contents of the back issues in general. Vol. 6 from 1912 turned out to contain an article from one N. Moosberg (not a previously familiar name to me at all), “Om utvecklingen af samfinskt s i den ryskkarelska dialekten i Vuonninen”. This contains pretty much exactly my observation, just more than a century earlier already: while North Karelian (in his article: just from the village of Vuonninen in the parish of Vuokkiniemi) reflects *s as /š/ by default, it also maintains instances of /s/ that cannot be explained by any regular secondary conditioning factors. In particular, this holds for the assibilated reflex of *t before *i, where we today reconstruct *c per the South Estonian evidence. Moosberg too concludes that the result of this assibilation must have been a consonant distinct from plain *s. I don’t know what to make however of his suggestion for a “probably more spirantic sound” (“troligen mera spirantiskt ljud”): should this be read as suggesting something like a nonsibilant *θ?

Moosberg’s primary data behaves also more cleanly than what I was able to scrape together. In particular he finds *c > /s/ just fine also in kaksi ‘2’, kuusi ‘6’, kyⁿsi ‘nail’, uusi ‘new’, varsi ‘shaft’. Several preterite stems like kokosi ‘collected’, läksi ‘left’, löysi ‘found’, makasi ‘lay (down)’, tuⁿsi ‘felt’ are also adduced; some of these are even confirmed by the KKS data, though I had not looked into that topic at all. My own preliminary suggestion that (*uc, *rc >) *us₂, *rs₂ > *us₁, *rs₁ (> , ) could of course still hold for some other varieties upon closer investigation, but I am now less trustful.

The other typical position where we (at Helsinki at least) now reconstruct PF *c is the cluster *cr. Moosberg notes a reflex /s/ in these as well, but he follows E. N. Setälä’s influential reconstruction with *str and is unable to treat this as the exact same sound change, instead assuming a distinct cluster change *str > *s₂r. Yet these cases, too, had an affricate reconstruction advanced for them early on. This I believe was first proposed by Frans Äimä in a 1921 article in Virittäjä; as I found out already a bit sooner after my previous blog post, in the spring of 2018. Äimä in fact refers to an outright palatalized pronunciation with [źr] or [śr] from the dialects of Rugajärvi, Jyvöälahti and partly Tver. This has not been recorded in the macrophonemic transcription of Karjalan Kielen Sanakirja, but luckily, scans of the original field records are already partly available too and they do show this unexpected palatalization: Rj. aźrain, keźrä, Pistojärvi aśroan, Tver ḱeźŕä, ildaḱeźro (the latter still there also in 1958) and even Vuokkiniemi keśrä (1956). Äimä also builds here on a suggestion made slightly earlier by Ojansuu (Karjala-aunuksen äännehistoria, 1918 [4]) to reconstruct *st > *ts > *ćć just for Karelian, but takes a step further and proposes a very modern-looking reconstruction *tsr already for Proto-Finnic. Unfortunately, it appears that no one has before now brought their proposal(s) together with Moosberg’s. The nascent discussion on what exactly to reconstruct behind the correspondence NKrl sr ~ SKrl, Ludian–Veps zr ~ Western Finnish hr ~ EFi and southern Finnic *tr simply seems to have been dropped post-WW2, with overviews defaulting to Setälä’s *str almost up to the present day. Even the current reconstruction with *cr is still not highly prominent really, being proposed, again by Petri Kallio, merely in a lengthy footnote #9 of his 2012 article “The Prehistoric Germanic Loanword Strata in Finnic”.
A bit more visibility seems to be warranted here, and I would propose introducing the name “Moosberg’s law” for the North Karelian retention of /s/ from *s₂ < *c.

These finds taken together do not amount to merely rediscovering lost earlier wisdom, but the flavor is certainly there, and it’s hard not to wonder what other small but potentially crucial notes on Uralic historical phonology might be already out there, theoretically available to the reader but not signposted by any modern back-references. [5] With this in mind I have, in fact, considered starting work on a Uralic analogue of N. E. Collinge’s 1985 monograph The Laws of Indo-European, or maybe first some more limited analogue similar to e.g. Nathan W. Hill’s 2011 paper “An Inventory of Tibetan Sound Laws”.

[1] The closest is maybe the defunct Ural-Altaic hypothesis, and its succession in the Altaic wars on the one hand (restricting the family by the exclusion of Uralic and perhaps other parts), the Nostratic hypothesis on the other (widening it by the inclusion of e.g. Indo-European, Kartvelian and Yukaghir). All early defenses of Ural-Altaic are however obviously sketchy and often admit as much. There are no systematic reconstructions of grammar or phonology or lexicon, only take-it-or-leave-it collections of parallels, many of them by now reinterpretable as areal or typological rather than genealogical; and hence not strictly speaking abandoned as such.
[2] “Etymologisen tutkimustyön parhaimpia puolia ovat epäilemättä ne ohikiitävät hetket, kun sanakirjoja lukiessaan tähdet asettuvat linjaan, ja sitä huomaa löytäneensä tai ainakin luulee löytäneensä yhteyden sanojen väliltä, jota kukaan muu ei ole ennen huomannut. Välillä ensihuuma muuttuu pettymykseksi, kun tarkemmin etymologisia sanakirjoja selailtuaan tajuaa, ettei todellisuudessa olekaan löytänyt mitään uutta, vaan on tomuttanut esiin vain jonkin vanhan pölyisen jo E. N. Setälän tai Yrjö Wichmannin esittämän rinnastuksen.”
– Seconded on the buzz as well, which you might get a glimpse of from my previous post.
[3] The biggest remaining gaps are probably among words not found in the four “key languages” to have been covered by dedicated etymological dictionaries already in the 20th century (i.e. Hungarian, Finnish, Komi and Khanty). Newer etymological or etymologically-minded comparative dictionaries exist also for Estonian, Mordvinic, Mari and Selkup at least, but these do not pay much attention to early literature.
[4] Earlier in the shorter overview “Karjalan äänneoppi” (1905; p. 30) Ojansuu still follows Setälä in positing one-step *str > *sr.
[5] A search in my digital literature collection indeed turns up zero references to this article of Moosberg’s, only a handful of mentions of his other work on Ume Sami.


How to (not) report a lack of etymology: Samic *keaðkē

I have been having a simmering discussion with commentator “M.” under the post on what’s important for what in historical Uralistics. One general topic there that I keep pushing hard back at is the idea of “etymology unknown” as anything like a fallback explanation or default hypothesis. This is not a hypothesis at all, it is the absence of one. At the worst it might end up being elevated to a curiosity stopper, an excuse to not keep looking.

At the same time, I want to still stress that this doesn’t mean that anything at all, any kind of nonsense thrown out, makes an acceptable etymology. I’m already on record in favor of more attention being paid to “anti-etymologies”. “Etymology unknown” sometimes really is what should be reported. But I think that this is essentially always too little detail by itself and should be combined with telling what, exactly, it is that we have ruled out as not being known. Basically no language on Earth is at a point of etymological research so widely practiced and thoroughly scoured that we would have grounds to assume that “etymology unknown” means actually having exhausted all possibilities. Words reported as “etymology unknown” in some sources have new good etymologies coming out for them all the time, sometimes even from older literature that was neglected by the compilers of the reference work in question. They will keep coming too, if my own backlog of unpublished etymologies is anything to go on.

So what should it look like when a word’s etymology really remains firmly unknown, not just underresearched? For an example, let us consider Samic *keaðkē ‘stone’.

Step one: check the semantic equivalents in all known relatives and main contact languages. In the meaning ‘stone’, we can find clear non-cognates in all reasonable directions:

  • Most Uralic languages reflect Proto-Uralic *kiwə. There is some phonological overlap here (initial *k, front vocalism) but the correspondence *ðk ~ *w seems unbridgeable without massive speculation. *ea ~ *i doesn’t have any good precedents for it either. It’s not literally impossible that these could be some day solved, especially as long as no traces of *kiwə are otherwise found in Samic, but for the time being this is a non-match.
  • Samoyedic reflects instead *pəj, an even worse phonological fit. *ðk ~ *j would be actually regular (< PU *ďk?), but this observation conflicts with the proposal to treat the Samoyedic word as cognate with Finnic *pii-kivi ‘flintstone’, both if reconstructed back to a separate PU root *pijə, or if treated as a semantically and phonologically divergent reflex of PU *piŋə ‘tooth’ (> Finnic *pii ‘tine’), e.g. by back-formation from the same or a similar compound, plus irregular lenition of *ŋ. [1]
  • Per Nikolayeva, Yukaghir has *kïj ‘stone’ (plausibly ~ *kiwə [2]), Kolyma Yukaghir also /pē/ ‘rock, big stone’ (plausibly ← Samoyedic), Tundra Yukaghir also /jeďi/ < ? *jenći ‘stone’ (no idea about the etymology of this), all still nowhere near *keaðkē and now also way off geographically and genealogically, hence a priori weaker than anything found in languages securely known to be related to Samic.
  • Germanic reflects *stainaz; Baltic reflects *ákmō and Slavic *kamy, both going back to PIE *h₂akmon- whence also e.g. Sanskrit áśman. No chance here for a loan from any known non-Uralic language of Northern Europe, no evidence for an ancient Indo-Uralic archaism either.
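The point of step one, reporting what has been ruled out rather than a bare "etymology unknown", can be pictured as a small record structure. This is just a sketch under my own assumptions: the class and field names are hypothetical, and the rejection reasons are abbreviated paraphrases of the bullet points above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RuledOut:
    source: str  # language or branch that was checked
    form: str    # the candidate comparandum
    reason: str  # why it was rejected (abbreviated)


@dataclass
class EtymologyReport:
    word: str
    gloss: str
    ruled_out: List[RuledOut] = field(default_factory=list)

    def summary(self) -> str:
        """Report the negative result together with what was actually checked."""
        head = f"{self.word} '{self.gloss}': etymology unknown"
        if not self.ruled_out:
            return head + " (nothing checked yet)"
        checked = "; ".join(
            f"{r.source} {r.form} ({r.reason})" for r in self.ruled_out
        )
        return head + "; ruled out: " + checked


report = EtymologyReport("*keaðkē", "stone", [
    RuledOut("Proto-Uralic", "*kiwə", "*ðk ~ *w and *ea ~ *i unexplained"),
    RuledOut("Samoyedic", "*pəj", "poor phonological fit"),
    RuledOut("Yukaghir", "*kïj", "no match, also remote"),
    RuledOut("Germanic", "*stainaz", "no match"),
    RuledOut("Baltic", "*ákmō", "no match"),
    RuledOut("Slavic", "*kamy", "no match"),
])
print(report.summary())
```

An empty `ruled_out` list then makes it visible at a glance when "unknown" really just means "underresearched".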

In known loanword sources a bit further off, we could try looking more into Indo-Iranian, where words for ‘stone’ seem to diverge quite a bit. A quick trawl thru Wiktionary nets at least Persian and Balochi /sang/, Pashto /kāɳaj/, Kurdish /bird ~ berd/, Wakhi /wurt/, Ossetic /dur/, Hindi, Kashmiri etc. /pattʰar/… but again none of this initial haul really gets us any closer to Samic.

Step two: check for morphological analyses. For words that don’t look like basic word roots, this probably should be step one. There is something that can be done here too though: *-kē < *-kA is a widespread Uralic nominal suffix, and we probably shouldn’t stress too much if this in particular fails to correspond in an otherwise decent cognate. Still, a shorter #keað- (suggesting pre-Samic #keð-) does just as poorly among the non-cognates above. We also don’t have anything within Samic that would particularly point to such a division. The most phonologically similar words reconstructible for Proto-Samic are *(s)keaðē- ‘temple (of head)’ and *kiðë ‘spring’, both semantically miles off from ‘stone’. In more narrowly distributed words from Northern Sami I can find geađđat ‘amicable’, geađđi ‘dimness’ (+ Lule skädot ‘to dim (of eyes)’, Skolt ǩieđâš [3]) which don’t help either. Relaxing phonological similarity even further allows reaching a different substance term *čëðë ‘coal’ (< PU *śüďə), but even allowing for irregular *č > *k would not suffice to set up any morphological relationship. Unless we are also wrong about the development of PU *ü(-ə) to PS *ë(-ë), and this somehow first merged with PU *e-ə rather than the phonetically expected *i(-ə)? If so, then we might consider *śüďə > *ćeðə > *ćeð-kä > *čeaðkē > ? *keaðkē. But I feel a semantic shift ‘coal’ > ‘stone’ remains nonsensical despite a vaguely shared semantic field. A connection between these meanings probably should rather start from something more generic like ‘nugget, pellet, grain’. Even ‘small stone’ perhaps, but that would be a poor match with ‘stone’ just in a supposedly derived Samic reflex vs. ‘coal’ all across Uralic.

Step three: check for phonological matches and see if their semantic difference can be bridged. We have done some of this already in the previous step. Looking more widely for PU roots, even of the very rough shape *k + front vowel + *d/ď again fails to turn up anything good though. Besides ‘spring’ (with cognates in Mordvinic) our options are *keďə ‘skin’, *käďwä ‘female; ermine?’, *küdV ‘brother-in-law’ (unless the proposed Ob-Ugric cognates of Finnic *kütü are just divergent reflexes of *käləw ‘sister-in-law’), all again no-go. Germanic could be scanned as well, though for the time being I have no good resources for doing this thoroughly (anyone want to link me to a digital dictionary of Old Norse?). Balto-Slavic and Indo-Iranian we can probably leave aside, as there are no examples of Samic *ð or PU *d/*ď that originate from these.

Step four: check for semantic near-matches. This is somewhat harder to do rigorously. In recent times the CLICS database offers one handy tool at least: charts of typical colexification relationships between concepts in the world’s languages. Their concept map for STONE provides us with the rough gist that the options are limited. So far the only attested colexifications are with ‘mountain’, ‘egg’, ‘hill’ (mostly in Pama-Nyungan) and ‘seed’ (mostly in Austronesian; Finnish kivi as ‘pit of fruit’ might count too). Only the first, as observable already in e.g. English rock, has substantial amounts of evidence backing it.

However, it turns out that we are now in luck! PU *muna > PS *monē ‘egg’ is right out, and no PU or PS word for ‘seed’ is known at all. The proposed PU words for ‘mountain’ or ‘hill’ number a handful, and the best-attested cases like *wärä (> Samic *vārē) or *mäkə are also way off. But one less firmly attested example is *kaďV — continued in Hungarian hegy and a Samoyedic word family that might reconstruct as *koəjə (if we take Nganasan †koaja as recorded by Castrén as representative and not a later derivative from something shorter). This turns out to match well indeed with the morphological analysis *keað-kē that I have already hypothesized above, and the two root consonants match regularly. The vowel development *a > *ea is not the usual one, but can be tentatively explained: this turns up in Samic also in other cases before palatalized consonants, especially syllable-final ones, including *kaća > *keačē ‘point, end’, *kaććV- > *keaččë- ‘to look’, *laśkV- → *leaškō- ‘to pour (out)’, *waćara > *veačērē ‘hammer’ (cf. Finnic *kaca, *kacco-, *laskë-, *vasara); and perhaps the common Uralic Wanderwort *waśkV > *veaškē ‘copper’, back-vocalic also in Finnic *vaski, Mari *wåž, Hungarian vas, Khanty *wăɣ (but then front-vocalic also in Mordvinic *viśkə, Permic *-veś, Samoyedic *wäsa). The conditioning of this probably could use more research though.

Regardless it seems we can, after all, propose an etymology: PU *kaďə ‘(rocky?) mountain’ > early pre-Samic *kaď-ka ‘rock (object)’ > late pre-Samic *keďkä > *keðkä > Proto-Samic *keaðkē ‘stone (substance)’. A very nice result I feel, for explaining such a basic vocabulary item that has so far gone unetymologized! [4]
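The proposed chain can also be replayed mechanically, as a toy sketch. The rule formulations below (plain string substitutions tailored to this one word, with the hypothetical names `STEPS` and `derive`) are simplifying assumptions made for illustration only, and are not general statements of the sound laws involved.

```python
# Toy replay of the derivation *kaď-ka > *keďkä > *keðkä > *keaðkē.
# ASSUMPTION: each step is a plain string substitution fitted to this
# single word; none of these is a general formulation of a sound change.
STEPS = [
    # *a > *e before palatalized *ď, with the suffix vowel following (ka > kä)
    ("fronting before *ď", [("aď", "eď"), ("ka", "kä")]),
    # West Uralic depalatalization
    ("*ď > *ð", [("ď", "ð")]),
    # Proto-Samic vowel developments in first and second syllable
    ("*e > *ea, *ä > *ē", [("e", "ea"), ("ä", "ē")]),
]


def derive(word):
    """Return every intermediate stage, starting from the input form."""
    stages = [word]
    for _name, substitutions in STEPS:
        for old, new in substitutions:
            word = word.replace(old, new)
        stages.append(word)
    return stages


print(" > ".join("*" + stage for stage in derive("kaďka")))
```

The order of the steps matters here: fronting is applied while *ď is still palatalized, which is the relative chronology noted in footnote [4].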

At this point I must emphasize that this result was not pre-decided. This etymology does not come from my above-alluded stash of unpublished discoveries. Right up to looking up the CLICS concept map, I was laboring under the assumption that *keaðkē indeed is a word of unknown etymology; certainly that’s the only thing I’ve seen reported for it, and certainly it also fits my typological expectations of substrate vocabulary (which is, in the absence of features like consistently recurring phonetic irregularities, generally fairly unknowable speculation in the case of any one particular word). And yet it turns out … if we just diligently explore the options, instead of worshipping our ignorance and writing words off as “unknown-therefore-unknowable”, a lot of the time we can make progress on their etymology. Wir müssen wissen, wir werden wissen.

Probably there are indeed words where the four steps above are still insufficient for putting together an etymology; then again it would be possible to sketch out also a few further steps. And I think I have demonstrated regardless not just an apparent etymology for *keaðkē after all, but also how and why the first few directions that we could think of for seeking its etymology do indeed fail.

[1] A hypothesis that would work decently here is that first *iŋ > *iń, which is not contradicted by any data (is nonprovably regular) and is within Uralic even paralleled by Permic *piń; followed by regular *ń > *j in most reflexes. Only Selkup really conflicts with this. — The reconstruction of *ə seems unclear too (actually given by Janhunen as *ə¹ = *ə/*å). Only the correspondence Nganasan /hᵘalə/ < †fala ~ Nenets *pæ points to this, while we have a seemingly preserved /i/ in Kamassian /pʰi/ and Mator hilä, and a close vowel also in Enets /pū/ < †puj ‹пуи›. Maybe some of these could even reflect a heavily contracted *pijwə < PS *pińwə < pre-PS *pińkiwə < PU *piŋə-kiwə (with loss of *k from a secondary cluster *ńk, but intervocalic *w preserved in a no longer posttonic position)?
[2] Considering the main etymology I discover here, another possibility could be to derive this thru some flavor of Samoyedic #kVj ‘(rocky?) mountain’.
[3] Related to Germanic *skadwaz ‘shadow’ somehow…? The front vowel seems like a poor match, though.
[4] One further phonologically interesting feature in this is that the Samic-specific fronting *a > *e seems to take place earlier than the common West Uralic depalatalization *ď > *d (or > *ð). I’m not concerned though. This seems to be proven as an areally-spread change already by the fact that also Mari shows *ď > /ð/ while differing from West Uralic in showing *d > ∅. Actually in principle nothing rules out either that palatalization before *ď was more widespread, since we lack Finnic and Mordvinic reflexes, but I don’t see much benefit in this assumption over the previous.

Posted in Etymology, Methodology

nyolszáz, kilenszáz

Recently when tracking a variety of citations back into early literature, I was directed to Zsigmond Simonyi, 1901: “Az Ábel-féle szójegyzék” (Nyelvtudományi Közlemények 31: 225–227), an article reporting the corpus of a small Hungarian–Italian phrasebook from 1438. One point that caught me by surprise was the words for ‘80’ and ‘90’. These are written as gnalsase ~ gnalzase, chilansase ~ chilanzase. These are clearly not quite the modern Hungarian words nyolcvan, kilencven — they look like they instead contain száz ‘100’ as the last member. The article does not give much commentary in general, but this is indeed noted. Simonyi thinks they are simply mistranslations and stand for the Hungarian words for ‘800’, ‘900’. However, the phrasebook at other times renders Hungarian /ts/ as z or ç; nyolc ‘8’ is gnauz, harminc ‘30’ is armiz ~ armiç. kilenc ‘9’ is indeed written as chilens, but I don’t think this would represent a failure of the Italian author to distinguish /ts/ generally, perhaps just after /n/. So why then gnalsase?

It can be noted that etymologically nyolc and kilenc do contain old morpheme boundaries: they’re constructed on the general Uralic pattern of ‘8’ and ‘9’ as 10−2 and 10−1, and their shared final -c represents a contracted reflex of tíz ‘10’. I think this might be happening here in a different way. That is, the words indeed do not have an affricate, and would be nyolszáz, kilenszáz if projected to modern Hungarian. They are also not to be read as 8·100 or 9·100, but rather, as subtractive constructions 100−20 and 100−10; “two (decads) before 100” and “one (decad) before 100”. Perhaps this idea is known already in Hungarology, but of course tracking references forward in time is much more difficult than backward in time. (Google has nothing for nyolszáz, kilenszáz, but then these are modernized spellings by me. Honti’s 1993 monograph on numerals in Uralic I do not have on hand to consult.)
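The arithmetic behind the two competing readings can be spelled out in a small sketch. The function name `subtractive` and the two dictionaries are hypothetical, introduced only for illustration; the numeral analyses themselves are the ones discussed above.

```python
def subtractive(base, subtrahend):
    """Value of a numeral built as '<subtrahend> before <base>'."""
    return base - subtrahend


# General Uralic pattern: '8' and '9' counted back from 10.
units = {"nyolc": subtractive(10, 2), "kilenc": subtractive(10, 1)}

# Reading proposed here for the 1438 forms: counted back from 100 in decads.
proposed = {
    "gnalsase": subtractive(100, 2 * 10),    # 'two (decads) before 100'
    "chilansase": subtractive(100, 1 * 10),  # 'one (decad) before 100'
}

# Simonyi's reading: multiplicative, i.e. mistranslated '800' and '900'.
simonyi = {"gnalsase": 8 * 100, "chilansase": 9 * 100}

for form in proposed:
    print(form, proposed[form], simonyi[form])
```

Only the subtractive reading yields the values that the phrasebook glosses actually call for.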

Also the word for ‘100’ is itself given as tissase, seemingly standing for ‘ten hundred’ (tízszáz). However the word for ‘1000’ (mod. ezer) is still given separately as esere, so I don’t think this represents a translation error either. My guess would be that the phrasebook’s Hungarian informant spoke a dialect where this archaic-seeming model of ‘80’ and ‘90’ was pleonastically extended to ‘100’ as well.

[added 2021-07-04] Novel words for ‘80’ and ‘90’ would not feel terribly out of place also since Hungarian shows a wide variety of strategies for forming decads anyway. ‘20’ is a separate word húsz (in 1438 usso ~ us), which has cognates in Ob-Ugric, Permic and Mordvinic; ‘30’ is harminc, with a suffix -inc that has been compared with Permic /-mɨs/ in ‘8’ and ‘9’ (though I would think the -c is again from ‘10’ with similar contraction later, and this means that the nasal could also have some different origin entirely [1]); ‘40’, ‘50’, ‘60’ have a suffix -van ~ -ven (negyven, ötven, hatvan; in 1438 negiuem ~ neguieun, ethuem ~ octauen and otovan ~ otouem) that is normally compared at least with the decad endings in Komi (/-mɨn/) and Mansi (/-mən/).

Worth mentioning while I’m at it: the original point that led me to Simonyi’s article is that this phrasebook is apparently one of the last sources (maybe the last source?) that still displays retained word-final vowels in Hungarian, as we already see in sase, esere for modern száz, ezer. The former could in principle be an orthographic device to indicate voiced /z/, the latter however seems patently genuine: it can be contrasted with hamor and not anything like **hamoro for mod. hamar ‘soon’. This seems to be another sign that the Hungarian informant spoke a nonstandard dialect. To my knowledge, 1400s Hungarian codices otherwise no longer contain any trace of the word-final short vowels as they appear in the earliest Hungarian texts from 1055 (the Tihany abbey charter) and the 1190s (Halotti beszéd…). Also, while these two early sources seem to reflect a word-final u for several consonant stems of modern Hungarian, in the phrasebook this is more typically now an o; a front-harmonic equivalent e is also well attested. One word recorded at both stages of development is the adjective ‘big’: HB nogu > 1438 nogio > mod. nagy. This of course is just the same change as the lowering of Old Hungarian u to modern o (when from Proto-Uralic *u and probably standing for a short reduced /ʊ/) as also found inside word stems. Some cases of seemingly unlowered u still appear too though, e.g. burso ~ borth ‘pepper’ > mod. bors; harum ‘3’ > mod. három. Probably the reflex of Old Hungarian *ʊ was at this point still a high-mid-ish vowel [o̭] that was partly heard as /u/ by the Italian author of the phrasebook (when adjacent to labials? this would kind of parallel the modern Northern Mansi spelling of unstressed [ə] as у before labials, as in e.g. хӯрум /χūrəm/ ‘3’).

There is also evidence of consonant-stem nominals already, such as aram ‘gold’ (> mod. arany), assem ‘woman’ (> mod. asszony) (m for word-final /ń/ appears to be regular in the phrasebook for some reason), bor ‘wine’, fos ‘penis’ (> mod. fasz); nevet accusative of ‘name’ (but leginto acc. of ‘young man’ > mod. legényt, napotu acc. of ‘day’ > mod. napot). A possibility that these bring to mind is that word-final vowels may have been already regularly lost in words of some shapes such as *CVRV or *CVCVNV, while most retained cases in the phrasebook seem to follow obstruents; or a consonant cluster in embre > mod. ember ‘person’, olno > mod. ón ‘tin’. The available corpus of data is lamentably small though and I would also not rule out that some words like bor (coming via Turkic from Middle Persian bōr) simply were always consonant stems in Hungarian.

[1] Even *harm-van-c > *harmanc with a cluster simplification *mv > m could be worth considering, but this would leave -i- very mysterious.

Posted in Etymology

Details of some vulpine words in Uralic

A recent open access paper by half a dozen Leiden Indo-Europeanists: Palmér, Jakob, Thorsø, van Sluis, Swanenvleugel & Kroonen, “Proto-Indo-European ‘fox’ and the reconstruction of an athematic ḱ-stem” presents a very thorough analysis of various core IE words for medium-sized carnivores (h/t Languagehat). The main conclusion is that these constitute two etyma rather than just one: *h₂lop-eḱ- ‘fox’ ≠ *wl̥p-i- ‘wildcat’ (surely not **ulp-i-?), even though some reflexes of the latter do end up with the meaning ‘fox’, namely Latin vulpēs and Albanian dhelpër. The latter has been included here thru a dissimilation *v > dh / _V(C)p (another tally to the already lengthy list of Weird-Ass Albanian Sound Changes™, but the other mentioned examples dhampir ‘vampire’ and dialectal dhespër ‘evening’ do look watertight to me).

The paper includes also a lengthy digression on loanword reflexes of the former etymon in Uralic. Despite the unusually-large-for-linguistics author team however, none of the writers seem to be Uralic specialists. They have had some good help on this at least; Petri Kallio has been thanked for consultation and Sampsa Holopainen’s 2019 thesis treatment of these loanwords is also referred to repeatedly. I would still add a few details to the account of the Uralic data though, as they seem to illustrate several novel or less-known phenomena in phonology and morphology.

1. Finnic

Palmér et al. start their discussion of Finnic by asserting a back-harmonic Proto-Finnic **rpoi behind North Finnic *repoi. I however do not see any grounds for this. Second-syllable *o was neutral with respect to vowel harmony in PF; key data for this phonological interpretation comes from two corners of the southern part of Finnic language area, where we still find even an explicitly disharmonic vowel combination ä–o in languages that otherwise follow vowel harmony. The first is Votic, showing e.g. tšäko ‘cuckoo’ (< *käkoi), pääsko ‘swallow’ (< *pääskoi), sälko ~ śalko ‘foal’. Note that these also cannot be explained as later loanwords, since their cognates in North Finnic do end up re-asserting harmony (Fi. käkö, Ing. käkö(i), Krl. Lud. kägöi; Ing. pääsköi, Livvi piäsköi ~ piätšköi; Fi. sälkö, Krl. proper šälkö ~ šäľgö). Secondly this vowel combination has been retained also in South Estonian. Besides pääsokõnõ ‘swallow’ (no reflexes of *käkoi, *sälko), cf. at least näio ‘maiden’ from PF *näito(i) (> core Finnic *neito(i) > Fi. Vt. neito, Ing. neitoi, Krl. Lud. ńeidoi, Veps ńeidō) and räbo ‘junk’ ~ Est. räbu; also Fi. räp-eä, Krl. räp-äkkä, Veps räb-ed ‘brittle’ (different derivatives but affirming original *ä). Vt. repo and also SE rebo ‘fox’, neglected in the paper, can be therefore taken to directly continue PF disharmonic *repoi.

The “clipped” derivation *rebäs → *repoi is certainly unproblematic: this is very typical for *oi-diminutives in Finnic, already found among the oldest examples such as *jänis ‘hare’ → *jänoi > NF *jänöi ‘bunny’, *kaunis ‘beautiful’ → *Kaunoi ‘name of a cow’, and perhaps (the semantics seem off) *talas ‘platform, shed’ → *taloi ‘house’. In later, more localized examples we find all sorts of stem-final or even root material dropping off, like Ingrian hanoi ‘goose’ ← han[hi], Ludian ohtoi ‘thistle’ ← oht[ikaz], South Ostrobothnian Fi. Torstoo ‘name of a cow born on Thursday’ ← torst[ai] ‘Thursday’. [1] There is also some minor evidence of stem-final *-(a)s : *-aha- being reanalyzed as a suffix eventually at least, since we find it sometimes secondarily attached to native stems, e.g. Fi. lippa ‘overhang, visor, etc.’ → lipas : lippaa- ‘chest’. This leaves some space for an analysis similar to Hungarian (cf. below).

Other stem variants present two other problems, which to me appear to largely cancel each other out however. For one, while scarcely attested PF *rebäs could indeed regularly continue earlier *rebäś < *repäć(ə), to me this would not seem to predict an inflectional stem **repäh(e)-: there is no positive evidence that the early lenition *-s- > *-h- between unstressed syllables applied to secondary *s from palatalized *ś < *ć. I believe an explicit counterexample is at least the North Finnic conditional mood marker -isi-, which I would derive from pre-PF *-j-śə- < *-j-ćə- (*-j- from the imperfect stem); not from a suffix *-ŋćə- with an original nasal (the Samic potential mood marker *-ńće̮- I would consider to get its nasal from the PU potential mood marker *-nə-). For two, the authors note that forms like Estonian rebane could continue a diminutive *repäh-inen, but that Veps rebāńe does not quite support this, pointing instead to PF *repäinen. This is not a problem though if the PF paradigm of *rebäs originally did not have forms with *-h-! I would instead consider an earlier West Uralic *repäć(ə) first giving *repäś : *repäśə-, evolving by late Proto-Finnic into a paradigm *rebäs : *repäise-, with *-i- by palatal unpacking. The latter would then have been readily interpretable as the oblique stem of a diminutive *repäinen, motivated also by the fact that by far most bisyllabic nouns ending in *-s had either an oblique stem in *-hE- (if from pre-PF simple *s) or *-ksE- (if with the PU noun-deriving suffix *-ksə).

A similar reshuffling of an unalternating *s-stem into two different paradigms seems to have taken place also in the other example where we can clearly reconstruct a noun ending in pre-PF *-Ać(ə). This is the word for ‘male pig’: Fi. oras ~ orainen ~ oraisa, Krl. orattšu, Veps oraž(a-) ~ oratš(u-). These have their origin in West Uralic *worać(ə) ← Indo-Iranian *warādźa- (cf. Holopainen 2019: 313–314); whence also Moksha /urəś/, dim. /urəź-i/ (with voicing alternation pointing to a pre-Mo. consonant stem *oraś : *oraśə-). From this it seems to me that “reconstructing forwards” would yield PF *oras : *oraise̮-; the first form of these then later gaining an analogous inflected stem *oraha-, the second an analogous nom.sg. *orainen. This last-mentioned form would have been further folk-etymologically interpretable as a derivative of ora ‘awl’, leading to the creation of two further variants ora-isa, ora-ttšu.

Tangentially, I think this mechanism also explains the two different shapes of the word for ‘crow’ in Finnic: Fi. Ing. Krl varis (: varikse-), Lud. Veps variž, Livonian vaŗīkš ending in *-is, versus Est. and SW Fi. vares (: Est. varese-, Fi. varekse-), Vt. varõz (: varõ(h)sõ-), SE varõs (: varõ(s)sõ-) ending in *-e̮s. While words for ‘crow’ display a wide variety of different suffixes across Uralic altogether — e.g. Erzya /varaka/, Hungarian varjú, Southern Khanty /wărŋaj/ (< pseudo-PU ? *wara-kka, *warV-ja, *warV-N-woj) — evidence for a suffix with *-ć- can be found in both Samic (*vōre̮ć) and Mordvinic (? *varśəŋ > Er. /varćej ~ varśej ~ varkśij/, Mk. /varśi ~ varći/). It would seem to be possible to reconstruct already a common West Uralic *warə-ć(ə). From this I would expect to see in PF a paradigm *vare̮s : *varise̮-, again with clean depalatalization syllable-finally vs. palatal cheshirization medially. The stem was then maybe reworked to *varikse̮- already early on; there seems to be no evidence for a reanalysis as a diminutive **varinen (maybe avoided due to the crow being a relatively large bird).

2. Samic

The Samic reflexes do not receive a separate discussion in the article. The main question raised is if a suggested Proto-Samic *reapēš should be considered a recent loanword, and if so, where from.

At least the suggestion of *š being a substitute of North Karelian š in a lost **reväš seems anachronistic to me. PS dates to ca. 2500 BP, the shift of *s > š in NKrl. to ca. 1000 BP at the earliest, if taking place right around the split-up of Old Karelian. The distribution of this variant of the word in Samic (South thru North, with no Eastern Samic reflexes) does not match with a Karelian origin either, whether old or more recent. Examples that Palmér et al. bring up of the type PS *še̮lmē ‘eye of an axe’ ~ PF *silmä ‘eye’ (~ inherited PS *če̮lmē ‘eye’) or PS *še̮ltē ~ silta < PF *cilta ‘bridge’ (← Baltic), where PS *š seems to continue a Finnic *s, probably represent mostly an allophonic palatalized realization of Proto-Finnic *s as [sʲ] when adjacent to *i. To me the simplest loan source would therefore seem to be the Finnic inflected stem *repäise- (whether or not it already had *repäinen as its nominative). The suffix *-ise- indeed later regains phonemic palatalization in quite many Finnic varieties, already so in Karelian and Eastern Finnish. This interpretation also accounts for the retention of *-p-, as in a hypothetical late loan from an unattested NKrl. **reväš we’d probably expect reflexes like Lule Sami **rievij rather than the attested riebij.

3. Permic

Following Holopainen, Palmér et al. consider Permic *rući̮ an independent back-vocalic loan. — For a preface before continuing, I write the vowels here as they would be in the classic reconstruction of Itkonen and Lytkin; the paper instead follows Zhivlov’s most recent sketch of Proto-Permic reconstruction in reconstructing *roću̇, on which suffice it to say I am not especially convinced. I do not wish to get bogged down in details of PP vowel reconstruction schemes here though, as I agree with the point that Komi /u/ would regularly reflect a PU non-close back vowel *a/*e̮/*o and not front *e. This would be itself a sufficient reason to not derive PP *rući̮ from the preform *repäć(ə) indicated by Finnic, Mordvinic and Mari.

The authors however also advance the claim that medial *-ć- should have been voiced and that therefore a preform with a geminate is required, along the lines of *rApaćća. I believe this is an overreach. An underappreciated fact of Permic historical phonology is that word-medial lenition only fully applies post-tonically! The best-known examples of the development later in a word are the possessive suffixes: cf. Komi-Permyak 2PS /-ɨt/, 3PS /-ɨs/ << PU *-(n)tə, *-(n)sa, and the ordinal suffix: KP /-ət/, Udmurt /-et/ << PU *-mtə, which remain voiceless (with secondary voicing of *t in Zyrian Komi /-ɨd/, /-əd/). The possessive suffixes do end up as /-ɨd/, /-ɨz/ in Ud., possibly originating e.g. as positional variants after secondary stress; but in any case note that despite voicing, we do not find this feeding into further lenition *-d- > *-ð- > ∅ as is the fate of root-medial *-t-. Some derivational suffixes show this same development too, most clearly the adjectival suffix /-ɨt/ << PU *-ətA, as in examples like Ud. /peľmɨt/ << PU *piďm-ətä >> Fi. pimeä ‘dark’; perhaps also the adjectival suffix Ud. /-eś/, K. /-e̮ś/ (from PU *-ća?). There also appear to be examples among the few trisyllabic word roots that can be reconstructed for PU, such as K. /rɨnɨš/ < PP *ri̮ŋi̮š ‘threshing ground’ < PU *riŋəšə, PP *ľaŋes ‘birch bark vessel’ < PU *ďäŋäsə. [2]

Lack of voicing of the affricate in *rući̮ is therefore no problem even if going back to something like *rApaća, borrowed already roughly from Proto-Indo-Iranian. We do need to date it as younger than the deaffrication *ć > *ś that is represented in oldest II loans like late common Uralic *ćarwə > *śarwə >> PP *śur ‘horn’, though. This “new” *ć that survives into modern Permic probably also should be able to continue not just a PII *ć but also a slightly later Proto-Iranian depalatalized *c. Permic has never had a native dental affricate, and even some early Russian loans into Komi end up substituting ц as /ć/ (IIRC including in nonpalatalized positions, but I don’t have a list of these readily around).

4. Hungarian

In Hungarian, ravasz ‘cunning’ (OHu. ‘fox’) and róka ‘fox’ represent additional clearly independent loanwords. Following Holopainen, who in turn follows early less assertive suggestions by Sköld and Joki, we can easily agree that at least the former is likely to come from later Alanic, instead of by any kind of ad hoc backing development from *repäć(ə).

I would indeed also rule out an early loan with PU *s. The example of fészek ‘nest’ < PU *pesä is not really itself well-explained enough to make a precedent for retention of *s as sz /s/. The only real suggestion that has been advanced for this is a somewhat ad hoc blocking of *s > *h before a word-initial fricative f-, which is itself not clear without knowing how early *p- > f- is exactly, nor does it strike me as clear if voiced -v- or slightly earlier *-β- could be assumed to have had the same effect as voiceless *f-. There is one seemingly exact parallel to this dissimilation, fasz ‘penis’ ← PII *pásas (a loan etymology re-defended by Holopainen, 185–186); but this has an apparent Samic cognate *pōče̮, pointing to PU *ć and not *s, which IMO leaves also the loan etymology uncertain. For this word I would actually not even entirely rule out the suggestion of Rédei, who in one of his last papers [3] suggested relatively recent loaning from an archaic but unattested Old High German reflex *fas; which is certainly at a disadvantage though since only a derived reflex, in OHG fasal ‘offspring’ (> modern German Fasel) seems to be actually attested in Germanic, and with not much trace of the meaning ‘penis’.

(I have also wondered if all this is maybe barking up the wrong phoneme and fészek should not be segmented as fész-ek, but rather fé-szek; where the second component could then perhaps represent a reduced reflex of szék ‘chair, seat’, cf. in Indo-European nest << *ni-sd-os ≈ ‘down-seat’. However this is not quite matched by the oblique stem fészke-, demonstrating that also the nominative singular continues earlier *fészk < *fēskĭ. Dialect forms such as fécek with an affricate might be an additional problem, though really equally also for any proposal that sz < PU *s.)

Back to foxes though: for modern róka it is indeed easy to analyze -ka as a diminutive suffix added to an earlier *raw-. This would on first look seem to represent similar “clipped” derivation as Finnic *rep-oi. While this is not the typical application of -ka in Hungarian, there are still examples, say Jóska ← József, this usage perhaps motivated by the homographic and “homophonological” (even if not exactly homophonic) Slavic diminutive -ka. But I do like Palmér et al.’s proposal via reanalysis: ravasz would have been analyzable as containing the rareish suffix -asz and would have allowed *raw-ka to be formed by suffix alternation instead. [4] If I’m not mistaken, most examples of -asz and also the front variant -esz are nouns though — and the phonologically closest match is maybe tavasz ‘spring’ — so dating this change specifically after the shift ‘fox’ > ‘cunning’ in ravasz does not strike me as necessary at all. For that matter, this might be also too late for root-medial *aw > ó to be operative even analogically anymore, since the sense ‘fox’ is still attested for ravasz as late as 1403, ‘cunning’ only from about there on out.

Postscript: ‘Wildcat’ in Uralic (?)

After finding the Indo-European ‘fox’ borrowed, thru Indo-Iranian, directly or indirectly into half a dozen Uralic branches (including also relatively straightforward reflexes in Mordvinic and Mari that I don’t comment on specifically here), it is interesting to note that probably also *wl̥pi- ‘wildcat’ seems to have made the leap. These are the Samic and Finnic words for ‘lynx’: PS *e̮lpe̮s (narrowly distributed: North albbas, Lule albas) ~ PF *ilbes (pan-Finnic: Es. SE. Fi. Krl. ilves, Vt. ilvez, Lud. Veps ilbez, Liv. īlbõks). This time, retained *l and front-vocalism seem to point towards Baltic (Lith. vilpišys) and not Indo-Iranian. Only the loss of *w- would readily create a problem.

To my knowledge the comparison of these with Indo-European remains unpublished, but I’ve heard it from a couple of colleagues (for the time being please do not cite me on this). Its first public presentation might have been by Mikko Heikkilä at the 2017 conference Contextualizing historical lexicology — narrowly missed by Kroonen, who was scheduled to participate but IIRC had to cancel entirely. I’m not sure if his proposed routing thru an additional Uralic substrate in the northwest is at all necessary though. If the word was originally loaned as *wülpəs/š- or the like, the Samic word would reflect this entirely natively (*wü- > *ü- feeding into *ü > *i > *e̮ is known also in *wülä- > *e̮lē- ‘up, above’ — possibly the only word in Samic that retains a trace of PU *i/*ü contrast). In light of the apparently rare suffix -iš- in Lithuanian, final *-s maybe more likely continues earlier *-(k?)š, which again regularly gives Samic *-s, but would be expected to give **-h in Finnic. (One of Palmér et al.’s two other examples of this suffix is takišys ‘weir’, whose preform has also been borrowed into Finnic as *tokəš > *toge̮h > Fi. Ing. toe (: tokee-), Vt. tõgõ, Es. tõke ~ tõge, Liv. to’ggõd; or, since apparently there’s no good IE or even Balto-Slavic etymology, is it perhaps a loan from (pre-)Finnic into Baltic instead? [5])

A Samic loan already into Proto-Finnic would be unexpected though. All known words of Samic origin in Estonian have made it there late thru the mediation of Finnish, and I don’t think any are known in Livonian at all. The same is the case for “Language X” of some supposed non-Samic and non-Finnic hydronyms however, which (by the current evidence) seems to have arrived in Karelia / inner Finland / Sápmi via a northwestern route, not thru the Baltic. It’s also not clear to me why Finnic wouldn’t have simply borrowed the word itself from Baltic straight away, when it’s commonly thought that even Baltic loans in Samic were mostly mediated by early Finnic.

Word-initial *wi- does in general survive in Finnic, e.g. *viici ‘five’, *viimä(-) ‘end’, *viska- ‘to throw’, which at first seems to weigh against direct derivation from IE. However I wonder if there could simply have been a conditional loss here. Proto-Uralic is known to have lacked word roots of the shape *PV(C)PV with two bilabials *m, *p in consecutive onsets; and also *w…m seems to lack any good examples for it. (Note that PF *viimä is a derivative with a contracted long vowel < *wiŋə-mä; ditto for e.g. *vaima ‘heart’ < *wajŋ(ə)-ma.) Perhaps by early Finnic, this constraint was then further extended to *w…p. This sequence still occurs natively in PU *woppə- ‘to observe’, but then *wo- simplifies to *o- in Finnic anyway, indeed also Samic, Mordvinic and in most cases Mari. Thus it does not seem out of the question to me that we have simply an early Baltic *wilpi(k)ši- borrowed into pre-Finnic as *ilpəksə. It would be also possible to then treat the Northern and Lule Sami words as (earlyish?) loans from Finnic rather than archaisms dating already to pre-Proto-Samic times. But this remains a hypothesis that could still use parallels, especially since there are also some Baltic loans into Finnic that do retain *w…p or similar pairings, e.g. *virpi ‘branch, rod’.
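The root-shape constraint under discussion can be made concrete with a rough sketch. The vowel inventory, the helper name `has_labial_onsets` and the simplified spellings of the forms (hyphens and brackets dropped) are assumptions made purely for illustration; the check only tests for a word-initial *PV(C)PV shape and says nothing about whether the extended constraint really held.

```python
import re

# ASSUMPTION: a rough vowel inventory, sufficient for the forms cited above.
VOWELS = "aeiouyäöüəë"


def has_labial_onsets(root, labials):
    """True if the form begins with the shape P V (C) P V, where P is one of
    the given labials, V a (possibly long) vowel and (C) an optional consonant."""
    p = "[" + re.escape(labials) + "]"
    v = "[" + VOWELS + "]"
    c = "[^" + VOWELS + "]"
    return re.match(p + v + "+" + c + "?" + p + v, root) is not None


PU_LABIALS = "mp"         # the constraint as known for Proto-Uralic
EXTENDED_LABIALS = "mpw"  # the hypothetical early Finnic extension to *w

for form in ["woppə", "wilpikši", "ilpəksə", "wiŋəmä"]:
    print(form,
          has_labial_onsets(form, PU_LABIALS),
          has_labial_onsets(form, EXTENDED_LABIALS))
```

Under the extended set, the Baltic source form would be flagged and would need repair, while the borrowed shape without *w- passes; the derivative *wiŋə-mä is not flagged at all, since its second labial does not sit in the consecutive onset.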

[1] For loads more examples, see e.g. Rapola, Martti (1920), “Kantasuomalaiset pääpainottomain tavujen i-loppuiset diftongit suomen murteissa“.
[2] I would suspect that this general point has been made before, but offhand I can only find partial statements, e.g. the classic Uotila, T. E. (1933), Zur Geschichte des Konsonantismus in den permischen Sprachen only really discusses the case of PU *t (p. 92 on).
[3] Rédei, Károly (2005), “Szófejtések 351–358“, Nyelvtudományi Közlemények 102.
[4] Another point that I also could believe to have been made before already, but I’ve not gone digging into the Hungarian literature.
[5] There is one nominally compatible PU root that could be considered as a source for this: *čoka ‘shallow, dry’; weirs are best built in relatively shallow rivers. This is reflected only in Samic and Selkup though, the latter reflex also being *če̮kə- ‘to dry’ rather than expected *čwe̮kə(-), which leaves almost everything in this comparison not especially compelling.

Posted in Commentary, Reconstruction

Koibal Addenda

In recent years, Tamás Janurik has been releasing online numerous papers, small surveys and reference materials on the Uralic languages, particularly Samoyedic and Hungarian (all mainly thru his academia.edu page). Last week the roster was joined by what seem like two particularly notable works: Kamassz szótár and Kojbál szótár, two “doculectal-comparative” dictionaries that aim to arrange together and morphologically analyze all currently available lexical material on these extinct Samoyedic languages. Despite titles and introductions in Hungarian, the bulk of both dictionaries actually use German as their main metalanguage. Conveniently (if not for anglomonoglots), basic glosses are also provided in no less than three languages: German, Hungarian and Russian.

The haul is respectable: 1456+114 word groups for Kamassian and 570 for Koibal (with Russian loanwords in Kamassian listed separately from the “native Siberian” word stock [1]). A comparison that easily springs to mind is with the etymological lexicon of Helimski’s Die matorische Sprache, documenting 1134 word groups across all varieties of Mator, and at least the Koibal dictionary might reach similar status as a standard lexical source. For Kamassian there still remain unpublished archive materials though, some already from the main field researchers Castrén, Donner and Künnap. Given their close relationship, in principle it might be also a good idea to eventually arrange all Kamass–Koibal material in a single etymological database or the like.

So far I’ve been poring over the Koibal data and its etymological remarks. Going back to the original sources of Spasskiy and Pallas (and also cataloguing their later appearances especially in the works of Klaproth), Janurik turns out to identify a good couple dozen more Koibal cognates for Kamassian and other Samoyedic languages than are listed in earlier reference works. No more than four of these lack Kamassian equivalents altogether, though: from Spasskiy корламъ ‘to ask’ (PS *kå-), пысва ‘rotten’ (PS *poså- ‘to rot’), тугуламъ ‘to gnaw’ (PS *t¹okɜ-); from Pallas chailàn ‘gull’ (PS ? *kələjə). This could be in part because Janurik does not seem to propose any entirely new Proto-Samoyedic roots, and limits himself to adducing new Kamass and Koibal reflexes for previously known ones. This still leaves a good number of unetymologized vocabulary items awaiting further research. All these are now at least well identified and collected together.

Janurik employs an admirably detailed scheme of marking each word group with an etymological code: P1–P5 for words that seem native to some extent within Samoyedic, L1–L3 for post-Proto-Samoyedic loanwords, XX for entirely isolated words. The distinction between his layers P1 (Proto-Uralic) and P2 (Proto-Samoyedic) is not quite up to speed on 21st-century research, but this is a minor detail here. Similarly I wonder about at least the naming of his group P3 (Proto-South Samoyedic), when it is Janurik himself who has presented one of the clearest arguments against assuming such a subgroup. [2] But it is certainly of some value to distinguish Kamass–Koibal words with and without northern Samoyedic cognates, as the latter e.g. might be more likely to turn out to be areal loanwords rather than actual common inheritance.

The newly identified cognates so far already provide food for thought anyway. For a simple example, the aforementioned chailàn ‘gull’ seems to be slightly off compared to the earlier PS reconstruction, suggesting rather something like *kəjələ. A slightly better match in root structure could be actually UEW’s *kaja(-ka) ‘gull’; or, since PU *a > PS *å > Kamass–Koibal a is a minority development (normally *å > o, u) and incompatible with the potential Nenets and Selkup cognates that certainly require *ə, maybe the best solution would be independent formation after all from a mimetic root √kaj-.

A second bird name that leaves me thinking is Km. šēgə ~ Kb. сега ‘cuckoo’. This could be derived entirely regularly, together with cognates in Selkup, from PS *käkV. Clearly this is another old mimetic term, at least predating the assibilation of PS *k to *š; but how old exactly? Several comparable words for ‘cuckoo’ turn up again also further west in Uralic, including Khanty *käɣii, Udmurt /kikɨ/, Komi /kɤk/ and Finnic *käki (the first three reported but considered improbable in SSA). The medial consonants and vowel correspondences do not entirely behave though. At best Khanty and Finnic would point to *käkə, Samoyedic and Permic to *käkkä; or maybe Samoyedic and Khanty to *käkä. This all might not be fatal in a bird name; some of this could be reshaping to retain a more iconic shape for the word (whereas e.g. from *käkə we would otherwise expect *kä in Samoyedic). But then we could ask as well if this is not due to the words being independently formed; or borrowed even: the Finnic words have been often considered to be loaned from Baltic (cf. Lithuanian gegužė with a dialect variant gegė), though this remains uncertain too for similar reasons. Really the entire distinction between “reshaping” and “independent formation” seems somewhat vacuous when dealing with words of this sort that have had an iconic motivation available all along. Quite likely Proto-Uralic did have a name for the cuckoo that was something like #kVkV, but whether this has actually survived in an expected regular shape anywhere would have to be guesswork. [3]

Next up, the case I find the most interesting is that of the Kamassian and Koibal words for ‘son-in-law’. I’ve already noticed earlier that the former would go well with a hypothesis I have on the reconstruction of this word in Proto-Uralic, and Janurik’s newly adduced Koibal cognate seems to support the idea further. Actually even the Kamassian cognate has not appeared in etymological references earlier as far as I can tell. This is not a major surprise, since the form is malmi, quite far from either SW’s Proto-Samoyedic reconstruction *wiŋə or UEW’s Proto-Uralic reconstruction *wäŋe.

The first key to this puzzle is provided by Kamassian alma ‘dream’. Nominally this comes very close to Ugric forms for the same (e.g. Hungarian álom : álmo-), and UEW goes as far as to support a wild proposal of a loanword from Khanty. Janhunen in SW however suggests a different solution. Within Samoyedic a clearly different root can be reconstructed for ‘dream’: *äŋwå, and the Kamassian word could be derived from this via assimilation–then–dissimilation, *ŋw > *ŋm > lm. Such a sound change series would already provide more grounds for comparing malmi with PS *wiŋɜ (note also that *w- > *b- > m- before a word-internal nasal is a known regular sound law). The Koibal cognate identified by Janurik comes in at exactly this point: we find here the form манмемъ (most likely a 1PS possessed form ‘my son-in-law’), suggesting that also this instance of Km. /lm/ has indeed evolved from *ŋm. I am not certain, however, whether this should be taken as still containing /ŋm/ (thus Janurik) or, as it can be read prima facie, /nm/. This latter could be still archaic with respect to Kamassian of course, i.e. in more detail we would have *ŋm > /nm/ > /lm/. (The other possible routing I guess is *ŋm > *ɫm > /lm/, slightly more awkward since there seems to be no reason to assume a distinct velarized *ɫ at any point in the history of Kamassian.)
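The two routings for the cluster can be laid out side by side as ordered rewrite rules. The following is only a minimal sketch: the rule lists restate the stages named above, the names `ROUTE_VIA_NM`, `ROUTE_VIA_VELAR_L` and `derive` are made up for illustration, and nothing but the consonant cluster is modelled (vowels and word-initial consonants are left out entirely).

```python
# Each rule is (old_cluster, new_cluster); rules apply in order, one stage per rule.
# Both rule lists are illustrative restatements of the routings discussed in the text.
ROUTE_VIA_NM = [("ŋw", "ŋm"), ("ŋm", "nm"), ("nm", "lm")]
ROUTE_VIA_VELAR_L = [("ŋw", "ŋm"), ("ŋm", "ɫm"), ("ɫm", "lm")]


def derive(cluster, rules):
    """Return every intermediate stage of the cluster, oldest first."""
    stages = [cluster]
    for old, new in rules:
        cluster = cluster.replace(old, new)
        stages.append(cluster)
    return stages


print(" > ".join(derive("ŋw", ROUTE_VIA_NM)))
print(" > ".join(derive("ŋw", ROUTE_VIA_VELAR_L)))
```

Both routes end in lm, but only the first passes through an nm stage, which is the one that a prima facie reading of Koibal манмемъ would attest.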

Where would this word-internal *m < *w come from then? I suspect it has actually been there all along. For one, we already have various forms like Finnish vävy and Mator mijüh (миюгмэ) pointing to some kind of an original labial element near the stem vowel, which has already led to newer reconstructions along the lines of PU *wäŋəw(ə) rather than bare *wäŋə. [4] For two, the Samic reflexes of this word show a long-standing minor problem: they indicate Proto-Samic *vivë, with a seemingly Finnic-like development *ŋ >> *v. I would suggest that this issue is due to incorrect segment alignment: that Samic *v does not continue the original 2nd-syllable *ŋ, but instead the 3rd-syllable *w, and original *ŋ has been instead lost to a vocalization process of some sort. If correct, this would show direct evidence for a reconstruction *wäŋəwə (i.e. ruling out anything like **wäŋü with a labial vowel in PU already), making the PU shape of the word actually a relatively good fit at least for the consonant skeleton of Kamassian and Koibal. I could even suggest reconstructing for PU a morphophonologically alternating paradigm, with a vowel stem *wäŋwə- (> Samic, Km–Kb) : consonant stem *wäŋəw- (> Finnic, Nenets, Nganasan etc.); though this is motivated also by some other considerations that would take us fairly well afield from the current topic.

There is definitely still room for skepticism about this however, and in particular the vowel correspondences continue to be quite irregular: in the first syllable, none of PU *ä, PS *i and Kamass–Koibal a regularly corresponds to each other, while in the 2nd syllable, Km. -i ~ Kb. -e most typically continues PS / PU *-ä, not *-ə.

So far I have not started any systematic investigation of the entirely unetymologized Kamassian and/or Koibal vocabulary remaining. However, for closing, one simple observation on this front: kuro- ‘to be angry’ (in both Km. and Kb.) probably continues PU *kurə ‘anger’.

[1] i.e. native Samoyedic words, Turkic and Mongolic loanwords, and all vocabulary of unknown origin.
[2] Janurik, Tamás. 2012. Volt-e a déli-szamojéd (PSS) alapnyelv? Per Urales ad Orientem. Iter polyphonicum multilingue: 145–162.
[3] A further complication still is the potential Mator cognate / reflex: géihe in Pallas, кига in Müller, per Helimski suggesting PS *-jk- rather than plain *-k-. However the precedent of PS *äjmä ‘needle’ > Kamassian ńīmi ~ Koibal неме would maybe then seem to predict ˣšīgə for ‘cuckoo’, and we are right back in not knowing which way irregular correspondences in iconic or onomatopoetic vocabulary should be interpreted.
[4] This final *-w(ə) is strictly speaking not segmentable, but it is probably originally the same formant as also in two other in-law terms: PU *käləw(ə) ? ‘sister-in-law’ and *nataw(ə) ? ‘brother-in-law’.

Posted in Etymology, News, Commentary

Analogy Is Not Phonology

While my blogging here has been firmly within historical linguistics, every once in a while I do go poking around self-styled formal linguistics blogs too. [1] This tends to be a frustrating exercise though. By now, supposedly deep problems discussed around such parts tend to strike me as, frankly, dumb questions that only exist due to particular “theoretical commitments”, and which could be trivially resolved or avoided within better-grounded frameworks of understanding language. People stuck in generativist bubbles in particular, however, seem to be often unaware that any other types of approaches would exist at all.

As I’m rather more informed about the ground-level facts of phonology than e.g. syntax, this is going to be the more profitable area for me to comment on in any real detail, though generative syntax has also struck me as having foundational flaws roughly analogous to the foundational flaws of generative phonology. (I presume open-minded syntacticians should be even able to figure out these, ahem, analogies themselves, without me having to do all their work for them.)

At any rate, a good majority of questions attracting protracted debate in phonological theory that I have seen are immediately solved under the traditional non-generativist approach: “phonological processes” or “deep structures” do not exist as such. They are only grammatographical shorthand; rules of thumb, not rules of Grammar. [2] Where non-allophonic “phonological alternations” actually exist is within the lexicon, not within phonology.

A standard counterargument to this seems to be the fairly simple observation that loads of obviously non-allophonic alternations are, in fact, still productive to this or that extent in loads of languages. Checkmate, lexicalists?

No, of course not. This simply shows a particularly pernicious systematic failure of generative linguistics: a lack of understanding of language change, particularly that language change, including linguistic creativity, does not take place solely inside a box of “Grammar”, but also within the lexicon. Phonological alternations are easy to approach in this fashion, as they are generally not actually productive in the sense of immediate, universal applicability (as they say in Generativistland, they can be opaque). Moreover quite typically they are “productive in spats”, creating new forms one by one, now and then in the speech of particular speakers, not everywhere constantly. And the range of applicability for any one process is very finite really: while everyone creates novel noun phrases practically daily, I would wager that most people do not create any entirely novel strong verb forms over their entire life. [3] In historical and historically-informed linguistics, our default assumption is to attribute this kind of change to the process of lexical analogy, and understanding it is vital to understanding patterns that arise and exist in language.

What we can actually observe is that any arbitrarily deep alternations can indeed inspire the coinage of new instances of the same, and therefore they can remain “productive”. If desired, I can readily coin all sorts of cases like long ∶ length ∷ oblong ∶ oblength or sing ∶ sung ∷ wing ∶ wung. But then nothing stops me from creating folk-etymological examples either, say choose ∶ choice ∷ snooze ∶ snoice. These also fade organically into snowcloneish blends, e.g. thanks, ants ∶ thants ∷ hello, horses ∶ hellorses; spelling pronunciations, e.g. tentacles ∶ /ˈtɛntəkəlz/ ∷ Pericles ∶ /ˈpɛɹɪkəlz/; or (mis)etymological nativization, e.g. English wrong ∶ Swedish vrång ∷ En. to wring ∶ Sw. vringa. Crucially, what needs to be noted is that this is an extralinguistic cognitive skill that should not have any bearing on the development of purely linguistic theory. Already etymological nativization refuses to respect the confines of a single language, and I think most theories of mental grammar would likewise not attempt to account for spelling pronunciations. We can also easily advance loads of more or less formal analogies in areas that have nothing to do with language, from mathematics (2 ∶ 20 ∷ 5 ∶ 50; square ∶ cube ∷ triangle ∶ tetrahedron) to the natural world (nitrogen ∶ ammonia ∷ oxygen ∶ water; the Congo ∶ leopards ∷ the Amazon ∶ jaguars) and human society (evolution ∶ Darwin ∷ relativity ∶ Einstein; punks ∶ pop punk ∷ ravers ∶ happy hardcore). This, I think, demonstrates beyond reasonable doubt that analogy in fact is a general skill that humans possess, and hence there’s no point in trying to reduce its applications in language into some kind of specifically linguistic primitives.

(Note BTW that while all my examples above are phrased as classic proportional analogies, this also should not be assumed to be the only possible or even the main mechanism of analogy.)
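For the simplest proportional cases, the mechanism can be spelled out in a few lines of code. This is only a toy sketch under loud assumptions: it works on spelling rather than pronunciation, it only handles analogies where the two known forms differ in their ending, and the name `solve_proportion` is made up here. It is in no way meant as a model of how speakers actually do this.

```python
def solve_proportion(a, b, c):
    """Solve a : b :: c : ? by swapping the ending in which a and b differ.

    Returns None when c does not share the ending of a, i.e. when this
    purely orthographic sketch finds no basis for the analogy.
    """
    # Length of the prefix that a and b have in common.
    shared = 0
    while shared < min(len(a), len(b)) and a[shared] == b[shared]:
        shared += 1
    ending_a, ending_b = a[shared:], b[shared:]
    if not c.endswith(ending_a):
        return None
    # Slice by explicit length so that an empty ending_a also works.
    return c[:len(c) - len(ending_a)] + ending_b


print(solve_proportion("long", "length", "oblong"))
print(solve_proportion("sing", "sung", "wing"))
print(solve_proportion("2", "20", "5"))
print(solve_proportion("choose", "choice", "snooze"))
```

The first three calls reproduce oblength, wung and 50. The last one returns None, since the spellings choose and snooze do not share an ending even though the pronunciations do, which is a reminder that this analogy runs on sound and not on letters.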

Once we accept the existence of analogy as an explanation for some cases of morphophonological productivity, this provides also a direct path into rich gains in parsimony. My linguistic examples above have been chosen to be on the “clever” side, i.e. building on only marginal precedents, partly to be sure that they’re indeed novel (at minimum to me!), partly to make it seem more convincing that they should not be modelled by inserting additional epicycles into English (morpho)phonology. But the mechanism of analogy works perfectly well also on any pedestrian phonological alternations out there. What is, say, the plural of oblength? It’s clearly oblengths — but then we could model this conclusion as having been drawn purely on the analogy of lengths, or also tenths, shibboleths, Beths, etc., without needing to assume any distinct, exclusively linguistic machinery behind this. The putative outcome oblengthes, just like also morphologically clearly different options like oblengthim or oblengtha, can be predicted to be unlikely already due to the lack of bases of analogy that could lead to them. [4] That all sorts of other coinages also follow the same pattern could be likewise explained already by the extremely strong precedent for the English plural marker to be -s. In principle even the regular phonologically conditioned allomorphy between -s and -es could then turn out to be simply emergent within the English lexicon, if we enrich it with sufficiently many plural forms stored as lexemes. This approach allows cutting out a hefty amount of costly theoretical complexity assigned to phonology in theories that fail to recognize that analogy exists.
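The idea of plural formation falling out of stored forms can be illustrated with a toy nearest-neighbour lookup. Everything here is an assumption made for the sake of illustration: the tiny `STORED_PLURALS` lexicon, the function names, the choice of "longest shared ending" as the similarity measure, and the use of spelling in place of phonological form. The sketch also assumes that every stored plural begins with its singular, so suppletive or ablauting plurals are out of scope.

```python
# Hypothetical mini-lexicon of plural forms stored as lexemes.
STORED_PLURALS = {
    "length": "lengths",
    "tenth": "tenths",
    "dog": "dogs",
    "box": "boxes",
    "church": "churches",
    "horse": "horses",
}


def shared_ending(x, y):
    """Count how many final characters x and y have in common."""
    n = 0
    while n < min(len(x), len(y)) and x[-1 - n] == y[-1 - n]:
        n += 1
    return n


def plural_by_analogy(word, lexicon=STORED_PLURALS):
    """Pluralize a novel word on the model of the most similar stored noun."""
    # Ties are resolved in favour of the earliest entry in the lexicon.
    model = max(lexicon, key=lambda known: shared_ending(word, known))
    marker = lexicon[model][len(model):]  # whatever the model adds, e.g. "s" or "es"
    return word + marker


print(plural_by_analogy("oblength"))
print(plural_by_analogy("wox"))
```

Here oblength is pluralized on the model of length, and the made-up noun wox on the model of box, without any rule stating when to pick -s and when -es. Outcomes like oblengthim never arise simply because nothing in the lexicon could serve as their model.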

Spending one further moment within philosophy of science, there is certainly also an apparent countercost of presuming the existence of some words like lengths as separate from length (or sung from sing, etc.). However, given that lexicons already indisputably exist, and contain many, many thousands of items anyway (and that, given the phenomenon of suppletion, these indisputably can be syntactically specified as particular inflected forms, etc.), just a few hundreds more to “seed” it with generators of morphophonology should be unambiguously considered the superior solution. Extra stuff is free.

It would be indeed possible to go further still and to propose that e.g. even the realization of oblengths as specifically /ɒblɛŋθs/ with /-s/ (and not /-z/) will be inferred by analogy from other English plural forms. It’s hard to rule out that this could be the case for some people. [5] But I do grant that this at least is not an approach that could be fully generalized. Analogy generally allows for multiple solutions, some of them perhaps much less probable but still possible (e.g. if we take a cube as a prism with a square base, not as a polyhedron entirely made of squares, then the triangle analogue will be a triangular prism, not a tetrahedron; and maybe it should be heorses /hɛəɹsɪz/ rather than hellorses). Allophony by contrast is, by all appearances, subconscious enough that speakers find it difficult to create or perceive forms departing from it, and it clearly calls for a different kind of cognitive machinery.

[1] That’s the {self-styled formal linguistics} blogs; what they call themselves is, apparently, just “linguistic blogs”, with the common if vaguely cultish stance that only their branch of work actually constitutes Real Linguistics.
[2] As far as I can tell, a lot of trouble indeed comes already from the failure to fully distinguish descriptive grammar from mental grammar. Much of the early history of morphology and syntax quite transparently consists of attempts to formulate rigorous definitions for concepts of traditional Greco-Latinate grammatography like “subject” or “word”, but with little attention paid to whether this even should be done: a priori there is no reason to expect mental grammar to have any building blocks at all in common with traditional descriptive grammar (much like how, say, biochemistry is not under the obligation to follow any views of Aristotelean natural philosophy). Modern theory of phonological processes indeed also looks as if it largely amounts to applying the same mistake ultimately to Pāṇini’s descriptive (morpho)phonology of Sanskrit, although the road from there to Chomsky & Halle is not clear to me.
[3] i.e. “novel to English (or German, etc.) as a whole”. E.g. (a soup has been) wung might be a new creation for me just two days ago (‘prepared without a prior plan or recipe’, if you must know), but even before checking I am certain that others have stumbled on this same territory before. Oh yes, no question about it: it’s even on Wiktionary already, with attestations going back to 1881.
[4] But, of course, not impossible. As e.g. advanced linguistics students faced with the wug test will readily demonstrate, sufficiently large numbers of contrarian smartasses will eventually end up creating any form imaginable, no matter how “ungrammatical”. Almost nothing in language is actually impossible. This is perhaps the most clearly so when a phenomenon is “impossible” (rather, unacceptable) in one language variety but business as usual in another.
[5] Definitely not for me though. As an L2 speaker whose native language has no voiced fricatives, I ended up adopting the English plural marker(s) as just /-(i)s/ back in the day, and though I can by now make conscious effort to use [z] instead, I will be still quite content to speak of [windous], [siːliŋs], [hɑusis], [tʃʰiːzis], [nɑiʋs], [dɔgs], etc…

Posted in Methodology, Commentary
