Revisiting Setälä’s *pk

In 1907, E. N. Setälä published one of his last comparative linguistic works: [1] “Finnisch-ugrisches pk (~ βk)” (in FUF 6; nominally dated to 1906), proposing a minor addition to the canon of Proto-Finno-Ugric consonant clusters. This followed up on a discussion in Virittäjä by Paasonen and Setälä in early 1907. [2] Since then the idea has attracted little attention, either for or against. At least one of the proposed comparisons, Finnic *tukka ‘hair’ ~ Mari tupka, təpka (*tŭpka) ‘tuft, bunch’, which Paasonen also supported, survives as late as Collinder’s Fenno-Ugric Vocabulary (1955, 1977: 63) and Comparative Grammar (1960: 87–88). Even this last case, however, is quietly dropped in later references, starting I think with Suomen kielen etymologinen sanakirja (tukka in vol. 5, from 1975), and it is also absent from the UEW. A look at the original work reveals that Setälä also proposed cognates from Komi: tup-jura ‘tuft-haired’, tup-jur ‘owl’ (= “tup-head”), tupka ‘owl’. Dropping the comparison does make sense, since by Collinder’s time it was already known that Komi /u/ does not normally correspond to Finnic *u ~ Mari *ŭ < PU *u (we would rather expect /ɨ/).

A cluster occurring in only one word would surely be deemed fairly uncertain, and other etymological directions also seem to exist for both the Finnic and the Mari word. Before looking into these, though: what of the rest of Setälä’s data? He presents no fewer than nine examples across his articles, which would already be more than there are for some PU clusters that are nonetheless generally accepted. I summarize the data below in a table (reordered, glossing somewhat simplified, some variant forms omitted):

‘to beat,
‘(to) kiss’Fi. suukko
SE tsiuku
N cuvkit
‘to smack’
(S–N *hāvkkë-
‘to suffocate’)
K. /šupkɨ-/
‘to throw’
‘to block’*tukkë-Lu–I *tëvkkë-K. /tupkɨ-/
‘to drip’Fi. tiukku-Ud. /ťopkal-/,
‘to beat’
(of heart)
Fi. tykki-Er. tykno-K. /ťopkɨ-/
‘to vomit’
K /ɨpkɨ-/
‘to sigh’
‘hair, tuft’*tukka*tŭpkaK. /tup/

The representation of the consonant center indeed looks fairly regular, especially Finnic *kk and Permic medial *-pk-. Reflexes elsewhere are scantier, and in particular no Ob-Ugric data appears at all. Unfortunately, even apart from this, we have several reasons right off the bat to suspect that these are not reliable etymologies.

  • No regular reflex is established for Hungarian. Instead we have one case of p and one of k.
  • An abundance of onomatopoeia / ideophones, or at least of meanings susceptible to this kind of origin: ‘beat’, ‘kiss’, ‘drip’, ‘vomit’. Many also have parallel variants, e.g. Fi. sykkiä besides tykkiä.
  • Poor within-branch distribution is common: we have only Finnish in two of the Finnic cases, only Northern Sami in one of the Samic cases, and only Komi in five and only Udmurt in one of the Permic cases. Some could be supplemented with newer data though, e.g. Moksha does seem to have /təkna-/ ‘to beat (heart)’, and Komi /ťopkɨ-/ ‘to drip’.
  • Lax semantics in some cases. In the third, I have no idea what the basis for comparing the Finnic and Komi words is supposed to be. The ninth is not very promising either: somewhat off to begin with, and per the more detailed data of Moisio & Saarinen, kopka does not simply mean ‘plough’ but rather ‘the flat central part of the plough, where the ploughshares are attached’, which is even further from ‘hoe’.
  • Onset mismatches: at least S. *c- and Mo. *č- (suggesting PU *č) versus Permic *ć- and Hu. cs- (suggesting PU *ć) in the first and second; Fi. t- ~ Permic /ť-/ in the fifth and sixth. General Samic /h-/ in the third is also strictly a loanword consonant, and Setälä does go on to propose borrowing from Finnic, but still from a stage before his proposed assimilation *pk > *kk.

More trouble still comes indirectly from finer details. For one, Permic morphophonology: though consonant clusters often simplify to their first member word-finally, there are no known cases of an alternation /-p/ : /-pk-/ as this would predict (not even in *ćup ‘kiss’). For two, the Mari stem structure CVCCA looks suspicious: second-syllable vowels are usually lost in nouns, and even when they do survive, it is usually as Proto-Mari *-ə, not as *-a. The same holds for a few cases in Finnic: overheavy syllable structures like *suukko, *tiukku-, *öökkä-t- are not typical of native vocabulary. And even in Mordvinic, unpalatalized /t/ + front vowel /i/ (with an allophone [ɨ] that Setälä notates as y) has no known native origin. So altogether, a good deal of this data does not even look native.

Even after these observations, some basis for *pk could perhaps still be salvaged. But the death knell, I think, is the near-complete absence of any regularity in first-syllable vowels. Only one of the eight load-bearing Finnic / Permic comparisons has good parallels: uu ~ *u, regular from PU *ow. A few cases of y ~ K. /o/ are known too, but these have a conditional explanation: early assimilation *e-ü > *ü-ü in Proto-Finnic. [3] Many other correspondences, like Finnic *u ~ Samic *ā, are also firmly irregular. Hence I am happy to conclude that, yes, this article and its etymologies were in error, and no cluster **pk is to be reconstructed for P(F)U.

What, then, of the existence of a cluster /pk/ in Mari and Permic? I think this is explainable, just by morphology rather than phonology. In Mari these should clearly be considered derivatives *tŭp-ka, *top-ka, with a reflex of the common PU diminutive suffix *-kka. This source of /pk/ is already evident in other cases, e.g. lap : lapka ‘low(-lying)’, šapə : šapka ‘faded’. In Permic, note first that /pk/ is primarily attested in verbs. I would similarly segment here roots ending in /-p/ plus the PU momentane suffix *-kə-. This suffix is not generally productive in Permic, but traces of it have already been identified in various cases (UEW even derives *ćapkɨ- from its *ćappɜ- ‘to hit’, though *a > *a remains irregular). This seems clear at least for ‘to drip’, where even /ťop/ ‘drop’ is attested. The word roots involved, as mentioned above, do seem largely to be simply onomatopoetic.

One more Uralic variety is also known to have /pk/: Southern Sami, in this data only in hapkedh ‘to choke’, but other cases exist too. In Setälä’s view, /pk/ ~ /vhk/ (< *vkk) would be different generalized grades of an old alternation pattern *pk ~ *βk, but no direct evidence whatsoever exists for such an alternation. I wonder whether a phonological solution could still be sought: *vkk > /pk/ might be an old regular sound change in SS.

One clear loanword, SS haapkie ‘hawk’, is alas not evidence for such a change. Other Samic reflexes like Lule hábak point to borrowing already from Proto-Scandinavian *habukaz (→ PS *hāpëkkē + later syncope in SS), [4] not from attested Old Norse haukr (which could have yielded PS **hāvkkē). However, I suppose that western Samic *hāvkkë- is still a loanword from Finnic too; the source is just not Setälä’s *hukku- ‘to disappear, drown’, but rather *haukki- ‘to gasp for breath’ (+ other meanings). This is attested from most of Eastern Finnic, e.g. Karelian haukkie, also dialectal Finnish haukkia; its standard Finnish variant haukkoa actually seems to be more narrowly distributed overall (and thus younger?).

On the other hand, a development *pk > *vkk, or maybe straight to /vhk/, would make more sense within the general dialectology of Samic, where we also have innovations like *šk > /jhk/ across all western varieties. [5] A likely intermediate looks to be *fk, which is actually the normal Kola Sami reflex of *vkk. Lehtiranta’s Proto-Samic reconstruction already takes this stance, giving PS *cōpkë- for a similar correspondence in SS tsuopkenidh ‘to break (intr.)’ ~ other Samic *-vkk-, e.g. NS cuovkut ‘to break (tr.)’. This would also allow supposing that attested cases of /vk/ in Southern Sami do continue PS *vkk and are not newer loans from other Sami varieties: from Lehtiranta we have jaavk-udh ‘to appear’ ← *jāvkkë- ‘to disappear’, raavkedh < *rāvkkë- ‘to demand (back)’. But then we again face the question of explaining the origin of SS /pk/. For *cōpkë- ‘to break’, the same approach as in Permic is perhaps not impossible: is it also a relict *-kə- momentane from an onomatopoetic root *cōp- < *čap(ə)-? Alas, for hapkedh this will not readily work. Besides, it also shows short /a/, which does not match the cognates I’ve proposed to reflect Finnic ⁽*⁾haukki-… Extremely speculatively, I could entertain the idea that it instead comes from a PS *θëp(pë)-, as a cognate of Finnish–Karelian *tüppe-htü- ‘to be extinguished, out of breath’ [6] that has indeed been extended with *-kə-; i.e. pseudo-PU *ďüppə-kə-?! Usually, though, this Finnic verb has been considered a parallel derivative to *tüpp-i- ‘to block, close’, which furthermore also has a known Samic cognate *tëppë- ‘id.’ > SS dahpedh (with /t-/, not /h-/ < *θ- < *ď-). I am not sure whether words like Es. läppama ‘to choke’, NS lahppasit ‘to be out of breath’, Mordv. *ľäpija- ‘to choke’ are worth anything: they don’t correspond well with each other, but they could suggest an old ideophone of lateral + *pp for ‘to choke’, and my *ďüppə- could also fit this pattern if *ď- had originally been lateral. But for now this is at best a stretch.

This post was originally inspired by some observations on a possible different etymological origin of one of the involved words… it would be, by now, however an entirely different tangent, and I may return to that topic instead later.

[0] In case this post seems like an excessive amount of effort to spend on forgotten crappy etymologies from 115 years ago, cf. also my older discussion of “anti-etymologies”. It is very possible that the poorness of these comparisons would not be apparent to some people happening upon Setälä’s work! They have also given me an opportunity to talk a little about some other topics that have been on my mind, such as /-kɨ-/ as a Permic verb suffix.
[1] In his later years he would be much more involved instead in the politics of newly independent Finland.
[2] “Alkuperäisestä -pk-sta on suomessa tullut -kk-” (‘In Finnish, original -pk- has become -kk-’); “Alkuperäistä -pk-ta ja sen heikkoa astetta edustaa suomessa -kk- ja -uk-.” (‘Original -pk- and its weak grade are represented in Finnish by -kk- and -uk-.’) (Neither is currently available online, but perhaps in the future.) Setälä’s article also reports that the editing of FUF 6 had been finished, but the issue had not yet gone to print, by the time Vir. 11/1 appeared in late February 1907.
TBH, to me it would seem an amazing coincidence that both scholars had been planning to publish on the same minor sound change at almost exactly the same time. Since it is now known that Setälä in his later years had a track record of stealing discoveries from other scholars, I do have to wonder whether here too he is trying to claim priority over Paasonen for the three comparisons he advances (those of tukka, tukkia, kokka) by sneaking a small article into his own journal at the last minute. He clearly did come up with the idea of the correspondence F *-kk- ~ P *-pk- himself, though: the comparison suukko ~ cuvkit ~ K. ćupköd- appears already in his 1896 article on consonant gradation (SUSA 14).
[3] PF *lülü ~ K. /lol-/ ‘hard heartwood’; PF *süntü- ‘to be born’ ~ K. /sod-/ ‘to multiply’; PF *süttü- ‘to be ignited’ ~ PP *sɔtɨ- ‘to burn’; see recently Aikio (2021). To be fair, in the cluster of tykki- we do have Fi. tykyttää with cognates also in Karelian and Ludian; but also a morphologically primary-looking variant tykkä- is attested, stretching wider still to Ingrian, Veps and Estonian.
[4] Perhaps also not directly from Scandinavian, but thru Finnic *habukka (> standard & western Fi. haukka but e.g. eastern Fi.–Krl. havukka, Lu.–Veps habuk).
[5] Traditionally considered the defining innovation of a Western Samic subgroup, but I would agree more with a division into South–Ume versus Rest being older, as argued in recent times (future blog post on this perhaps coming).
[6] Inspiring also modern Finnish typpi ‘nitrogen’ as a back-derived coinage.

  1. Otso Bjartalíð says:

    I wonder if such a cluster instead lurks behind the rare correspondence “Finno-Permic” *-p(p)- ~ Ugric *ɣ/*w. Such across-the-board simplification would favor a ‘lighter’ combination, perhaps *-wp-, though.

  2. Christopher Culver says:

    It bears mentioning that Mari *tŭpka ended up on Agyagási’s “Late Gorodets” list of words of unknown etymology due to its Chuvash counterpart *tŏpka.

    • sansdomino says:

      If you would like the unembellished version re Mari: PII *stuHpa- ~ *stupa- > Sanskrit stū̆pa- ‘lock, tuft; mound; top; etc.’ seems to fit well as a loan source here.

    • David Marjanović says:

      The Topkapı Saray does not bear mentioning, but I feel an urge to mention it anyway. :-)

  3. David Marjanović says:

    It is very possible that the poorness of these comparisons would not be apparent to some people happening upon Setälä’s work!

    I approve, BTW. Of course “90% of everything is crud”, but even so, it is never good to let “we’ve been thinking this is crud for 100 years” become its own justification.

    Unrelatedly, I also approve of representing protracted barfing as *öökk-.

    In the 3rd I have no idea what the basis for the comparison between Finnic and Komi is supposed to be.

    …well, I guess if you throw something, you cause it to get lost… but then I’d expect a long chain of derivational suffixes somewhere, and that’s evidently not there.

    no cluster **pk is to be reconstructed for P(F)U.

    What is the inventory of reconstructed plosive clusters? *kt is reconstructed; is *kp?

    • Otso Bjartalíð says:

      *-kp- is not reconstructed either. Counting only *p, *t, and *k as plosives, securely attested clusters are *-pp-, *-pt-, *-tt-, *-tk-, *-kt- and *-kk-.

      • David Marjanović says:

        An interesting distribution that makes me wonder if *-tk- is an error or perhaps *-tp- has been overlooked.

        • Proto-Uralic had more cases of *pp than *tt or *kk (these two are quite rare in PU words). So I think that PU *pp may partly go back to pre-PU *kp and *tp.

        • Otso Bjartalíð says:

          *-tk- is safely of PU dating, though all old cases might be explainable as consonant stem + nominal/momentane suffix combinations. The only PU suffix with a bilabial stop is *-pA ‘present participle’, of which no examples adjacent to *t are known. I wouldn’t rule out the possibility of such a sequence having been permitted but being untraceable due to idiosyncratic developments. Nonetheless, PIE seemingly also sported a decent number of *#TK- clusters whilst *#TP- was virtually absent.

          • sansdomino says:

            I’m not aware of any morphology that might be applicable to *kutkə ‘ant’ or *totkə ‘tench’.

            edit: and yeah, very right how the existence of *tk but not **tp also extends to PIE. Seems to go for many of proposed Nostratic groups really: ditto e.g. Kartvelian, Turkic, Tungusic, Yukaghir, Chukotkan… actually does this occur commonly in any Eurasian language family? Closest I’ve seen it is in Algonquian.

            • Otso Bjartalíð says:

              A nominal suffix *-kE could be found in PU *će̮rke̮ ‘gray, white’ ~ PU *će̮rå ‘id.’ (cf. Uralic Etymological Dictionary), though the stem vowel alternation is indeed also obscure.

        • sansdomino says:

          Seems robust to me: the number of known cases of *-tk- is not that high, but stem-medial *-tp- continues to be absent in all subgroups I think, even the ones that do innovate e.g. *-pk- as seen here (or variants; Selkup has a few words with *-pq-, also *-tq- *-qt- *-qč-). Basically any non-nasal coronal + *k is attested in PU actually, even *dk, *ďk … maybe another reason to think there is some productive suffixation behind them.

  4. Ante Aikio says:

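    Lu–I *tëvkkë- also has a South Saami cognate with the cluster pk: dapkedh ‘slutte tett til, bli tett sammentrukket slik at stingene legger seg flatt inni (om søm)’ (‘to close tightly, become tightly drawn together so that the stitches lie flat inside (of a seam)’).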

