What’s important for what in historical Uralistics

A question from an email discussion, the answers to which I think would be interesting to others as well:

Are certain branches more valuable than others when it comes to their relevance to Uralic historical linguistics?

I cannot offer any kind of rigorous rankings, only my own impressions, and they will not be the most detailed or best-researched. But they will be something, I hope.

1. Vowel phonology
To this day based primarily on Finnic, Samic and northern Samoyedic — which are also the languages that best preserve unstressed vowels and bisyllabic root structure, giving an additional good reason to think they might be more archaic than the others. Mordvinic fits well in with F&S, Mari is messier, Permic and Khanty are huge messes (but still far from terra incognita). Hungarian and Mansi behave reasonably well again IMO, but this is really a bit hard to see from published research when almost everyone insists on comparing them with Khanty in the first place and not with the rest of Uralic. Southern Samoyedic is quite simply understudied, usually treated as just an appendix to Northern.

2. Consonant phonology
This has a relatively even basis, the rough details for every language have been known already since the late 1910s. I guess Permic has retained the most phoneme-wise distinctions, and Finnic + Samic followed by Samoyedic and Mordvinic are the most important for reconstructing consonant clusters, but every language matters for something.

3. Inflectional morphology
Also a relatively even playing field with the rough details well-known. Samic and Samoyedic could be said to stand out somewhat for having fairly archaic possessive suffix and case systems. Hungarian has almost completely upturned its noun inflection and Permic is not too far behind, but even in these the verbs retain a typical enough Uralic shape.

4. Derivational morphology
There have been some general overviews in the past, but this is a topic that needs more work all around. Word derivation has been described well only for Finnic and Hungarian (individually; not for the two of them in comparison, and not even for Proto-Finnic). Tundra Nenets clearly has the third-best coverage thanks to Tapani Salminen’s A Morphological Dictionary of Tundra Nenets, but then to my knowledge this has not been worked into any kind of historical framework so far. I have the impression there’s some good literature in Russian at least on Erzya and Komi? No idea how much of a historical angle they would have.

5. Lexicon
No contest here. Finnish is surely the lexically best documented language in the world, with the Dictionary of Finnish Dialects archives covering 8.5 million records across perhaps 350 000 lemmas (contrast with “only” 300 000 lemmas in the Oxford English Dictionary, or 570 000 in English Wiktionary, despite orders of magnitude more speakers). Lexical documentation of Finnic more widely is mostly in pretty strong shape too, and so is etymology within the family. (For decades now most progress has come from loanword research, though.)

Samic and Khanty are the next-most important. Huge dialect dictionaries have been available for a long time, and there is also a lexical reconstruction of Proto-Samic as well as an etymological dictionary of Khanty. Mansi could maybe eventually join this club if someone were to assemble a single comparative resource from all the individual ones. Komi and Udmurt are also similar but less diverse. Mari and Hungarian have been well documented, the latter also researched, but they are frankly not very rewarding due to lots of loanwords. Mordvinic falls between these and Permic. Samoyedic is documented quite heterogeneously, and all reasonably large dictionaries other than for Tundra Nenets are also very recent; there is surely going to be a lot of fodder for Uralic etymological research there still.

6. Syntax
There are some good areal overviews, but then even the theory of syntactic reconstruction is not very advanced. (Towards this goal, I believe that a lot of internal reconstruction work has been accomplished that is kind of “hiding” in generative syntax, and is just waiting to be rewritten in actual historical terms… but this is generally almost all on languages other than Uralic.)

7. Anything else?
There would be some other but less strictly linguistic approaches to “historical Uralistics”, e.g. poetry, mythology, other ethnography; genetics and archeology even. This all falls outside my expertise however. The most I can say is that all this was a big part of the Uralic / Finno-Ugric studies paradigm a hundred or so years ago. Today much less so, now that we know that language, culture and genes are not quite as tightly bound together as romantic nationalists once assumed. Of course I still warmly support following such neighboring disciplines and continuing to integrate their results with linguistics where applicable.

Altogether then we end up with the three most diverse branches Finnic, Samic and Samoyedic (esp. Nenets) represented the best in research; Hungarian and Mari represented noticeably poorly, unless I have missed entirely some kind of major sources.

Posted in Commentary
62 comments on “What’s important for what in historical Uralistics”
  1. David Marjanović says:

    570 000 in English Wiktionary

    The goal of Wiktionary seems to be to be a dictionary of all languages in all languages. en.wiktionary is in English, but it’s not limited to words that have ever been used in English.

    • j. says:

      Yes, I know. That’s specifically the number of English lemmas, the total number of entries (though a lot of this is inflected forms in various languages) is about 6.1 million. FWIW Finnish is the fourth-best covered language currently with 100K lemmas (very closely behind Italian and Chinese).

  2. Y says:

    The issue of robustness of reconstruction is one that IMHO is not covered well enough. Often a significant part of a subsystem of a language can be reconstructed only thanks to the fortuitous survival of one language, or even a fragment of a language.
    Which are the most significant “fragile” parts of the reconstruction of PU? That is, ones that are based on the evidence of one lucky surviving attestation?

    • j. says:

      That wasn’t really the question, but I can mention some points about that too. In overall phonology there would be the old reconstructions that proposed an *ś : *š́ (i.e. [sʲ] : [ɕ]) contrast supposedly retained only in Mansi, and *l͔ *n͔ (i.e. [ɭ ɳ]) supposedly retained only in Khanty, but those fell out of acceptance many decades ago. I don’t think anything else about the generally accepted look of Proto-Uralic has ever depended on literally one attestation; Uralic studies have tended to be a bit conservative about this due to the lack of early attestations. If anything, there was the opposite problem circa the 1990s, when people like Ago Künnap were literally proposing that only features found everywhere in Uralic can be safely reconstructed for Proto-Uralic, and everything else should be suspected of being later contact/substrate influence (so no accusative *-m or *wetə ‘water’ because they’re missing from Khanty, no *kala ‘fish’ or nominative plural *-t because they’re missing from Permic, etc.)

      In individual etymologies with scarcer attestation, fragile details start to become much more common of course, e.g. Mordvinic *meńəľ ~ Hungarian menny ‘heaven’ reconstructs as *mińVl(V) and not *meńəl(V) only due to the short vowel in the latter; Samic *koaskē ~ Samoyedic *kåtå ‘(older) aunt’ reconstructs as *koska (in principle maybe also *koška) and not *kosa or *kota only per *-sk- in the former. I suppose in semantics this would come up a lot, though then to my knowledge no-one has done a systematic survey.

      Slightly less fragile but still not that robust features include the reconstruction of *-x- as distinct from *-k- (clearly based only on Finnic, with some not highly systematic evidence in Samic and Mordvinic), the reconstruction of coda glides (clearly based only on Finnic and Samic, in most cases with only some indirect evidence proposed elsewhere) and the reconstruction of *o with a later western shift to *u in some roots of the shape *CoCə (published arguments base this only on Samoyedic; I’ve been collecting some supporting evidence from elsewhere too). All this in phonology, since areal traits in morphology and lexicon have never been accepted as PU, only at most as Proto-Finno-Volgaic, Proto-Ugric etc.

      “Branchwise fragile” features would be a lot more common, e.g. the singular/plural possessed distinction in possessive suffixes is lost everywhere in Finnic except a small area of southeastern Finnish, or the *s/*ś/*š contrast collapses in Mari everywhere except the Malmyzh dialect, but systematic evidence from elsewhere puts these kind of features on a lot firmer grounding otherwise.

      • Y says:

        Fascinating! Thank you. I wish a summary like your main post and your reply to my comment were available as a matter of course in every survey of the historical linguistics of a language family. Is there anything like them for IE?

        • David Marjanović says:

          Yes, but scattered across the literature, never all in one place that I’ve noticed.

  3. M. says:

    Another subfield that seems pretty relevant here is historical semantics. A huge percentage of the etymological proposals in Uralistics (whether Proto-Uralic reconstructions, or IE –> Uralic etymologies) over the past several decades seem to hinge on comparative items that do not match semantically.

    (For example: the claim that Finn. ottaa ”take” is from a Germanic root meaning ”turn”, that vene “boat” is from a word meaning “wood”, that kausi/kaute- “period, season” is from a root meaning “pour”, etc. etc.)

    If a truly convincing case for these and similar etymologies is ever going to be mounted, I think that we will need a more rigorous type of argument than “here are 2-3 documented examples of the relevant semantic change (or a purportedly similar change)”.

    The relevance of this issue isn’t unique to Uralistics, of course – it probably comes up for almost every language family/grouping that has a lot of material of obscure/debatable origin.

    • j. says:

      All three of your examples, as it happens, also have other etymologies, and I lean towards thinking they are indeed mostly incorrect; maybe with the exception of southern Finnic võti(n) ‘key’, which despite synchronically looking like ‘taker’ could rather be originally ‘turner’ (and whether võtta- ‘to take’ then gets its irregular /v-/ from this is at least worth contemplating).

      Much of this should surely be considered in connection with historical morphology too. A lot of work on Uralic etymology so far treats derivational morphology as just phonetic lego pieces stuck on to a root, even though each step also offers the opportunity to work out semantic changes. Vene might be a good example of this really. The alternate comparison is with the word family of Fi. venyä ‘to stretch (intr.)’, which while being on the face of it in no way closer than ‘wood’, is a reflexive verb in -y-; likewise the Mordvinic cognate *veńəməms ‘id.’ is a similative verb in *-mə-. This suggests that plain *wenə- should be taken as ‘to be long’, which can be further matched by a few Sami cognates meaning ‘to lie’ (I will sidestep for now the issue that most reflexes of seemingly underived PS *vënë- mean ‘to stretch’). This allows analyzing also vene < *veneh < *wenəš as ‘long object’, at least a little bit more specific analysis than Koivulehto’s etymology as basically ‘wooden object’.

      • M. says:

        A lot of work on Uralic etymology so far treats derivational morphology as just phonetic lego pieces stuck on to a root, even though each step also offers the opportunity to work out semantic changes.

        It’s interesting that you mention this. If I remember right, the attempt to etymologize löytää “find” as a causative derivative of lyöda (lyö-) “hit, beat” has been rejected (by some, at least) on semantic grounds, because “[causation]” + “hit” doesn’t clearly result in the meaning “find”.

        Though this is a valid objection, I don’t think we should always expect such semantic transparency from the causative suffix. In some cases, like kantaa > kannattaa, there isn’t even a clear difference in the valence of the verbs, that I’m aware of (the verbs involve a “carrier”/”supporter” and a “burden”/”supportee”, respectively), though perhaps a deeper knowledge of kannattaa‘s history and/or cognates would show that there once was such a difference.

        • j. says:

          In some cases, like kantaa > kannattaa, there isn’t even a clear difference in the valence of the verbs

          That’s true sometimes for sure, but yes, in this particular case deeper analysis has a very clear answer to give: kannattaa is not derived from kantaa ‘to carry’ at all, it’s from kanta ‘base, basis’, which are two unrelated roots despite later semantic bleeding (PU *kanə-ta- ‘to transport, carry somewhere’ versus *kënta ‘pillar, support’).

          • M. says:

            Hmm. :) I guess I’ve never been clear on how/when the -tta- suffix is appended to nouns. It at least doesn’t seem to be nearly as productive a process as with verbs.

            Perhaps tuoda : tuottaa would qualify as an example of what I was talking about? Semantically, though, it’s not as good an example as kannattaa would have been.

            • j. says:

              I have seen some semantically bleached -tta- derivatives too, but the most visible / prominent cases of semantic bleaching in Finnish verb derivation tend to be formal iteratives, frequentatives & reflexives, usually in cases where the base verb has been marginalized, either entirely or just in the original meaning. So e.g.

              • käydä (today mainly ‘to visit, be suitable, function’ etc. but still also ‘to walk’) → kävellä ‘to walk’, not ‘to walk about’ which is primarily käyskennellä or käveleksiä;
              • vierrä (dated) → vieriä ‘to roll’, the latter still with some shades of ‘to roll repeatedly’;
              • saada → saapua ‘to arrive’ (cf. marginal / poetic joulu saa ‘Christmas arrives’).

              I guess one example of similar -tta-bleaching might be Old Finnish loppea ‘to end something’ → lopettaa ‘to quit, conclude’.

              Relationships of this sort often enough also appear comparatively within Finnic, e.g. Fi. uida ‘to swim’, ujua ‘to float, be aswim’ ~ Estonian ujuma, Veps ujuda ‘to swim’.

              • j. says:

                There’s also a different type of semantic bleaching, with parallel derivatives that show some derived semantics but seem to lack expected semantic differentiation from one another. An example that comes to mind is tuutia ~ tuutua where formally we might expect the latter to be reflexive ‘to be lulled’, but which are actually both transitive ‘to lull’. (Still not the best example though since the latter is maybe rather just a univerbation of the interjection tuu tuu used for lulling babies.)

  4. M. says:

    Could there be a similar explanation for the second -u- in puhua “speak”?

    I.e., is it thought to point to an originally onomatopoeic sequence *pu-hu-?

    puhua has never seemed particularly reflexive/passive to me, although it’s not hard to imagine that it was originally a fully intransitive verb.

      • j. says:

        Right, puhua also clearly looks like a semantically bleached passive / reflexive. There might be a couple of ways to analyse this as onomatopoetic too; but then there’s a fairly large number of Uralic cognates that point to a *pušə(w)- ‘to blow’, so any kind of onomatopoetic origin has to date back to PU times. (And note that Komi /puš-ky-/ moreover has a more exact cognate in Fi. puhku- ‘to blow hard, huff and puff’.)

        Phonologically it is interesting to note that Finnic has no *CV(V)hE- verb stems, I wonder if we could be dealing with *V₁hE > V₁hV₁ (which is after all regular in unstressed syllables, e.g. the illative). But then *CV(V)hE- nouns like ⁽*⁾lohi ‘salmon’, *voohi > vuohi ‘goat’ do nothing of the sort.

  5. M. says:

    Another general principle that I’m not sure is followed enough in Uralistics (and again, this criticism could no doubt be applied to many other fields):

    Before asserting that a given structural feature (a grammatical or sound pattern, etc.) was imported from a neighboring language, engage with some alternative possibilities as to its origin.

    For example, take the theory that the Finnic pattern of consonant lenition (kaute– : kauden, etc.) is due to a garbled “copying” of Verner’s Law in Germanic. Though this theory is not prima facie impossible, it seems convoluted enough that I can’t understand why anyone regards it as the “leading hypothesis” for the origin of Finnic lenition (as some seem to do).

    Word-medial consonant lenition doesn’t seem particularly rare, cross-linguistically (Spanish, Insular Celtic, dialectal Italian, etc.). Even though the Finnic version of it is somewhat distinctive, because it adds the criterion of syllable closure – a criterion that’s also completely absent from any version of Verner’s Law that I know of – there are various mechanisms through which this factor might have been introduced that don’t require the positing of any outside influence.

    (E.g. perhaps, at some key point in the history of Finnic, the presence of a coda affected the syllabification of the preceding onset, and this, in turn, affected the onset’s vulnerability to intervocalic voicing/lenition effects.)

    Have proponents of the Verner’s Law theory taken the time to evaluate such alternative possibilities (or even one such possibility) before drawing their conclusions?

    • j. says:

      For this particular theory, I think the general position by now in Fennistics is that its original proponents (all long dead) were indeed not accounting well enough for internal innovation. I would additionally think that consonant gradation in Samoyedic is a strong argument for its common Uralic origin in some form.

      There could still be a Finnic–Germanic connection in the realization of the “lenited grade” as spirants rather than stops, but that’s much weaker already, could be only counted as probabilistic evidence alongside various other possible but uncompelling Germanicisms (like *kt > *ht), and could be also due to a common third (substratal?) source rather than a direct connection.

      • M. says:

        possible but uncompelling Germanicisms (like *kt > *ht)

        Wow, some people consider that to be a Germanic-specific change?

        I always thought that the reason why e.g. modern Icelandic shows this change in newer clusters (cf. sekta [‘sex.ta] “to impose a fine”) was that it’s a phonetically likely development, but perhaps it’s actually Icelandic’s ancestral “Germanicity” reasserting itself. :)

        • j. says:

          I take it you have not seen Posti 1953, one of the more influential proposals of gradation being connected to Verner’s Law: he considers even developments like *č > *t, *š > *h or *ŋ > *w to be due to Germanic.

          I would think though that a change *kt > [xt] (which cannot be explained thru simple feature spreading) should be more probable in a language that already has a substantial number of fricative consonants, including in the coda; e.g. it does not seem entirely like a coincidence that this appears also in Greek only after φ θ χ become spirants.

  6. M. says:

    Wow, some people consider that to be a Germanic-specific change?

    Or, if not “Germanic-specific”, at least distinctive enough to be diagnostic of Germanic influence.

  7. M. says:

    Another broad trend worth examining:

    Some researchers seem to have the view that, for a given word/affix, any etymological proposal that meets a minimal threshold of plausibility (i.e., as long as it does not fly in the face of accepted facts) is preferable to a position of uncertainty about this word/affix’s origin – regardless of whether the proposal is actually strong/convincing in its own right.

    E.g., I once questioned another Fenno-Ugricist about the semantic solidity of the proposal whereby Finn. köyhä and its cognates come from Germanic *skeuG- (> English shy, etc.). In response to my questions, one of the main criteria he used to defend this etymology of köyhä was the fact that there were “no proposed alternatives” to it, i.e. no credible competing etymologies (at least that he was aware of).

    This evaluative framework makes no sense to me. ”Plausible” is not the same thing as ”probable”, and in the absence of a genuinely probable etymological hypothesis, it seems to me that the best available etymology is ”We don’t know”.

    In other words, ”Etymology unknown” is a stronger position than an unconvincing etymology, and this should be made clear in any discussion (e.g. in etymological reference sources) of words for which no probable etymology yet exists.

    (Granted, this issue may sometimes be difficult to separate from disputes about what makes an etymology probable/well-supported, which brings us back to issues like semantic laxity and when it is permissible, etc.)

    • M. says:

      Whoops, those “[”s and “]”s in my post should be “<”s and “>”s respectively.

      • M. says:

        Apparently, when I’m not forgetting to use angled brackets for formatting, I’m forgetting that this blog renders them invisible when I post them. :)

    • j. says:

      What an etymology being “preferable” means depends on context, I think. Popular-science etymological accounts probably don’t need to mention many unclear suggestions. Scholarly reference works often should, if the problems are of a scale or type that could be overcome by future work. E.g. LägLoS ranks this etymology as “? Germ.”, and the popular etymological dictionary of Proto-Finnic vocabulary in Finnish that I’m working on (among other people) would probably end up unpacking this to something like “a Germanic origin has been proposed, but this is uncertain due to…”.

      The most difficult call is maybe when compiling etymological material for studying a different question of language history, such as historical phonology or morphology, since even poor etymologies can provide nonzero support for some follow-up hypothesis. Doing so with explicit caution is IMO better than either citing etymologies uncritically or passing them over in silence. But how much caution is the proper amount?

      On the other hand, one of the big lessons gathered in Finno-Ugric studies over the last 50 years (esp. in Finland) has been that loaning is the most common source of words that clearly lack an etymology by inheritance or derivation, especially for nouns and adjectives, and this might be what you have run into. A lack of anything resembling a native etymology (no distant cognate candidates, no resemblance with known derivative patterns, no evidence of any wealth-related vocabulary in Proto-Uralic) definitely isn’t evidence in favor of *keühä coming from Germanic in particular, but it does already improve the odds that it might be a loan of some age from somewhere. And every word does have to have some etymology ultimately… If you will, “etymology unknown” is sometimes a strong position but never a strong result.

      • M. says:

        On the other hand, one of the big lessons gathered in Finno-Ugric studies over the last 50 years (esp. in Finland) has been that loaning is the most common source of words that clearly lack an etymology by inheritance or derivation, especially for nouns and adjectives

        It seems to me that this conclusion could not have been reached without a significant relaxation of semantic criteria, among other things. (Cf. my initial comment on this post.)

        Maybe there is a rigor of some kind to this relaxation, but as long as that rigor remains obscure to me, I’m rather hesitant to accept “Unknown Finnic etymologies tend to be loans” as a premise to found future conclusions on.

        And every word does have to have some etymology ultimately… If you will, “etymology unknown” is sometimes a strong position but never a strong result.

        OK, but it bears emphasizing that etymological researchers don’t have any kind of “right” to strong results – particularly when they’re working in a field with meager attestation of ancient data, in which much of the truly “meaty” evidence was picked clean by the earliest scholars over a century ago.

        • j. says:

          You could simply be overestimating the number of semantically questionable loan etymologies proposed in recent decades; on the median they are far from the level of vene or kääntää. Many hundreds have no semantic problems of note, and the reason they’ve been discovered so late comes either from requiring prior work in philology (based on Germanic etc. loan originals or meanings no longer found in the modern languages) or in the history of the Uralic languages themselves (surfacing in seemingly dissimilar forms to their sources due to intervening phonological and morphological changes). BTW, on the oldest and clearly very speculative cases, you might enjoy checking out Simon (2020): Urindogermanische Lehnwörter in den uralischen und finno-ugrischen Grundsprachen: Eine Fata Morgana?

          By “clearly not inherited” I also do not mean just “unknown”. There are plenty enough words left whose semantics or root structure have nothing un-Uralic to them and where I would want to keep inheritance open as an option, or indeed plenty enough where I believe I have a new native etymology to present or new arguments to re-defend some old comparison. E.g. I did not mention it in a separate blog post yet, but in the latest FUF I have an article collecting a little over a dozen of the latter for Permic, i.e. old etymologies individually dismissed as “irregular” but actually forming a relatively coherent group when re-examined as a whole.

          • M. says:

            Regarding semantically questionable IE > Finnic etymologies, see my prior comments on this blog, passim.

            For example, joukko, heittää and kauko- seem to be pretty widely accepted as Germanicisms now, but their convincingness (i.e., their superiority to “Etymology unknown”) hinges on semantic leaps that I have yet to see justified.

            There are other etymologies from recent decades that don’t seem quite as widely accepted – such as syö(dä) < IE "chew" and juo(da) < IE "pour" – but the objections to which (that I've seen) are purely phonological, rather than focusing on the patent problem of the semantic gaps these proposals involve.

            and the reason they’ve been discovered so late comes either from requiring prior work in philology (based on Germanic etc. loan originals or meanings no longer found in the modern languages)

            At a branch level (Germanic/Baltic/etc.), I’m not aware of any major advances in reconstruction over the past 100 years that would substantially increase the number of candidates for Finnic loan sources.

            In regards to general IE, it’s true that laryngeal theory has significantly changed the potential field of candidates. But I strongly suspect that even conservative laryngeal theory has been overapplied as a reconstructive tool in IE, and the use of laryngeals in recent Finno-Ugric etymologies inherits this lack of caution.

            E.g. the IE etymologies of ihminen, kesä and suku require the reconstruction of a laryngeal that is not directly attested anywhere to my knowledge, and has been inferred from phenomena that have straightforward alternative explanations.

            • j. says:

              I’m not aware of any major advances in reconstruction over the past 100 years that would substantially increase the number of candidates for Finnic loan sources.

              Correct, but then philology ≠ reconstruction. For one kind of a concrete example of what I mean, consider Koivulehto’s most recent posthumously published etymology which happens to be completely transparent on all fronts: Finnish kulli ‘penis, dick’, (Old Finnish) ‘testicles’ ← Middle Low German kull(e) ‘testicles, penis’. The only issue is that we have to look at specifically MLG or maybe descendants in e.g. colloquial Dutch to be able to propose the comparison. Gothic, Old Norse, Modern German, Modern Swedish and other such “key languages” of comparative Germanistics will be of no help at all. This is the “changes in data” you’re wondering about — it has been “there” all along, but for 150+ years no one has looked at it with Finnic loanword research in mind. There are simply too dang many Germanic varieties (and too few actually thorough etymological reference works) for any kind of “it would have been found already” arguments to have much evidential value in my mind.

              I think you are indeed right about loan etymologies for e.g. syödä and juoda being poor also semantically (e.g. the latter actually hinges on the proposal that PF *jooksë- ‘to run’ is a derivative from it, supposedly showing that its earlier meaning was ‘to flow’ also in Uralic). Though when semantic arguments are hard to do as rigorously, I suppose people have deemed it a bit unnecessary to go much into this when already phonological arguments work just fine for obsoleting them. The set of joukko, heittää and kauko- I’d also agree are not unproblematic (but also not too bad to be discarded completely).

              Still, any claim that not just some but a majority of the “new” IE loanwords in Uralic (or just in Finnic?) are specifically semantically dubious would require a review of the full evidence. Even “just” a review of LägLoS that tracks how well etymologies from a particular period have been accepted in it would be quite interesting. That’s already approx. 2000 etymological proposals though! As it stands, the closest thing we have is Junttila’s 2016 PhD thesis very thoroughly reviewing the topic of old Baltic loanwords in Finnic (about 1000 cases). It clearly emerges that after a long dry spell, since the 70s there has been a steady influx of a couple dozen good Baltic loan etymologies discovered per decade again, found among a somewhat larger number of bad or disputed ones (including one particularly large concentration of bad etymologies from Liukkonen 1999). Among disputed etymologies there’s also about a third where the only other candidate is “no etymology”, so it’s clearly not as if anything goes in the absence of competing hypotheses.

              • M. says:

                My sense is that the cornerstone of most of the recent (post-1970) IE > Finnic etymologies is not the lesser-examined daughter languages (Low German, dialectal Baltic, etc.), but the deeper reconstructed ancestors (archaic stages of Germanic, unspecified “northwest IE” dialects, etc.), whose reconstructions can purportedly be tested and vindicated by Finnic data. But I admit that I haven’t systematically reviewed enough of the recent literature to be sure of this.

                Also, the fact I have questioned all of these IE > Finnic etymologies doesn’t mean that I think the Finnic terms in question (köyhä, etc.) must go back to proto-Finno-Ugric or similar. They could instead originate in one of the vanished substrates/adstrates that are responsible for making Finnic a distinctive branch.

                Or, do you see reason to doubt that such sub-/adstrates made any significant contributions to Finnic/Finno-Permic vocabulary?

                (In which case I suppose that it’s only a matter of time before all the unetymologized words shared by most Finnic languages – including (correct me if I’m wrong) such basic terms as kysyä, nukkua, vasta-, musta, kuuma, etc. – will be reclaimed from the orphanage by their (probably IE) parents?)

                • j. says:

                  There are basically no believable unknown substrate loans into Finnic as a distinct branch. The primary substrates of Finnic are certainly Baltic (for Proto-Finnic) and Samic (for later northern Finnic). My assumption would be that anything of unknown provenance that isn’t just unidentified ~native vocabulary was probably transmitted thru these (i.e. pre-IE → Baltic → Finnic or pre-Uralic → Samic → Finnic). That said, this is not exactly commonplace knowledge yet; the argument for Baltic as a substrate, due to Kallio, is only a few years old and its most explicit version has been stuck in editor limbo for several years now.

                  There probably is also an old layer of adstrate vocabulary that does not have to be routed thru any known language group, but this is old enough that it goes to the common West Uralic stage along the upper Volga already. The candidates are mainly identified by irregular or phonotactically novel cognates in the other Uralic branches (esp. Mordvinic and Mari), and they develop thru all but one or two of the oldest Finnic sound changes.

                  Putative “northwest IE” loans are an absolutely marginal proposal. The ones that are actually new (i.e. not rewritten versions of older Indo-Uralic proposals) and clearly distinct from both later Germanic and Balto-Slavic can be counted on one hand.

              • M. says:

                Among disputed etymologies there’s also about a third where the only other candidate is “no etymology”, so it’s clearly not as if anything goes in the absence of competing hypotheses.

                OK, but there’s a wide gulf between “anything goes” and truly rigorous caution.

                For example, it’s possible to be quite phonologically rigorous, but have a largely hand-waving approach to semantic problems (“This is but a minor gap”, etc.).

                It’s also possible to recognize the need for semantic justification, but to overlook methodological criteria that can compromise the reliability of one’s evidence.

                For example, some IE > Finnic proposals involve appeals to the purported dialectal semantics of Finnic words in order to bridge semantic gaps that would otherwise stand in these etymologies’ way. The aforementioned IE etymology of Finn. joukko employs this kind of reasoning, and if I recall right, the currently popular IE etymology of Finn. pinta hinges on the claim that in some Finnic dialects, the equivalent of pinta has the sole or primary meaning of “sapwood”.

                However, even though dialectal evidence is (in principle) as valid as any other kind of evidence, it is a tougher task to narrow down the primary semantics of a dialectal word/phrase than to do the same with words from a widely used, well-codified standard language.

                Thus, if you are going to claim that “sapwood” is the uncontested primary meaning of pinta’s dialectal relatives – that it doesn’t co-exist with any other common meanings, and that there’s no evidence of “sapwood” being an outgrowth of a more general meaning of “surface” – then it seems to me that you should be able to produce solid field data backing this assertion up. (That, or you should be able to refer to a very precise and comprehensive entry in a dialect dictionary – a dialect dictionary with unstructured entries, which makes no attempt to rank the different meanings of words, simply doesn’t cut it.)

                • j. says:

                  No major disagreement in principle, though your choice of example seems poor when Koivulehto’s treatise of pinta indeed considers the semantics in a fair bit of detail. Of course this takes a few pages of discussion, which I’m sure you realize cannot be included in every summary of or reference to the etymology.

                  Outright proving a negative like ‘sapwood’ not going back to ‘surface’ is however still not possible even in principle. Defaulting to just taking the standard language sense as primary is generally a bad idea, especially when dealing with relatively abstract terms and relatively recently standardized languages. It is even possible that “divergent” dialectal senses of words could turn out to be just homonyms, e.g. in this case it would be conceivable that ‘sapwood’ is indeed a Germanic loan but unrelated to general ‘surface’. (This much could be maybe supported by the fact that the proposed Udmurt cognate ped- shows no trace of any concrete senses and just means ‘out-, outer’ — though in turn I actually have also a different idea entirely for the etymology of this, which probably could not be made to work for Finnic.)

                • M. says:

                  Responding to https://protouralic.wordpress.com/2019/10/16/whats-important-for-what-in-historical-uralistics/#comment-2972:

                  No major disagreement in principle, though your choice of example seems poor when Koivulehto’s treatise of pinta indeed considers the semantics in a fair bit of detail.

                  I actually have read that article, albeit the last time was a while ago.

                  Rereading it, I think I was right to call it a case where dialectal semantics (as reported by the author’s sources) are the lynchpin of an etymological claim.

                  This by itself doesn’t invalidate the claim, but as mentioned, it does raise questions about the supporting data — namely, how representative and complete this data is — that are harder to resolve than in the case of well-codified standard languages.

                  As long as we’re discussing the pinta article, I’m curious what you think about the following passage:


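                  Pinta tavataan koko itämerensuomen alueella: karj. pinta, pinda ‘iho, maan pinta, pintapuu; (eläimen) lihavuus’; aun. pindu ‘pinta, pintapuu; lihavuus’ […]; lyyd. pind ‘pinta (puun, veden), iho (eläimen)’ […]; veps. pind 1. ‘pintapuu’, 2. ‘petäjän lohko, josta kiskotaan päreitä’ (Zaitseva—Mullonen 1972: 419) […]; vatj. pinta, pinnekez ‘paksupintainen’ (SKES); vir. pind ‘pinta, kuori; pintapuu (»Splint»), pintalauta’ (SKES; Wiedemann 1893: 821) […]; liiv. pinda ‘pinta; pintapuu’ = »äußere Schicht, Baumsplint» (SKES; Kettunen 1938: 297). Kuten näkyy, ‘pintapuun’ merkitys on keskeinen koko itämerensuomessa.

                  [Translation: “Pinta is found throughout the Finnic area: Karelian pinta, pinda ‘skin, surface of the ground, sapwood; (an animal’s) fatness’; Olonets pindu ‘surface, sapwood; fatness’ […]; Ludian pind ‘surface (of a tree, of water), skin (of an animal)’ […]; Veps pind 1. ‘sapwood’, 2. ‘a section of pine from which shingles are split’ (Zaitseva–Mullonen 1972: 419) […]; Votic pinta, pinnekez ‘thick-surfaced’ (SKES); Estonian pind ‘surface, rind; sapwood (»Splint»), slab board’ (SKES; Wiedemann 1893: 821) […]; Livonian pinda ‘surface; sapwood’ = »outer layer, tree sapwood» (SKES; Kettunen 1938: 297). As can be seen, the meaning ‘sapwood’ is central throughout Finnic.”]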

                  I don’t understand how most of this data supports the article’s conclusion.

                  Other than the Veps term, this is a list of words that mean “surface” in general, or at least many kinds of surfaces — why is it remarkable that “sapwood” (i.e., the surface wood of a tree trunk) is among these words’ meanings?

                  Or, is the author (implicitly) arguing that “sapwood” doesn’t straightforwardly follow from the general meaning of “surface”, and therefore that the presence of the meaning “sapwood” requires a special explanation?

                • j. says:

                  I’m curious what you think about the following passage (…) why is it remarkable that “sapwood” (i.e., the surface wood of a tree trunk) is among these words’ meanings?

                  If you mean the other Finnic words brought up, I don’t think this is of any major importance for the broad validity of the etymology, it’s merely evidence on its dating. Both the Finnic words and the Germanic loan original have after all remained in basically the same shape all the way up to the present day. If the other Finnic evidence were not there, we could date this word just as well e.g. as a later Proto-Scandinavian loanword in Finnish only; but since it does exist, we’re better off considering this a Proto-(NW-)Germanic loanword into Proto-Finnic. Any problems of representativeness are to my mind already cleared by Koivulehto’s early statement that the sense ‘sapwood’ is indeed common in Finnish dialects, with reference to the extremely comprehensive SMSA (= Suomen murteiden sana-arkisto, the base data collection behind the ongoing Suomen murteiden sanakirja) and the observation that this is even the only sense that has been reported in 18th-century dictionaries.

                  As I see it, one important implicit argument of Koivulehto’s is not even as much as that the difference between ‘sapwood’ and ‘surface’ has to be explained, it’s merely that deriving ‘sapwood’ (in Finnic) from a word also meaning ‘sapwood’ (in Germanic) is a better explanation than deriving it nontrivially from a word meaning ‘surface’. Again, you could point out that ‘sapwood’ to ‘surface’ is an equally nontrivial semantic shift that Koivulehto has not motivated in much detail; but I fail to see how this affects that pinta in the sense of ‘sapwood’ or also ‘lard, subcutaneous fat’ can be trivially explained as one or more Germanic loanwords, once we do consider the existence of words like German Spint that allow reconstructing a PNWG *spinda- ‘sapwood; lard’. Demanding extreme rigor in semantics hence does not invalidate the loan etymology, at most it cuts the pinta words into two groups where one of them is still best considered a loan.

                • M. says:

                  Replying to https://protouralic.wordpress.com/2019/10/16/whats-important-for-what-in-historical-uralistics/#comment-2974:

                  Any problems of representativeness are to my mind already cleared by Koivulehto’s early statement that the sense ‘sapwood’ is indeed common in Finnish dialects, with reference to the extremely comprehensive SMSA

                  Let me expand a bit on what I meant by “representative”.

                  If you ask one speaker of Karelian what the word pinda means, he may say “the surface of a body of water”; if you ask another speaker, he may say “sapwood”; if you ask a third, he may state both these meanings and more.

                  As far as I can see, a dialect researcher needs to draw upon many different speakers’ accounts, and collate these accounts as necessary, to get a good picture of a given word’s semantics.

                  Thus, when we read something like “veps. pind 1. ‘pintapuu’, 2. ‘petäjän lohko […]’”, it seems highly relevant to ask whether this is just one speaker’s answer to a linguist’s question, or data carefully collated from a large sample of speakers.

                  Does Koivulehto’s reference to SMSA allow us to lay any doubts about this question to rest?

                  As I see it, one important implicit argument of Koivulehto’s is not even as much as that the difference between ‘sapwood’ and ‘surface’ has to be explained, it’s merely that deriving ‘sapwood’ (in Finnic) from a word also meaning ‘sapwood’ (in Germanic) is a better explanation than deriving it nontrivially from a word meaning ‘surface’.

                  I don’t follow.

                  If the derivation of “sapwood” from “surface” really is nontrivial, this means that the semantic difference in question does have to be explained, and the Germanic loan theory is one possible explanation (albeit not the only one).

                  By contrast, if we accept that “sapwood” and “fat” are meanings that straightforwardly follow from the meaning “surface” – that no semantic change is required in order to link them to “surface” – then how does the Germanic loan theory still have any explanatory advantage over its counterparts?

                  (To be clear: the above only applies to cases where the Finnic cognates of pinta mean “surface”. It doesn’t cover cases such as Veps, where the general meaning of “surface” is absent, at least according to this article’s account.)

                • M. says:

                  By contrast, if we accept that “sapwood” and “fat” are meanings that straightforwardly follow from the meaning “surface” – that no semantic change is required in order to link them to “surface” […]

                  This could use a bit more elaboration as well.

                  If a speaker sometimes refers to sapwood as pintapuu (“surface wood”), and sometimes as pinta (“surface”), this is not clearly a case of semantic change, whether a widening (“surface wood” -> “surface”) or a narrowing (“surface” -> “surface wood”). It could simply be a regular alternation, in which the longer expression (pintapuu) is sometimes clipped for brevity’s sake.

                  As I see it, in order for this to be a case of semantic change (requiring an etymological explanation such as Koivulehto’s), other conditions have to be met.

                  For example, if not every type of “surface” is eligible for this alternation – if it is only seen with tree trunks, and perhaps a few other things like bodies of water – then this suggests (among other possible interpretations) that one of the more specific meanings is the older one, and that “surface” is a widening from this older sense.

                  Going back to the passage I cited earlier, it’s possible that all the listed languages (Karelian, Ludian, etc.; Veps is a separate case) fulfill this and other conditions, but we can’t simply assume that they do. The issues I mentioned in the last comment (about the sourcing and representativeness of dialect data) seem highly relevant to the search for evidence in this matter.

                • j. says:

                  To #2977, #2983:
                  yes, SMSA covers several hundreds of Finnish locales (see list at SMS). I lost an earlier comment a few days ago to a WordPress bug, but for short, I did double-check a few dialect dictionaries of smaller Finnic languages (you can check out at least KKS right away too) and they do also show multiple locales, including also turning up a Ludic adjective pindaińe from two locales that means specifically ‘made of sapwood’. Not that this seems highly relevant to me however: having attestations from eight different Finnic languages already means we have at least eight separate informants anyway. Any probability that ‘sapwood’ would have been just an ad hoc unconventionalized usage drops exponentially with the number of informants and seems close enough to zero to ignore already after a handful of data points.

                  As for the (non)triviality of the semantic relationships: remember that puu does not equal ‘wood’, but also means ‘tree’. Hence we should indeed note that ⁽*⁾pintapuu and/or similar use of just ⁽*⁾pinta always turns out to mean specifically ‘sapwood’ (‘surface of wood’) and by all reports never ‘bark’ (‘surface of tree’)! Same goes for most of the usages meaning ‘lard’: starting from ‘surface’ we’d surely expect sian pinta to mean primarily ‘pig skin’.

                  I do not know what you mean by “the counterparts” (plural) of the Germanic loanword explanation. The Udmurt comparison would make one other etymology, but as far as I can tell no further explanations that reach beyond Finnic have been proposed.

                • M. says:

                  Replying to https://protouralic.wordpress.com/2019/10/16/whats-important-for-what-in-historical-uralistics/#comment-2986:

                  yes, SMSA covers several hundreds of Finnish locales (see list at SMS). I lost an earlier comment a few days ago to a WordPress bug, but for short, I did double-check a few dialect dictionaries of smaller Finnic languages (you can check out at least KKS right away too) and they do also show multiple locales […]

                  So when we read an entry in these sources about (e.g.) Veps pind, we can be fully confident that this reflects data from numerous locations?
                  (I don’t understand all the abbreviations in your linked list, so I’m not sure if it covers Veps.)

                  As for the (non)triviality of the semantic relationships: remember that puu does not equal ‘wood’, but also means ‘tree’. Hence we should indeed note that ⁽*⁾pintapuu and/or similar use of just ⁽*⁾pinta always turns out to mean specifically ‘sapwood’ (‘surface of wood’) and by all reports never ‘bark’ (‘surface of tree’)! Same goes for most of the usages meaning ‘lard’: starting from ‘surface’ we’d surely expect sian pinta to mean primarily ‘pig skin’.

                  Valid point, but I’m not sure we would necessarily expect “skin” (etc.) to be the primary meanings. Insofar as there were pre-existing words that specifically meant “bark” and “skin (of an animal)”, a blocking effect may have come into play.

                  Also, is it actually the case that the meanings “bark” and “skin” are absent where we’d expect to find them? For example, “iho” is the second listed meaning in the entry for pinta that you cited from the Karelian dialect dictionary.

                  I do not know what you mean by “the counterparts” (plural) of the Germanic loanword explanation. The Udmurt comparison would make one other etymology, but as far as I can tell no further explanations that reach beyond Finnic have been proposed.

                  As mentioned in my argument that started this current discussion, “Etymology unknown” is another entirely valid alternative to the Germanic proposal – unless the evidence for the Germanic proposal is so compelling that “Etymology unknown” would have to be rejected by any reasonable standard. (E.g. if “Etymology unknown” could only be sustained by applying a standard so stringent that it would require relationships such as Eng. mother : Latin mater to be called “uncertain” as well.)

                • j. says:

                  To #2987: obviously it’s not certain that every single word in every single language that has been quoted in literature will be found in multiple locales. When an established scholar explicitly states that some particular sense is common across Finnish dialects though, I do think that we should expend enough goodwill to assume that it has been indeed attested in multiple dialects.

                  If you are interested in Veps in particular, there I cannot help you: it’s the one Finnic language that still could use a comprehensive dialect dictionary, and I do not have any of the narrower dictionaries of written Veps or particular dialects around to see what their compilation methodology has been exactly. My starting assumption would be that some little-edited dialect vocabularies might have words in them from just one informant, but dictionaries of written Veps generally would not.

                  General ‘skin’ is indeed present to some extent, what seems to be absent / taken instead by ‘lard’ is specifically ‘pig skin’.

                  All minor details aside: “origin unknown” is of course a possible fallback option, but it’s very definitely not an explanation. So when you ask how the Germanic theory has explanatory advantage — if it has any explanatory power at all, then it ipso facto has more explanatory power than “origin unknown”.

                  It should be sometimes possible to disprove formally flawless etymologies just by semantic considerations (I might be able to think of a few examples if I give it some time), but I think that will require strong directional evidence, of the kind where a sense A could give rise to but not be derived from a sense B.

                • M. says:

                  Replying to https://protouralic.wordpress.com/2019/10/16/whats-important-for-what-in-historical-uralistics/#comment-2988:

                  When an established scholar explicitly states that some particular sense is common across Finnish dialects though, I do think that we should expend enough goodwill to assume that it has been indeed attested in multiple dialects.

                  I’m not accusing these dictionaries’ compilers of dishonesty/sloppiness. I’m just not sure that dialect dictionaries, in general, are compiled with the same ends in mind as the dictionaries of codified standard languages.

                  A standard language’s dictionary is generally meant as a practical guide for people trying to communicate in that language. Thus, considerations like the primary (vs. secondary/tertiary/etc.) meanings of words are generally given a lot of attention in such dictionaries.

                  By contrast, a dialect dictionary is meant as a snapshot documenting how certain people use their language, and isn’t necessarily compiled with the aforementioned practical dimension in mind.

                  If you are interested in Veps in particular, there I cannot help you: it’s the one Finnic language that still could use a comprehensive dialect dictionary, and I do not have any of the narrower dictionaries of written Veps or particular dialects around to see what their compilation methodology has been exactly.

                  The reason why I’ve continued to mention Veps is that, of the Finnic languages cited by Koivulehto, it’s the only one for which a general meaning of “surface” (or at least several kinds of surfaces) isn’t mentioned, only the single, specific meaning of “sapwood”.

                  Granted, if we accept Koivulehto’s citations of old Finnish dictionaries (compiled when the attestation of Finnish was comparable to that of today’s dialectal Finnic) at face value, then Finnish pinta also referred purely to sapwood until some time in the late 18th/early 19th century – but this claim merits careful investigation.

                  All minor details aside: “origin unknown” is of course a possible fallback option, but it’s very definitely not an explanation.

                  “Etymology unknown” is not a fallback option, it is the default option: all etymological proposals are, by definition, in competition with it.

                  It should be sometimes possible to disprove formally flawless etymologies just by semantic considerations (I might be able to think of a few examples if I give it some time), but I think that will require strong directional evidence, of the kind where a sense A could give rise to but not be derived from a sense B.

                  I’m not sure I catch your drift.

                  Are you saying that, in order to cast doubt on a phonologically unproblematic etymology like the Germanic pinta proposal, I’d have to show that the meanings “sapwood”/”animal fat” couldn’t plausibly transition to the meaning “surface”?

                  Not only is that setting a very high bar (requiring me to prove that this semantic change is effectively *impossible*), but it also seems to place all or most of the burden of semantic argumentation on the doubter of the etymology, while demanding minimal (if any) semantic rigor from its proponent.

          • M. says:

            Small edit to the last comment: instead of “their convincingness”, I should have written “the convincingness of their Germanic etymologies”.

            Also, maybe I should rephrase some parts of my response as questions:

            You could be simply overestimating the number of semantically questionable loan etymologies proposed in recent decades; on the median they are far from the level of vene or kääntää. Many hundreds do not have much semantic problems of note to them,

            Would you consider the aforementioned Germanic etymologies of joukko, heittää and kauko- to be examples of proposals with no notable semantic problems?

            the reason they’ve been discovered so late come either from requiring prior work in philology (based on Germanic etc. loan originals or meanings no longer found in the modern languages)

            What advances have there been in Germanic/Baltic/etc. etymology over the past ~50 years that justify the proliferation of IE > Finnic loan etymologies during this time period?

            I.e., why have recent Fenno-Ugricists been able to extract so many new loan etymologies that their predecessors of 100+ years ago couldn’t, if there have been no substantial changes (in the intervening century) to the data they’re working from, other than perhaps laryngeals, nor any relaxation/changes to the basic methodology used on this data?

      • M. says:

        All minor details aside: “origin unknown” is of course a possible fallback option, but it’s very definitely not an explanation

        This also seems incorrect, now that I think about it. “Etymology unknown” is just as much of an explanation as any Germanic/Proto-Uralic/etc. etymology: it’s equivalent to saying, “This word comes from somewhere, we just don’t know exactly where.”

        Compare this to the case of (e.g.) porsas, where we can determine that it comes from the same source as Spanish puerco, English farrow, etc., but we can’t (as far as I’m aware) determine the precise branch affiliation of the donor language. This is the same type of non-specificity seen in “etymology unknown” – only the degree is different.

        I don’t think it’s trivial to point this out, because your objection to “Etymology unknown” (throughout this exchange) seems to depend heavily on the idea that it can’t even be formulated as an explanation, when in fact it quite easily can.

        • j. says:

          Indeed, I depend on this idea, and I hold that “comes from somewhere” is likewise not an explanation. All words come from somewhere of course (at least if “somewhere” includes also outright creation mechanisms like derivation or coining) (and this general fact probably vaguely counts as an explanation of the likewise very general fact that words exist at all in the first place). But to even remotely count as an etymology, we would need to be able to pin down at least some further details: the affinity of the source language, its geographic area, something about its earlier shape or meaning, some cognates, etc. “Zero information” is not just a difference in degree from “some information”.

          Imagine an etymological dictionary where the only explanation given for every word was “it comes from somewhere, no idea where”. Would you consider this to be even remotely informative about the history of the language?

          • M. says:

            But to even remotely count as an etymology, we would need to be able to pin down at least some further details

            Most (if not all) cases of “Etymology unknown” do contain further unexpressed details: e.g. in the case of Finnic, even if a word’s etymology is unknown, it is very unlikely that it comes from a Mon-Khmer language, a vanished close relative of Basque, etc.

            (I still don’t think that these further details are required in order for a proposal to qualify as an etymology, but even if I did concede this, it doesn’t disqualify “Etymology unknown” in actual practice.)

            Imagine an etymological dictionary where the only explanation given for every word was “it comes from somewhere, no idea where”. Would you consider this to be even remotely informative about the history of the language?

            Yes.

            Hearing (from a reputable dictionary, or similar source) that all of a language’s vocabulary is of unknown origin would tell me a great deal: namely, that etymological research has been done into that language, and has failed to turn up any definitive results. The language in question is therefore an extreme isolate (until further notice).

            Unless I’m missing something, you’ve basically described what an etymological dictionary of the ancestral Basque vocabulary (i.e., the vocab that can’t be straightforwardly traced to Latin/Spanish/etc.) would look like.

            • j. says:

              General knowledge about a language’s position in the world could exist but still does not provide an explanation for any particular word in it. Basque is not the gotcha you might have been angling for, either. It does have several etymological dictionaries written for it and turns out to have loads of words that are etymologizable as derivatives or fossilized compounds, which sometimes also has internal reconstruction implications for their parent words.

              What’s the point of this sophistry anyway? Surely you understand what “explanation” normally means; if you ask someone to explain how a device works or how to accomplish a task or how to get from place A to place B, you would not be satisfied with the answer “I don’t know”, nor would you accept insistence that this is the actual explanation and that there simply is nothing to know.

              • M. says:

                General knowledge about a language’s position in the world could exist but still does not provide an explanation for any particular word in it.

                I’m baffled by your use of the term “explanation” here.

                It seems that the following is an explanation by your criteria:

                “The word porsas is from an unknown relative of Latin porcus, Lithuanian paršas, etc.”

                but the following somehow isn’t:

                “The etymology of the verb kysyä beyond Finnic is unknown, though it is probably from Uralic, IE, an extant Siberian family, or a vanished substrate/adstrate”

                Can you clarify what the second of these is missing that disqualifies it from being an “explanation”?
                [You may know something about the origin of kysyä/porsas that I don’t, but hopefully you see what I’m getting at with these examples, regardless.]

                Surely you understand what “explanation” normally means;

                No, as detailed above, I don’t think I understand how you’re using it here at all.

                if you ask someone to explain how a device works or how to accomplish a task or how to get from place A to place B, you would not be satisfied with the answer “I don’t know”, nor would you accept insistence that this is the actual explanation and that there simply is nothing to know.

                These are examples of pragmatically significant questions, which differ in many ways from a (largely) speculative question like “What is the origin of word [X]?”

                For example, with pragmatically significant questions, there are criteria for what constitutes a satisfying or unsatisfying answer. Some answers are too meager, others are excessive, etc.

                With questions such as “What is the origin of this word?”, such criteria don’t apply, as far as I can see, unless you have a very specific goal in mind.

                (E.g. perhaps you’re trying to collect examples where a certain phoneme can be reconstructed, and the reconstruction of a given word could either succeed or fail to contribute to this purpose.
                In that case, though, “What is the origin of this word?” wouldn’t be the most careful formulation of the question to begin with.)

                • j. says:

                  Yes, it’s missing any information about the history of kysyä in particular. From your answer we don’t know if the *s is from a Middle Proto-Finnic *s or *ś > **ć, we don’t know if the 2nd syllable *ü is the reflexive-passive suffix or from some different source, we don’t know if it has had the same meaning for ages or at some point shifted from something else like ‘to wish’ or ‘to call (for)’… All of these would have answers to them in principle, and all of these would be a part of a complete etymology for it. “An etymology” is not just a statement of a word’s origin language, it’s an analysis of all of its facets. Going from “how is this word explained” to “what is the origin language of this word” is already moving the goalposts. Besides, you always need some of these details anyway to draw any kind of a claim about a word’s source language, since individual meanings / suffixes / phonemes are not language-specific. I don’t think it’s ever possible to conclude that a word e.g. comes from Germanic (or doesn’t come from Germanic) without giving it any Germanic (or non-Germanic) cognates (attested or reconstructed).

                  In the case of porsas you’re of course right that it has not been completely explained either, since the etymology disappears for a while into a jungle of unattested early Indo-Iranian varieties. I really wouldn’t know for sure if it underwent *ć > *ś or even *ś > *s in Finnic or in the loangiving language (though I do know it did have a *ć at least back in Proto-Indo-Iranian and in the lineage leading to the Mordvinic reflexes). The lesson to draw from this though is not that these are entirely optional details, it’s that if a word has an etymology exists on a spectrum.

                • M. says:

                  “An etymology” is not just a statement of a word’s origin language, it’s an analysis of all of its facets.

                  *All* of its facets?

                  By that logic, the above-discussed proposal regarding pinta is neither an “explanation” nor an “etymology”, since it is silent on the question of whether pinta’s “t” is descended from a word with apical or dental t, whether it comes directly from Germanic or from an intermediary language, whether the initial syllable’s vowel was ever nasalized, what stress pattern it had when it was first acquired, etc. etc. etc.

                  I completely fail to see where you are getting your checklist from here.

                  To be clear, I recognize that there are standards typically applied to etymological proposals, e.g. when one is deciding whether to submit or accept them for publication. But that’s a question of the quality of etymologies (their non-triviality, the amount of new information they contribute, etc.), not of whether they count as etymologies to begin with.

                  The lesson to draw from this though is not that these are entirely optional details, it’s that if a word has an etymology exists on a spectrum.

                  I’m not sure what you meant to say with the last part of this.

                  If you’re saying that there is a spectrum of completeness for etymologies, then how is that in disagreement with what I’ve been saying?

                  And how is e.g. my statement about kysyä’s origin (insofar as it’s an accurate statement of the evidence we currently have) not a point on this spectrum?

                • j. says:

                  *All* of its facets?

                  Qualitatively, yes. At minimum an etymology needs to have something to say about the following four:
                  – a word’s meaning (we cannot etymologize kysyä from Portuguese queijo because ‘to ask’ is way too many semantic connections off from ‘cheese’);
                  – its morphology (justify the segmentation proposed in case it relies on just a shorter root comparison; -y- is a known verb suffix and so could be readily stripped off if needed);
                  – its phonology (we cannot etymologize kysyä as cognate with Hungarian kérd(ez) because s does not correspond with rd);
                  – and if we propose loaning, the contact situation (we also cannot etymologize kysyä from queijo before having any reason to assume contact between Finnic and Ibero-Romance); mutatis mutandis also for novel genealogical relationships.

                  Almost always these cannot be known in theoretically maximum detail, of course. E.g. we cannot know what exact harbor was the original one that Germanic speakers called *staþa- and Finnic speakers adopted as *satama, nor can we work out the exact month or year or decade that this happened; we cannot know if the shift ‘shore’ > ‘harbor’ happened within Finnic or in the loangiving variety of Germanic; we cannot know whether, or for how long, bilingual Finnic speakers maintained some form with *st-; we cannot know when exactly it was suffixed from *s(t?)ata to *s(t?)atama, nor how heavily this was motivated by homonym avoidance with *sata- ‘to rain’. Maybe you are some sort of a nihilist who wants to claim that this lack of perfection already makes an etymology so unreliable that it should be discarded, in favor of calling satama just of unknown origin. But if you do so, I ask that you do so consistently, not just for some single etymology or single layer of etymologies that you apparently have pre-decided to dislike.

                  how is e.g. my statement about kysyä’s origin not a point on this spectrum?

                  It is a point on the spectrum, it’s just the point of zero information which is at most a degenerate technical sense of “an etymology”. This is a classic version of the continuum fallacy. We might not have an exact threshold for how many rocks it takes to constitute a “pile of rocks” or how many details it takes to constitute “an etymology”, but pragmatically this still doesn’t give you any grounds to speak of zero rocks, or even merely one rock, as a pile.

                  Anyway, if you check SSA or UEW, we do know that kysyä comes at least from West Uralic *küśə-: it has a regular cognate of the same meaning in Samic (*këčë-) plus an at least consonantwise regular cognate in Moksha (/kiźə-fťə-/). Your “etymology” would never have pointed us to these conclusions, if taken as a license to assume that whatever was not yet found at point X will never be found.

                • M. says:

                  At minimum an etymology needs to have something to say about the following four:

                  Respectfully, I’ve never seen a definition of etymology that requires this level of stringency, and it is very easy to find dictionary definitions that comport with the kinds of examples I’ve given.

                  But if you think that only proposals that meet your criteria should be called etymologies, then fine – it doesn’t seem worthwhile to continue this particular debate.

                  Your “etymology” would never have pointed us to these conclusions, if taken as a license to assume that whatever was not yet found at point X will never be found.

                  I never said that it should be taken as such a license, and I apologize for whatever I may have said to give that impression.

                  My core point, throughout this exchange, has been that “Etymology unknown” is a SUPERIOR position to an unconvincing etymology, and that one should not prefer flimsy etymologies over “Etymology unknown” just because they seem more interesting, etc.

                • j. says:

                  My four-way analysis here of the necessary aspects of an etymology follows Junttila’s thesis on the Baltic loanwords in Finnic. It’s less of a definition and more of an observation on what kind of arguments are advanced in etymological research at all (and I should admit that the phonology/morphology distinction is fuzzy at times; he combines these under a single main category of “formal arguments”). Many of them can of course be left implicit or highly abbreviated. A half-a-line statement like kala < PU *kala ‘fish’, or Kalb < PIE *gʷolbʰo- ‘womb’ already suffices to cover all bases. The point is not the amount of detail on any of these aspects but whether any claim is made about them at all.

                  Junttila does introduce an IMO valuable distinction between “etymologies” versus “claims of origin” (alkuperäväite), where the latter fail to give anything resembling or pointing to a source form, contrasted also with a “comparison”, where no claim is made about the reason for the resemblance. At its most basic, an etymology is per this view a comparison + a claim of origin. This is probably more relevant in loanword research, since at least two directions of loaning are always possible. Any comparative research within a language family kind of already carries the implicit claim of origin from a common ancestor (and some implicit claims of sound correspondences, etc.)

                • David Marjanović says:

                  My core point, throughout this exchange, has been that “Etymology unknown” is a SUPERIOR position to an unconvincing etymology

                  “Etymology unknown” has a greater chance of not being disproved than an unconvincing etymology does.

                  But that doesn’t make it superior. Historical linguistics is a science. If you’re just stating a null hypothesis and leaving it at that, you aren’t doing science. List the alternative hypotheses, test them, and ideally list their p values.

  8. Y says:

    Is the statement “etymology unknown” about a word equivalent to saying that a language or a language group has no known relatives? I’m fine with either.

  9. M. says:

    “Etymology unknown” has a greater chance of not being disproved than an unconvincing etymology does.

    But that doesn’t make it superior.

    The issue here isn’t disprovability — it’s probative weight/value.

    If we can agree that a given etymological proposal is unconvincing, then this means, *by definition*, that the proposal’s probative value is insufficient, at present (i.e. until/unless the proposal is updated with new evidence or argumentation).

    This in turn means that any etymological entry for this word ought (until further notice) to state its origin as “Unknown”, regardless of what other details it includes about the proposals that have been made so far. (And I’ve never said that such proposals shouldn’t be made/pursued.)

    • M. says:

      And as far as what I mean by “(un)convincing”, see my earlier remark upthread:

      E.g. if “Etymology unknown” could only be sustained by applying a standard so stringent that it would require relationships such as Eng. mother : Latin mater to be called “uncertain” as well.

    • David Marjanović says:

      I don’t understand what you mean by “probative value”; I’m a biologist and haven’t encountered the concept. If “unconvincing” isn’t subjective, what exactly do you mean by it?

      • j. says:

        As far as I use the term, “probative value” is always instrumental: an otherwise strong etymology might lend some nontrivial amount of support also to (if we’re being exact: the parsimony of) some result of classification, historical phonology, the reconstruction of a contact situation, typology of semantic change, etc. Some others might be in an awkward middle territory, though: the most believable explanation so far, but offering little support for most or any other purpose.

        (These are also results that would need to be themselves retooled or retired as soon as something in their “logical neighborhood” changes. FWIW I feel a lot of missteps in fields like historical linguistics with loooong inferential chains come from not properly recognizing these cases: not from building on sand per se, but from Wile E. Coyote-like not noticing that the rock is not there anymore and the foundation is now on sand.)

        But I think I would take more issue with the view that “an etymological entry” by default should state only whatever is the current best understanding. It is vital to note any important competing or partially weak proposals, if they exist. Even the one-word extreme synopsis often should not be “unknown” (“we have no idea”) but rather “unclear” (“we do have ideas but they’re not worked out to satisfaction”).

      • M. says:

        I don’t understand what you mean by “probative value”;

        In this case, it refers to whether a proposal merits being called convincing or not.
        (And also how close a proposal comes to being convincing, to the extent that this is measurable.)

        To expand on how I’m using the term “(un)convincing”:

        In historical linguistics, there’s a large body of uncontroversial relationships (such as the core IE cognates, and the cognates that establish the relationship between Finnish and the other Finnic languages, between Old Japanese and modern Japanese, etc.) that I think serves as a reasonable basic threshold for what can be called convincing.

        (By “basic threshold”, I don’t mean that a proposal must meet this standard in order to be convincing (that’s still an open question as far as I’m concerned), but that if a proposal does meet this standard, it’s reasonable to call it convincing.)

        It may not be an indisputable principle that the aforementioned relationships are convincing, but I don’t see how one can reject this principle without rejecting all the findings of historical linguistics as well – because historical linguistics, as we know it, is based on a shared acceptance that certain kinds of systematic correspondence (as exemplified by the core IE cognates, etc.) are too compelling to be coincidences.

  10. M. says:

    In historical linguistics, there’s a large body of uncontroversial relationships (the core IE cognates, the cognates that connect Finnish and the other Finnic languages, etc.) that I think serves as a reasonable threshold for what can be called convincing.

    An example of this yardstick in action:

    If one claims that the relationship between Finnish nuoli and Estonian nool (both meaning “arrow”) is “unconvincing” because the first has a final vowel and the second doesn’t, one is applying a standard that would also unsettle the relationship between Latin piscis and Eng. fish, between Spanish madre (disyllabic) and French mère (now monosyllabic), etc.

    By contrast, if one objects to (e.g.) the proposed relationship between Finnish köyhä “poor” and English shy/German schüch(tern)/etc. on the grounds that the two groups of words don’t match semantically, then one is in the clear: the uncontroversial relationships of historical linguistics have been established based on semantically matching data, not on unsystematic conjectures about semantic change.

    There may be some other solid line of argumentation that would connect köyhä and shy/etc., but at least the semantic objection doesn’t run afoul of the aforementioned standard for when “unconvincing” can be used.
