I’m only starting out on real scientific publishing (it looks like my first squib-size article, currently in peer review, will be out in early 2019), but during the years I’ve run this blog and worked on my thesis, I’ve already racked up a fair-sized publication plan and a stack of article drafts. There will be roughly one for each of the various conference presentations I’ve given so far, maybe a dozen that would expand on various blog posts, and a handful of leftovers from my thesis work. Many others have not been announced in any fashion.

Looking at the far end of the list, though, I think I’ve also been tacking on ideas that are less research plans than things I wish someone would do. Many of them call for substantial background work and are unlikely to fit on my plate in the foreseeable future of 5–10 years. The following are up for grabs, if anyone reading happens to be looking for research project ideas:

  • An updated handbook on the history of Finnish: the last updated version of Hakulinen’s Suomen kielen rakenne ja kehitys came out in 1979, and a lot has happened since then. In particular, the overview of native vs. borrowed components in the Finnish lexicon seems long out of date.
    — I would likely start on some component of this myself if nothing has happened by, let’s say, 2030, but that is still a while off.
  • A study of the lexicon of Kukkuzi Ingrian/Votic. Researchers have waffled back and forth on whether this Finnic variety should be considered a variety of Votic with an Ingrian superstrate or a variety of Ingrian with a Votic substrate, mostly on phonological and morphological criteria. With the 2012 release of the extensive Vadja keele sõnaraamat, it should be possible to investigate whether there is also an anomalous amount of vocabulary that is either present in Kukkuzi and absent elsewhere in Votic, or absent from Kukkuzi but well represented in the other Votic dialects.
  • Similar studies could probably also be done to check how coherent “Ingrian” really is (with or without Kukkuzi). The main varieties are clearly delineated, and each shows its own varying similarities with Votic, Ingrian Estonian, the two Ingrian Finnish varieties (Savakko and Äyrämöinen), Southeastern Finnish, and Karelian. Other Kukkuzi-esque misanalyzed varieties could be mixed in here as well.
  • A comparative reconstruction of Proto-Hungarian, based not just on the philological Old Hungarian evidence but also on the evidence of the various Hungarian dialects. Handbooks sometimes state that all the modern dialects could be derived from “Middle Hungarian” circa 1500–1600, but this is obviously nonsense at least in the case of the Székelys. Many other dialects could also have diverged earlier, only to be later assimilated back towards the mainstream. Loanword evidence would also be important (for one thing, loanwords completely destroy the theory that Old Hungarian had no vowel length), and the Uralic ancestry would obviously have to be kept in view too. The term “Proto-Hungarian” is sometimes used instead for the prehistoric, pre-migration ancestor of Hungarian, but I cannot recommend this practice: this time depth falls firmly within the “single-branch” phase of Hungarian and cannot be probed by the comparative method.
  • A study of substrate in Ob-Ugric. Mansi and Khanty owe their similarity to at least three sources: the two are related (minimally within Uralic), they form a common language area (as shown by isoglosses that cover only parts of both languages), and they share later contact influences (most importantly from Komi, Tatar and Russian). On archeological and anthropological grounds, a fourth source could be a pre-Uralic substrate of western Siberia (Helimski’s “Yugra”). What turns up if we apply modern methods of substrate language research to the two?
  • A comparison of the Ugric and East Uralic hypotheses. A good amount of data has by now been collected, purportedly in support of a common Ugric (Hungarian–Mansi–Khanty) group within Uralic; but it has been pointed out that the original and clearest piece of evidence, the rearrangement of the PU sibilant system (traditional formulation: *s *š *ś > *θ *θ *s, later *θ > Hung. ∅ ~ Mansi *t ~ Khanty *ɬ), also applies to Samoyedic, leading to a larger grouping recently named “East Uralic”. The same holds for at least a few other features. Does all this end up showing that either or both of these groups should be considered areal?
    Some other possible sub-angles include: is some of the common Ugric vocabulary better considered as loanwords, e.g. from Hungarian into Ob-Ugric? Can previously unidentified OU-Samoyedic cognates be found? How many of the commonalities could potentially be interpreted as shared retentions rather than shared innovations? How does the alleged Ob-Ugric subgroup fit with either hypothesis?
    — At some point I will at least be doing the related comparison of the East Uralic hypothesis with its clear rival, the long-standing Finno-Ugric hypothesis (which, as far as I can tell, has always remained merely a glorified assumption, never studied in detail either pro or con).
  • A bibliography of Indo-Uralic studies, either a simple list of works or a more detailed breakdown by etymology. It would be interesting to see, e.g., how much of the material compared over the years can be reconstructed independently within each of the two families: words are sometimes “cherry-picked” from just one subfamily, and at least some of these turn out to be clearly better analyzed as loanwords from IE into Uralic.
  • Studies on the history of widespread areal sound changes. Two that come to mind easily are w > v, found pretty much everywhere between the Atlantic Ocean and the Urals, and p > ɸ > h/f, found across Eurasia roughly in a belt from Hungary to the Aleuts, as well as across most of North Africa and Arabia. It is not clear to me whether these last two are really two separate areas, or rather just one, or perhaps more than two.
  • A look at the level of language on which Zipf’s Law operates: orthography, phonology, phonetics? (This may have been done already; I have not searched for it in detail.)
  1. Howl says:

    “A bibliography of Indo-Uralic studies, either a simple list of works, or a more detailed breakdown by etymology” That is one of the directions I would like to take with my Indo-Uralic paper. But there is so much work that still needs to be done for that paper.

  2. Blasius B. Blasebalg says:

    Just curious: how would you distinguish areal innovations from the substrate influence of an unknown language such as Yugra?

    Substrate can affect almost all aspects of a language; only in the lexicon may there be a tendency towards a certain “corner”. But phonetically, syntactically, even morphologically, virtually everything can happen – because, by definition, any reasonable feature could have been present in that mysterious language lost to us.

    Incidentally, innovations that can propagate areally also comprise – everything.

    So how would you actually do anything but speculate?
    Or is there some assumption about the genetic affiliation of Yugra?

    • j. says:

      Yes, this would be mostly through the lexicon. I don’t think there are any substantial results yet on how to tell areal or language-internal innovations apart from substrate influence in other domains (though of course Johanna Nichols has some ideas).

      I won’t rehash the methodology for the study of substrate lexicon in detail here (for some sources, see a small Wikipedia section I wrote on this a while ago), but in brief, the typical markers are concentration in specific semantic fields, irregular or unetymological sound correspondences, and a general bulk of unetymologized material.

  3. David Marjanović says:

    a pre-Uralic substrate of western Siberia

    How far west do Yeniseian hydronyms go?

    p > ɸ > h/f, found across Eurasia roughly in a belt from Hungary to the Aleuts

    Well, the presence in Hungary is due to the introduction of Hungarian through a large area where [p] has been quite stable. On the other hand, Celtic (attested stages: ∅, apparently some [h]) has done it, and of course Germanic (attested stage: exclusively [f]) as part of a large-scale reorganization.

  4. David Marjanović says:

    Whoa, thanks for that thesis!

