Linkday #1: On computational phylogenetics

I think I’d like to have more content up on this site, despite being tied up with studies and life’s other little distractions from research. Showcasing some interesting articles might work for that, even when I don’t have a detailed critique of my own to offer on their topics. (There might be some drift away from exclusively Uralic topics, while I’m at that.)

For starters, I’ll bring up a post from the evolutionary linguistics blog Replicated Typo: Reconstructing linguistic phylogenies — a tautology?

This is a relatively old post (2011), yet it captures a number of problems that I keep seeing in computational typological studies.

Two comments, though:

  • The diffusibility of sound changes across lineages means that no, establishing a sound correspondence is not quite the same thing as establishing a phylogeny. After all, if reconstructing a proto-language automatically also generated a phylogenetic tree of its descendants, there would likely be no need for computational phylogenetic studies in the first place!
  • I’m not sure I agree that identifying words as cognates presupposes that the languages they occur in are related. There’s a narrower and a wider sense of “cognate” out there: the narrower one is indeed restricted to words related by common descent, but when we’re talking about more hypothetical relationships, the word can also mean “of the same origin through some means, possibly but not necessarily involving borrowing”. A typical example would be the Uralic and Indo-European words for ‘water’ or ‘name’: there is widespread consensus that they are related in some way, but two different camps on how that relationship should be explained.
Posted in Links, Methodology
8 comments on “Linkday #1: On computational phylogenetics”
  1. Capra Internetensis says:

    Somewhat tangentially, is there any quantitative way of estimating how confident we should be that languages are related, or that sets of words or features are cognate? Of course, after a certain amount of linguistic change the evidence for a relationship will have eroded away, but it would be nice to have some sort of objective standard.

    • j. says:

      We have no strictly quantitative way to do anything that detailed yet, I’m afraid. We’d first need a standard for estimating how likely it is that a given linguistic change (sound change, semantic change, loaning, etc.) will occur over a given time. From these we could build estimates of how likely it is that two words are related in a certain fashion (or are accidentally similar), and only once those are in place could we finally assess the probability that two languages have shared a certain type of history.

      I hope to see a detailed model of this sort built eventually, though.
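      A toy version of the first step described above might look like the sketch below. It assumes, purely for illustration, that each meaning in a basic word list has its inherited word replaced at a constant, hypothetical rate per millennium (a Poisson process), independently in each lineage. Real replacement rates vary by meaning and by lineage, and the sketch ignores borrowing and chance resemblance entirely; all function names are hypothetical.

```python
import math


def retention_probability(rate_per_millennium: float, millennia: float) -> float:
    """Chance that one lineage still keeps a given inherited word after
    `millennia`, ASSUMING replacement is a Poisson process with a constant rate."""
    return math.exp(-rate_per_millennium * millennia)


def shared_retention_probability(rate_per_millennium: float,
                                 millennia_since_split: float) -> float:
    """Chance that BOTH daughter languages still keep it, ASSUMING the two
    lineages replace words independently and at the same rate."""
    return retention_probability(rate_per_millennium, millennia_since_split) ** 2


def prob_at_least_k_shared(n_meanings: int, k: int, rate_per_millennium: float,
                           millennia_since_split: float) -> float:
    """Binomial tail: chance of at least k surviving cognate pairs in an
    n-item list, ASSUMING meanings behave independently of each other."""
    p = shared_retention_probability(rate_per_millennium, millennia_since_split)
    return sum(math.comb(n_meanings, i) * p ** i * (1 - p) ** (n_meanings - i)
               for i in range(k, n_meanings + 1))
```

      For example, with a made-up rate of 0.2 per millennium and a split 3 millennia ago, shared_retention_probability(0.2, 3) is exp(-1.2), roughly 0.30. The later steps (semantic change, loans, accidental look-alikes) would each need their own model before anything like a probability of relationship could be computed.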

  2. Capra Internetensis says:

    Thanks! I didn’t really think there would be, but you never know.

    There are an awful lot of edge cases where taking a proposed relationship seriously or not seems to be entirely a matter of taste.

    I hope someone qualified does make a real attempt at it.

  3. David Marjanović says:

    I can’t read the Replicated Typo post now, because it’s 2 in the morning. I’ll need to read it at some point.

    And as a biologist, let me say we’ve had all these discussions long ago.

    If reconstructing a proto-language automatically also generated a phylogenetic tree of its descendants

    It’s the other way around: reconstructing the tree involves reconstructing the states at all nodes. Once you have the tree, you can just read them out.
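    As an illustration of states being read out once a tree is in hand, here is a minimal sketch of Fitch parsimony, one standard way of doing this (likelihood and Bayesian methods also reconstruct node states; this is not necessarily what any particular study uses). It assumes a rooted, strictly binary tree, a single character with discrete states, and hypothetical languages A to D. The bottom-up pass computes candidate states and the minimum number of changes; the top-down pass picks one most-parsimonious state for every node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """A node of a rooted tree; leaves have no children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def fitch_sets(node: Node, leaf_states: Dict[str, str],
               sets: Dict[str, Set[str]]) -> int:
    """Bottom-up pass: store candidate states for every node in `sets` and
    return the minimum number of changes needed below `node`."""
    if not node.children:
        sets[node.name] = {leaf_states[node.name]}
        return 0
    if len(node.children) != 2:
        raise ValueError("this sketch only handles binary trees")
    cost = sum(fitch_sets(child, leaf_states, sets) for child in node.children)
    left, right = (sets[child.name] for child in node.children)
    if left & right:
        sets[node.name] = left & right  # children agree: no change needed
    else:
        sets[node.name] = left | right  # children disagree: one change
        cost += 1
    return cost


def fitch_assign(node: Node, sets: Dict[str, Set[str]],
                 parent_state: Optional[str], out: Dict[str, str]) -> None:
    """Top-down pass: keep the parent's state if it is a candidate here,
    otherwise take a candidate of this node (ties broken alphabetically)."""
    candidates = sets[node.name]
    state = parent_state if parent_state in candidates else min(candidates)
    out[node.name] = state
    for child in node.children:
        fitch_assign(child, sets, state, out)


# Hypothetical languages A-D and a hypothetical character with states "p"/"f".
tree = Node("R", [Node("X", [Node("A"), Node("B")]),
                  Node("Y", [Node("C"), Node("D")])])
leaf_states = {"A": "p", "B": "p", "C": "f", "D": "p"}
sets: Dict[str, Set[str]] = {}
changes = fitch_sets(tree, leaf_states, sets)
ancestral: Dict[str, str] = {}
fitch_assign(tree, sets, None, ancestral)
print(changes, ancestral)
```

    Here the tree needs one change, and every internal node (R, X, Y) is reconstructed with "p", so the single change falls on the branch leading to C.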

    There’s a narrower and a wider sense of “cognate” out there:

    Yep, secondary and primary homology.

    • j. says:

      And as a biologist, let me say we’ve had all these discussions long ago.

      Within biology, I am sure. I have not seen nearly enough discussion on how well the conclusions can be ported over to linguistics.

      [R]econstructing the tree involves reconstructing the states at all nodes.

      For one example, this situation is complicated by the fact that effectively all biological reconstruction involves subgroups. Meanwhile a whole lot of linguistic reconstruction involves reconstructing an earliest known common ancestor, which will leave it a lot harder (often unresolvable) to tell which, if either, of two traits found among descendants is an innovation and which, if either, is a retention.

      (And then there is that your average historical linguist, until recently, did not have the faintest clue about what cladistics is and how it works, and would have happily proposed subgroups based on any superficial similarity, up to and including non-exclusively shared retentions.)

      Yep, secondary and primary homology.

      No, I don’t think loanings can be said to be homologies. It is a phenomenon that has no exact parallel in biology (although horizontal gene transfer has some of the same features).

  4. David Marjanović says:

    Meanwhile a whole lot of linguistic reconstruction involves reconstructing an earliest known common ancestor, which will leave it a lot harder (often unresolvable) to tell which, if either, of two traits found among descendants is an innovation and which, if either, is a retention.

    The fun part of this is that the methods of biological phylogenetics produce an unrooted tree and then root it on the outgroup which was specified in advance. There’s no outgroup to life as a whole, and as you can guess there are several proposals about where the root of that tree is.

    That should actually be easier in linguistics, because many changes there are much easier in one direction than in the other.
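    The rooting step described above can be sketched as follows, assuming the unrooted tree is given as an adjacency list with hypothetical node names, and that the outgroup is a single leaf chosen in advance; the root is placed on the branch leading to it. Real phylogenetics packages also handle branch lengths and multi-taxon outgroups, which this ignores.

```python
from typing import Dict, List, Tuple, Union

# A rooted tree as nested tuples; leaves are names.
Rooted = Union[str, Tuple["Rooted", ...]]


def hang(adjacency: Dict[str, List[str]], node: str, parent: str) -> Rooted:
    """Turn the part of the unrooted tree reachable from `node` (without
    stepping back to `parent`) into a nested tuple."""
    children = [n for n in adjacency[node] if n != parent]
    if not children:
        return node  # a leaf
    return tuple(hang(adjacency, child, node) for child in children)


def root_on_outgroup(adjacency: Dict[str, List[str]], outgroup: str) -> Rooted:
    """Place the root on the branch leading to `outgroup`, which must be a leaf."""
    if len(adjacency[outgroup]) != 1:
        raise ValueError("the outgroup must be a leaf of the unrooted tree")
    neighbour = adjacency[outgroup][0]
    return (outgroup, hang(adjacency, neighbour, outgroup))


# Hypothetical unrooted tree: leaves O, A, B, C, D; internal nodes u, v, w.
adjacency = {
    "O": ["u"], "A": ["u"], "u": ["O", "A", "v"],
    "B": ["v"], "v": ["u", "B", "w"],
    "C": ["w"], "D": ["w"], "w": ["v", "C", "D"],
}
print(root_on_outgroup(adjacency, "O"))
print(root_on_outgroup(adjacency, "A"))
```

    Rooting the same unrooted tree on O gives the clade {A, B, C, D}, while rooting it on A gives {O, B, C, D} instead, which is why the choice of outgroup (or, in linguistics, of the likely direction of change) decides what counts as an innovation.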

    (And then there is that your average historical linguist, until recently, did not have the faintest clue about what cladistics is and how it works, and would have happily proposed subgroups based on any superficial similarity, up to and including non-exclusively shared retentions.)

    Biology used to work the same way. Until somewhere between the 1960s and the 2000s at least, depending on the subdiscipline (the mid-80s and early 90s in my fields), phylogenetics was an art, not a science; and classification was an altogether separate art from phylogenetics (despite a large overlap) and routinely recognized taxa based on retentions.

    Until then, historical linguistics was better off than biology, because it had a method, even though the Comparative Method isn’t quite as explicit and repeatable as it could be.

    No, I don’t think loanings can be said to be homologies. It is a phenomenon that has no exact parallel in biology (although horizontal gene transfer has some of the same features).

    It all behaves the same way in phylogenetics. If it looks the same, it must be coded as the same state of the same character, i.e. it is a primary homology. To find out whether it’s also a secondary homology, you make a tree and trace the character on the tree. If the instances of that state turn out not to be secondarily homologous, that still won’t tell you whether they’re due to convergence or to borrowing/horizontal gene transfer; you need additional evidence for that.
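    In the simplest case, tracing a character on a tree can be sketched as a clade check. This assumes a rooted tree given as nested tuples, hypothetical taxa, and a character whose derived state is known in advance. If exactly the taxa sharing the derived state form one clade, a single origin explains them (a candidate secondary homology); if not, the tree requires either several origins or an earlier origin followed by losses. As noted above, a failed check does not say whether convergence or borrowing is responsible.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of every node in a rooted tree (single leaves included)."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def single_clade(tree: Tree, states: Dict[str, str], state: str) -> bool:
    """True if exactly the taxa showing `state` make up one clade, i.e. a single
    origin of `state` on this tree explains all of them without later losses."""
    holders = frozenset(name for name, s in states.items() if s == state)
    return holders in clades(tree)


tree = ("O", ("A", ("B", ("C", "D"))))
print(single_clade(tree, {"O": "0", "A": "0", "B": "1", "C": "1", "D": "1"}, "1"))
print(single_clade(tree, {"O": "0", "A": "1", "B": "0", "C": "1", "D": "0"}, "1"))
```

    The first call succeeds because B, C and D form a clade. In the second, A and C share state "1" without forming a clade, so the tree needs either two separate origins or one origin followed by losses in B and D.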

  5. Eli Nelson says:

    I recently came across an interesting discussion of comparable biological terminology for the ways genes can be related. It introduced me to the terms “orthology”, meaning inheritance from a common ancestor (= strictly cognate, in linguistic terms), and “xenology”, used for genetic material related by horizontal gene transfer, which could be seen as the equivalent of loanwords. Both are subsets of the broader category of homology. I like this terminology because I feel it fills a gap for words of uncertain relationship, like the ones in Proto-Uralic and PIE.

  6. David Marjanović says:

    Orthology comes with paralogy, which is the result of gene duplication…
