A few too many of my blog posts seem to end up ballooning into mini-articles and consequently spend months, if not years, languishing in my drafts. Let’s see if I can keep this one brief.
An adage sometimes seen in historical linguistics is “classification before reconstruction”. On one level, I agree. But on a few others, it often seems to be abused as an excuse to skimp on proper rigor.
What this means, in my opinion:
- It’s not possible to do comprehensive comparative reconstruction work with data from unrelated languages. Reconstruction can only be attempted once we have a reasonable amount of certainty that some particular language family exists at all.
What this does not mean:
- Classification having to precede work in historical phonology entirely. Reliable classification cannot be done by vague, casual eyeballing of data. “A reasonable amount of certainty” for the relatedness of some particular languages requires being able to locate regular sound correspondences within their shared vocabulary (preferably non-trivial ones, but any regularity is a start). In the absence of regular sound correspondences, any vocabulary comparison can potentially be suspected of being either a coincidence or a loanword rather than a strict cognate.
In other words: sound correspondences are not, in themselves, reconstructions. In the case of binary comparison, this distinction may end up blurred, since it’s possible to put together an initial “trivial reconstruction” simply by listing all your correspondences and giving each of them some kind of vague phonetic label. If the family has more members, though, the bare sound correspondences typically end up looking more like networks, since sound correspondences are not transitive. If /tʃ/ in language 1 can correspond to /s/ in language 2, and /s/ in language 2 can correspond to /h/ in language 3, this does not automatically guarantee that a correspondence /tʃ/ ~ /h/ between languages 1 and 3 would be demonstrable, or even expected at all. Perhaps /s/ in language 2 is a merger of two separate proto-phonemes; perhaps these correspondences do continue the same proto-phoneme, but under mutually exclusive conditions; perhaps one of these correspondences indicates loanwords after all, not native vocabulary.
- Subclassification having to precede reconstruction. On the contrary, it is often reconstruction that allows us to put together arguments in favor of subgroups, by providing a root for our sound correspondences. Given a correspondence such as t ~ t ~ s ~ s, it’s likely that either the t-group or the s-group has innovated and constitutes a subgroup. But it is also quite possible that the other group has not innovated, and is paraphyletic. Without reconstruction work, this cannot be resolved.
- Reconstruction being unable to inform classification. A reconstruction of the parent of a set of languages might end up coming out closer to some other language, one that we may have suspected (but not dared to declare) to be related as well. It could even turn out that this newly compared language is not only related, but is in fact a direct descendant of this same proto-language, just a very divergent one! Or perhaps the proto-language turns out to be substantially less similar to the other language being compared, and the earlier suspicion of a relationship evaporates entirely, or has to be reanalyzed as a late loanword layer.
- Language isolates’ history being unreconstructible. Internal reconstruction combined with loanword evidence can identify probable sound changes and lexical intrusions just fine… though I suppose this technique is unlikely to get especially far.
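To make the non-transitivity point concrete, here is a minimal Python sketch using invented toy data (the cognate sets below are hypothetical, not from any real family). It models the first scenario mentioned above: /s/ in language 2 is a merger of two proto-phonemes, one giving /tʃ/ in language 1 and /s/ in language 3, the other giving /k/ in language 1 and /h/ in language 3. Chaining the attested 1 ~ 2 and 2 ~ 3 correspondences through the shared /s/ then predicts 1 ~ 3 pairs that no cognate set actually supports.

```python
# Hypothetical toy data: word-initial segments of cognate sets in
# languages 1, 2 and 3 (indices 0, 1, 2). Invented for illustration only.
COGNATE_SETS = [
    ("tʃ", "s", "s"),  # reflexes of one proto-phoneme
    ("tʃ", "s", "s"),
    ("k", "s", "h"),   # reflexes of a second proto-phoneme, merged with the first in language 2
    ("k", "s", "h"),
    ("t", "t", "t"),
]


def pairwise_correspondences(sets, i, j):
    """Return the set of segment pairs actually attested between languages i and j."""
    return {(row[i], row[j]) for row in sets}


def chained(sets, i, j, k):
    """Compose i~j and j~k correspondences through a shared segment in language j."""
    ij = pairwise_correspondences(sets, i, j)
    jk = pairwise_correspondences(sets, j, k)
    return {(a, c) for (a, b1) in ij for (b2, c) in jk if b1 == b2}


def unattested_chains(sets, i, j, k):
    """Chained i~k pairs that no cognate set supports directly."""
    return chained(sets, i, j, k) - pairwise_correspondences(sets, i, k)


print(sorted(unattested_chains(COGNATE_SETS, 0, 1, 2)))  # [('k', 's'), ('tʃ', 'h')]
```

So /tʃ/ ~ /s/ (1 ~ 2) and /s/ ~ /h/ (2 ~ 3) are both real, yet /tʃ/ ~ /h/ (1 ~ 3) is never attested. This is exactly why a correspondence table for more than two languages is better kept as sets of full rows than as a chain of pairwise matches.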
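The subgrouping argument about a t ~ t ~ s ~ s correspondence can also be sketched in a few lines of Python. The four languages A to D and their reflexes are hypothetical. The function simply treats any reflex differing from the reconstructed proto-segment as a candidate shared innovation, which is a deliberate simplification: it ignores the possibility of parallel independent changes. The point is only that the same bare correspondence yields opposite subgrouping evidence depending on which reconstruction is adopted.

```python
# Hypothetical reflexes of one correspondence set t ~ t ~ s ~ s
# in four invented languages A-D.
REFLEXES = {"A": "t", "B": "t", "C": "s", "D": "s"}


def shared_innovators(reflexes, proto):
    """Group languages by reflex and keep only groups whose reflex differs
    from the reconstructed proto-segment (candidate shared innovations).
    Retaining groups are dropped: retention is no evidence of a subgroup."""
    groups = {}
    for lang, segment in reflexes.items():
        groups.setdefault(segment, set()).add(lang)
    return {seg: sorted(langs) for seg, langs in groups.items() if seg != proto}


# Same correspondence, opposite conclusions depending on the reconstruction:
print(shared_innovators(REFLEXES, "t"))  # {'s': ['C', 'D']}
print(shared_innovators(REFLEXES, "s"))  # {'t': ['A', 'B']}
```

With *t reconstructed, C and D form a candidate subgroup while A and B may well be paraphyletic; with *s, the reverse holds.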
A more detailed workflow for historical linguistics, if starting from zero, would therefore look something like the following:
1. Acquire data; sort out some initial vocabulary comparisons that look promising.
2. Analyze sound correspondences; use these to look for more comparisons.
3. Look at the big picture to see if some particular subset of languages should indeed be considered related.
4. Attempt reconstructing the proto-language.
5. Use the proto-language POV to clarify the status of issues like problematic etymologies, possible external relatives, or possible subgroups.
6. Use modified analyses of data to improve the proto-language reconstruction.
7. Iterate steps 5 and 6 until you’ve run out of insights to gain from the data.
This could also work as a kind of typology of how far along research on a particular language family is. To date, I don’t think any language family has yet exhausted stage 7. Most are stuck in limbo somewhere around stage 3; only a few have reached stage 5, and Indo-European might be the only one to have indisputably gone through one cycle of stage 7. Big disputed hypotheses grouping well-accepted families together can probably be divided according to whether they’re closer to stage 1 (e.g. Amerind, Nilo-Saharan) or stage 2 (e.g. variations of Nostratic). Smaller disputed hypotheses often seem to be either at stage 2 or stage 4, depending on who you ask (e.g. Altaic). (To which I might reply: if these really are supposed to be already at stage 4, bring on stage 5, please.)
Of course, there are many major facets of historical linguistics still missing here. We also want to account for typology at some points, morphology at others, semantics at yet others, periodically research loanwords and then weed them out of the proto-language, and maybe entertain some substrate hypotheses.
Some people will claim that vocabulary is strictly optional and that relatedness can be shown solely on the basis of grammar. I am skeptical; but if this were the case, then the implication is that we would not be doing any lexical reconstruction work at any point at all.
 Maybe with subscripts to disambiguate overlapping sets if you’d prefer, but anything goes in principle. If your heart desires to see more wingdings in linguistics papers, there is nothing formally wrong in re-labeling a t ~ tʰ correspondence as *☕.