Wednesday, February 5, 2014

Can parsimony analyses be misled by 'budding' speciation?

In several places the Framework for a Post-Phylogenetic Systematics argues in favour of inferring relationships between groups of organisms with the informed intuition of the 'classical taxonomist' instead of 'mechanistic' phylogenetic analyses. It presents several arguments, among them the observation that classical taxonomy is generally based on more specimens per species than phylogenetics and the claim that neither parsimony analyses of morphological data nor molecular phylogenies can be trusted to infer relationships correctly.

I kind of get the first of these arguments, at least in principle. Yes, we could mention the large number of newly described species in classical taxonomy that are, at first at least, based on just the type collection. Yes, there seems to be some kind of conflation going on here between species delimitation, which to be done well indeed needs large numbers of specimens, and phylogenetics, which should preferably use only characters that are not variable within the species anyway and thus should theoretically work fine with only one specimen. But still one can at least grant that deep sampling is an advantage of the classical taxonomist's approach. (I would never argue against it anyway - alpha taxonomy and phylogenetics are complementary, not alternatives.)

But I do not at all understand the distrust of phylogenetic analyses. Consequently I would like to take a closer look at these methods, starting with the supposed vulnerability of parsimony analyses of morphological data. Specifically, the claim is that they can be misled into inferring the wrong relationships if speciation is by budding, in other words if one of the two daughter lineages in a speciation event is morphologically indistinguishable from the ancestor (Framework p. 25, p. 39).

So let's see. In the following we accept, despite my misgivings about it, the composite species concept. We build an example dataset with six species, labelled A to F. They have six binary morphological characters which we represent as 0 or 1. The ancestral composite species of the entire group is A, and the ancestral character states are 000000.

A buds off a population X that fixes state 1 for character 1. For variety, X then undergoes the only event in this group that is not budding but a lineage split in which both daughter lineages acquire new characters. One of them is B (with character 2 switched to state 1), the other is C (with character 3 switched to state 1). Nothing looking exactly like X survives. Now C buds off D through a change in character 4 and then E through a change in character 5. Finally, E buds off F through a change in character 6.

This gives us the following character by species table:

  123456
A 000000
B 110000
C 101000
D 101100
E 101010
F 101011
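To check that the table really follows from the history just described, the budding events can be replayed in code. This is a minimal sketch: the names EVENTS and replay are hypothetical, characters are numbered 1 to 6 as in the text, and a daughter lineage is assumed to start as an exact copy of its parent before its own change is applied.

```python
# Replay the budding history described above and rebuild the character table.
# Characters are numbered 1-6 as in the text; 0 = ancestral, 1 = derived.
N_CHARS = 6

# Hypothetical encoding: (parent, child, characters switched to state 1 in the child).
# X is the extinct intermediate; B and C are the two daughters of its split.
EVENTS = [
    ("A", "X", [1]),
    ("X", "B", [2]),
    ("X", "C", [3]),
    ("C", "D", [4]),
    ("C", "E", [5]),
    ("E", "F", [6]),
]

def replay(events, root="A", n_chars=N_CHARS):
    """Return a dict mapping each lineage name to its character string."""
    states = {root: [0] * n_chars}
    for parent, child, changes in events:
        child_states = list(states[parent])  # daughter starts as a copy; the parent itself is unchanged
        for char in changes:
            child_states[char - 1] = 1       # characters are 1-based in the text
        states[child] = child_states
    return {name: "".join(map(str, s)) for name, s in states.items()}

table = replay(EVENTS)
extant = {name: row for name, row in table.items() if name != "X"}  # nothing identical to X survives
for name in sorted(extant):
    print(name, extant[name])
```

Running it prints the six rows of the table above, and table["X"] holds the state of the extinct intermediate, 100000.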


What the author of the Framework would like to see is a result that shows which species is whose 'ancestor' (again, for the sake of argument we ignore the question whether treating one group of organisms living today as the ancestor of another group of organisms living today makes sense). One of the ways in which 'evolutionary' systematists like to illustrate 'ancestor-descendant relationships' is as directional networks with extant taxa at internal nodes. For our group, such a diagram would have to look like this (I hope the arrows show as intended on your screen):

A→X→C→E→F
  ↓ ↓
  B D


We can now fire up some software that can do parsimony analysis, feed the character matrix into it, and compare the resulting phylogenetic tree with the above diagram. I have quickly done so using the good old workhorse PAUP. What is the result?


We receive one most parsimonious tree from the analysis, as expected. For the representation above, I saved it from PAUP and used TreeView to turn it into an unrooted 'circular tree' with branch lengths. I then merely made the font larger and added the label of the ancestral species X to the relevant internal node. Now compare this diagram with the previous one: they are completely identical. Because they have no autapomorphies, the 'ancestral' composite species are even sitting on the internal nodes, as preferred by 'evolutionary' systematists.
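Why this tree is most parsimonious can be checked by hand, or with a few lines of code. The sketch below is not PAUP's implementation; it is a textbook Fitch downpass that counts the changes a tree requires. The tree is written as a fully bifurcating tree rooted on A with the branching order of the budding history (an assumption, since the PAUP tree file is not reproduced here). Because each of the six binary characters needs at least one change on any tree, a tree of length 6 cannot be beaten.

```python
# Fitch parsimony scoring of a rooted binary tree, one character at a time.
MATRIX = {
    "A": "000000",
    "B": "110000",
    "C": "101000",
    "D": "101100",
    "E": "101010",
    "F": "101011",
}

# Hypothetical encoding of the tree as nested pairs, rooted on A.
TREE = ("A", ("B", ("C", ("D", ("E", "F")))))

def fitch(node, char):
    """Return (state set, number of changes) for the subtree below `node`."""
    if isinstance(node, str):                    # leaf: its observed state
        return {MATRIX[node][char]}, 0
    left_set, left_cost = fitch(node[0], char)
    right_set, right_cost = fitch(node[1], char)
    common = left_set & right_set
    if common:                                   # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # disagreement costs one step

def tree_length(tree):
    n_chars = len(next(iter(MATRIX.values())))
    return sum(fitch(tree, c)[1] for c in range(n_chars))

def minimum_possible_length():
    # On any tree, a character needs at least (number of observed states - 1) changes.
    n_chars = len(next(iter(MATRIX.values())))
    return sum(len({row[c] for row in MATRIX.values()}) - 1 for c in range(n_chars))

print(tree_length(TREE), minimum_possible_length())
```

Both numbers come out as 6: every character changes exactly once on this tree, which is the best any tree could do with these data.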

The only thing that is missing is directionality. So let us root the tree with the oldest species A and draw the tree as a rectangular phylogram (i.e. again with branch lengths). Time flows from left to right in this representation:


Again, this shows exactly the relationships we set our little group up to have. The 'ancestors' are sitting on internal nodes, so you can work through the "buds off" steps in my description above one after the other and confirm that the representation is correct. So, to conclude:
  1. A parsimony analysis of morphological characters does reconstruct the topology of the tree correctly even if there is speciation by budding.
  2. If the phylogram shows a terminal sitting on an internal node with zero branch length, this indicates that the terminal has a character combination identical to the inferred ancestor at this node.
  3. Therefore, if we operate under the composite species concept and (for some unclear reason) want to see extant species as the ancestors of other extant species, we merely need to use a phylogram view of the tree and we will see which terminal is the ancestor of which other terminal.
  4. Indeed, displaying the tree as unrooted with branch lengths will essentially produce the same network that an 'evolutionary' systematist would draw.
Note that I am not saying that this is a sensible interpretation of a parsimony analysis, merely that it is a possible one. If the 'evolutionary' systematist wants to, they can use it to arrive at their 'macroevolutionary series'.
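Reading 'ancestors' off the phylogram, as in points 2 and 3, can be made explicit in code. This sketch relies on a loudly stated simplifying assumption that holds for this constructed example but not for real data in general: every character changes only once, from 0 to 1, with no reversals, so an internal node has state 1 for a character exactly when all terminals below it have state 1. General-purpose programs use more general optimization. The helper names are hypothetical.

```python
# Reconstruct internal node states, compute branch lengths, and list the terminals
# that sit on an internal node with zero branch length (the 'ancestors').
MATRIX = {
    "A": "000000", "B": "110000", "C": "101000",
    "D": "101100", "E": "101010", "F": "101011",
}
TREE = ("A", ("B", ("C", ("D", ("E", "F")))))  # rooted on A, as in the phylogram

def leaves(node):
    return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])

def node_state(node):
    """Assumption: each character changes once, 0 -> 1, so a node has state 1
    exactly where every terminal below it has state 1."""
    if isinstance(node, str):
        return MATRIX[node]
    rows = [MATRIX[leaf] for leaf in leaves(node)]
    return "".join("1" if all(r[c] == "1" for r in rows) else "0"
                   for c in range(len(rows[0])))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def edges(node):
    """Yield (parent state, child, number of changes along the branch)."""
    if isinstance(node, str):
        return
    parent = node_state(node)
    for child in node:
        yield parent, child, hamming(parent, node_state(child))
        yield from edges(child)

for parent, child, length in edges(TREE):
    if isinstance(child, str) and length == 0:
        print(child, "sits on the node with state", parent)
print("total length:", sum(length for _, _, length in edges(TREE)))
```

This reports A, C and E as sitting on internal nodes, and the branch lengths again add up to 6. The node above B, C, D, E and F gets state 100000, which is X.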

The point is then: What was the problem with parsimony analyses again?

One possible answer is the strange claim, also made repeatedly in the Framework, that phylogeneticists supposedly delete all characters that are not phylogenetically informative from the dataset (e.g. p. 40). If that were the case, then B, D and F would also be sitting directly on an internal node without any branch length to separate them from their 'ancestors'. Perhaps that is what is meant here? But in that case the problem is not with the parsimony analysis as such, because the 'evolutionary' systematist could simply add those characters, as I did above. Also, this is the first time I have come across this odd claim. I am a phylogeneticist and I am surely not in the habit of deleting all autapomorphies; sure, they don't help to infer relationships, but they don't hurt either. And hopefully it will be obvious that nobody bothers to remove them from molecular datasets at least; if we did, it would mess up estimation of branch lengths and molecular clocks.
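The effect of that hypothetical deletion is easy to demonstrate on the example matrix. In this sketch an autapomorphy is taken to be a character whose derived state, relative to the ancestral states 000000 given for the example, occurs in exactly one species; the function names are hypothetical.

```python
# Delete all autapomorphies from the example matrix and see which species collapse.
MATRIX = {
    "A": "000000", "B": "110000", "C": "101000",
    "D": "101100", "E": "101010", "F": "101011",
}
ANCESTRAL = "000000"  # ancestral states as given in the example

def autapomorphies(matrix, ancestral):
    """Return 0-based indices of characters whose derived state occurs in exactly one taxon."""
    return [c for c in range(len(ancestral))
            if sum(row[c] != ancestral[c] for row in matrix.values()) == 1]

def drop_characters(matrix, to_drop):
    return {name: "".join(s for c, s in enumerate(row) if c not in to_drop)
            for name, row in matrix.items()}

auts = autapomorphies(MATRIX, ANCESTRAL)
reduced = drop_characters(MATRIX, set(auts))
print([c + 1 for c in auts])  # 1-based character numbers, as in the text
for name in sorted(reduced):
    print(name, reduced[name])
```

Characters 2, 4 and 6 are removed. Afterwards B reads 100, the state of X, D becomes identical to C, and F becomes identical to E, so all three would sit on internal nodes just like their 'ancestors'.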

Another possible answer is provided by plate 6.1 of the Framework and the accompanying text, especially on page 57. It discusses the possibility of reversals in some characters leading to the inference of a wrong tree topology. But well, that is a very different claim from the one on pages 25 and 39, which was simply that 'budding' may throw parsimony off.

What is more, if the parsimony analysis is supposed to be misled by character distributions, then there are only two possibilities. First, you have some character data that allow you to infer the correct relationships but you have withheld them from the parsimony analysis. In other words, you gave the computer incomplete information and then claim that it cannot find the right solution because the method it uses is wrong. That is cheating and not a valid argument against parsimony analyses.

Second, you have no additional data, you merely feel in your gut that a certain branching order is correct; or perhaps you have drawn a hypothetical example with a certain branching order but there are no data in your example that can be used to guide the analysis to the answer you want. Sorry, but if that is the case then you do not know the right branching order either. You are only making stuff up as opposed to doing science. It is as simple as that.

If it had all the information, the parsimony analysis would find the same answer as you did. And that is how it must be, because our own understanding of the true relationships (that we would test the computer against) will also have to be based on a parsimony analysis we did in our head - if there is any real answer to be had.

So as of now, colour me unconvinced. If there is a problem with formal parsimony analyses beyond those that we have to solve by using even more sophisticated 'mechanistic' approaches, then I am unable to find it.
