Sunday, May 1, 2016

That editorial in BMC Evolutionary Biology

On Friday I looked at the website of the open access journal BMC Evolutionary Biology, after a colleague mentioned it as an option. Apart from the whopping article processing fee, I noticed a little note reading "submitting a phylogenetic study? Please consult our editorial for guidance on the methodologies we consider to be of a suitable standard". That sounded interesting.

The editorial published in 2013 lists "common pipeline steps" as follows:
  1. Detecting homologs
  2. Multiple Sequence Alignment
  3. Quality control
  4. Model selection
Ah. What if one is not using a model-based approach? At that point I pressed ctrl + F and entered "parsimony" to see what they had to say on it. I found this:
Until the early nineties, parsimony and distance-based tree-building methods were preferred. More recently, probabilistic model-based methods, namely the maximum likelihood (ML) and the Bayesian approaches have grown to prominence due to their statistical properties and inferential powers. Moreover, these approaches go beyond simple phylogeny inference, providing a convenient statistical framework for further model selection and biological hypothesis testing. While parsimony is sometimes justified as model-free, it has mathematical properties and is not assumption-free; therefore explicit models should be generated for many biological problems. Likewise, distance-based methods may be unreliable for highly diverged data, yet they are often model-based and have nice mathematical properties and thus they may enable very fast and relatively accurate estimation of relevant biological parameters. Distance-based methods for tree reconstruction, such as neighbor joining, are extremely fast, and can provide reasonable solutions for extremely large data sets, something that would be much more computationally challenging with ML or Bayesian methods, even with recent computational advances.
Well, they say "many" biological problems instead of all, and maybe I am missing some nuances here - I am not a native speaker of English, after all - but to the best of my understanding this seems to say that BMC Evol Biol accepts any phylogenetic method except parsimony analysis.

I want to make perfectly clear that personally I have nothing against model-based, statistical approaches. My first instinct when faced with a single, small DNA sequence alignment would be to run it through PhyML as packaged in my version of SeaView. For large supermatrices I use RAxML, and for smaller multi-gene datasets BEAST. For morphological datasets parsimony analysis in PAUP is my default approach, and for population-genetic data I would use distance methods. Really, I am a methods pragmatist and not irrationally attached to parsimony analysis as the proverbial hammer that makes everything look like a nail.

So that being out of the way, I have to say that I just do not see how the above section is anything but the mirror image of the much-maligned Cladistics editorial from earlier this year.

Cladistics was widely bashed for stating that "phylogenetic data sets submitted to this journal should be analysed using parsimony", allowing for alternative methods if the author can defend them "on philosophical grounds"; BMC Evol Biol demands the use of models, allowing (of all things!) for distance methods under some circumstances. It is not immediately clear to me how the first can be dogmatic but the second reasonable.

Cladistics was ridiculed for arguing on philosophy of science grounds, i.e. their view that it is problematic to use statistics to make inferences about one-off events in the past; BMC Evol Biol mentions "nice mathematical properties" and convenience as justifications. Oh, and parsimony is bad because it "has mathematical properties and is not assumption-free" ... just like the acceptable methods. Right.

Strangely, googling for "bmc evolutionary biology editorial" or for the same terms plus "parsimony" does not bring up a Twitter storm from 2013 comparable to the one following the Cladistics editorial this year. Even more strangely, I cannot find any calls to boycott BMC Evol Biol for dogmatically insisting on model-based phylogenetic analysis. Funny that.

Again, don't get me wrong. There are good arguments for the use of models, and I personally do not see any philosophical issue with using statistical approaches to historical processes. (The die cast from ten seconds ago is also a one-off historical event, so by the same logic we wouldn't be able to use statistics for anything.)

But the point here is that the parsimonygate tempest in a teacup was not primarily about whether likelihood is preferable to parsimony; it was very much about calling Cladistics dogmatic and cult-like for requiring the use of one of several available phylogenetic methods.

However, would that not mean that to remain intellectually consistent one would also have to call dogmatic, and call for a boycott of, a hypothetical journal that, say, required the use of Bayesian analyses? As I wrote at the time, I had serious doubts that the most outspoken critics of Cladistics would have any problems with a journal that did that.

The silence across the internet that apparently followed the publication of the BMC Evol Biol editorial in 2013 seems to me to provide at least a hint of confirmation that the recent criticism of Cladistics was really not about its editors preferring a method, but entirely about them preferring parsimony. It seems to confirm, in other words, that much of the outrage that followed derived not from exasperation at one side of the irrational parsimony versus models conflict being unable to move beyond it, but quite simply from the critics being partisans of the other side of that irrational conflict.

And from my perspective, that of pragmatically using what appears to work best for a given problem, that looks like much less of a high ground to complain from...
