Tuesday, January 7, 2014

SeaView

As mentioned in my last post, I have recently stumbled across a really handy software tool for molecular phylogenetics. When installing programs on Linux, I looked for an open-source alternative to the sequence alignment editor BioEdit, which I have been using on Windows since I was an undergrad. The one I found, SeaView, may in some regards be even better.

Admittedly, BioEdit has more options for manual sequence editing. Where SeaView gives us only the option of typing and deleting individual bases or indels, BioEdit has a drag-and-drop and a grab-and-drag function, both with useful extra settings such as whether only the marked area should be moved or whether the bases to the right should move as a block.

In all other areas SeaView appears to have the advantage, at least at first sight. It has two automated alignment options, the venerable ClustalW and Muscle, the latter being, in my experience, the better of the two.

If sample names are identical between datasets (and if they aren't, it's your fault), it is easy to concatenate them. Simply open the datasets in different windows of SeaView and use File -> Concatenate. The only disadvantage is that it fills the missing sequences in each block with gaps rather than Ns. Still, this function makes concatenation much easier than with a text editor.
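If the gap padding is a problem, the padding can be done outside SeaView. The following is only a minimal Python sketch of the idea, not anything SeaView itself provides: the function name `concatenate`, the sample names and the toy sequences are all made up for illustration, and each dataset is assumed to be already aligned and loaded as a dictionary of sample name to sequence.

```python
def concatenate(blocks, missing="N"):
    """Concatenate alignments given as dicts of sample name -> aligned sequence.

    Samples absent from a block are padded with `missing` instead of gaps.
    """
    # Collect sample names in order of first appearance across all blocks.
    names = []
    for block in blocks:
        for name in block:
            if name not in names:
                names.append(name)

    result = {name: "" for name in names}
    for block in blocks:
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in a block must have the same length")
        width = lengths.pop()
        for name in names:
            # Use the real sequence if present, otherwise a run of `missing`.
            result[name] += block.get(name, missing * width)
    return result


# Hypothetical toy data: two marker regions with partly different sampling.
its = {"sp_A": "ACGT-A", "sp_B": "ACGTTA"}
trnl = {"sp_A": "GGC", "sp_C": "GGT"}

combined = concatenate([its, trnl])
for name, seq in combined.items():
    print(name, seq)
```

Passing `missing="?"` instead would pad with question marks, which some phylogenetics programs prefer for missing data.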

This would have made some previous projects a bit easier.

Another helpful function is Props -> Statistics because it quickly gives an overview of variability in the dataset, how many sites are parsimony informative and suchlike. Strangely, however, it ignores all sites that have a gap in even a single sequence, so it is not suitable for obtaining the relevant statistics for publication.
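For publication-ready numbers, the counts can be redone with gapped sites included. Below is a rough Python sketch under my own assumptions, not a reimplementation of what SeaView does: gaps, Ns and question marks are treated as missing data within a column rather than as a reason to drop the whole column, and a site counts as parsimony informative if at least two character states each occur in at least two sequences. The name `site_stats` and the toy alignment are invented for the example.

```python
from collections import Counter

# Assumption: gaps and the ambiguity symbols N and ? count as missing data.
MISSING = set("-?N")


def site_stats(seqs):
    """Return (variable, parsimony_informative) site counts for aligned sequences."""
    variable = 0
    informative = 0
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        for symbol in MISSING:
            counts.pop(symbol, None)
        # Variable: more than one state among the non-missing bases.
        if len(counts) > 1:
            variable += 1
        # Informative: at least two states, each present in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Hypothetical toy alignment with one gapped column.
alignment = ["ACGTA", "ACGTT", "AC-AT", "ATGAA"]
variable, informative = site_stats(alignment)
print(variable, informative)
```

Whether gaps should instead be scored as a fifth character state depends on the analysis, so the `MISSING` set is the part to adjust.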

If you want to move your finished alignments into a phylogenetics tool, they can be saved in a variety of formats, the most important probably being fasta, nexus (for PAUP, MrBayes, BEAST) and phylip (for RAxML or, obviously, PHYLIP). But the best thing is, you do not necessarily have to do that because SeaView itself provides modules to immediately conduct parsimony, distance and likelihood analyses.
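To give an idea of how little separates these formats, here is a small Python sketch that turns a fasta alignment into the "relaxed" phylip layout (a header line with the number of sequences and sites, then one name and sequence per line). It is only an illustration with made-up function names and toy data; SeaView's own export is the easier route, and strict PHYLIP expects fixed-width ten-character names, which this sketch does not produce.

```python
def parse_fasta(text):
    """Parse fasta text into a dict of sample name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sample name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def to_relaxed_phylip(records):
    """Format aligned sequences as relaxed (non-interleaved) phylip text."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    header = f"{len(records)} {lengths.pop()}"
    lines = [f"{name} {seq}" for name, seq in records.items()]
    return "\n".join([header] + lines) + "\n"


# Hypothetical toy alignment; the first sequence is wrapped over two lines.
fasta_text = ">sp_A\nACGT\nAC\n>sp_B\nACGTAC\n"
phylip_text = to_relaxed_phylip(parse_fasta(fasta_text))
print(phylip_text)
```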

I have so far tried parsimony, which uses a PHYLIP module, and likelihood, which uses the PhyML software. The analyses are quite fast and allow all the major settings one would expect, although they do not include TBR branch swapping. Likelihood analyses can be run with seven different substitution models. Branch support can be inferred with bootstrapping and, in the case of likelihood, the aLRT test. The disadvantage of the parsimony module is that it appears to return only the consensus tree. For publication I generally want all the most parsimonious trees so I will continue to use TNT or PAUP instead, but for likelihood analyses or simply for quick and dirty data exploration before the real analysis SeaView is definitely an option.

When you have inferred a phylogenetic tree, SeaView opens it in a separate window where you can reroot it, display it as a circle tree, cladogram or phylogram, display support values or branch lengths, and finally export it in Newick format or as SVG or PS vector graphics for further editing, e.g. in Inkscape.

As for the design, the display of sequence data looks a bit garishly colourful at first but it is endlessly customizable. You can set anything from the colours the various bases are displayed in to the font size used for the alignment. (Just about the only thing that annoyingly cannot be changed is how many letters of the sample name are displayed, so don't make them too long.)

That just about covers it I'd say.

Lastly, SeaView has some purely technical advantages over other sequence alignment editors I have used: It is a small, easily installed and fast package, so far it seems very stable, and it is available for all major operating systems. One of its strengths is that it is to a large degree a GUI that combines and greatly facilitates the use of established freeware programs provided by other scientists, such as the alignment and phylogenetics modules mentioned above. Instead of using a different tool for each step and then having to constantly export and import data in varying formats, you can do everything from automated alignment to exporting the final likelihood tree as a vector graphic in one comprehensive program.

The major downsides appear to be the very restricted space for sample names, the failure of the program to fill missing sequences with Ns or question marks when concatenating, the failure to retrieve multiple equally parsimonious trees, and, perhaps only in my case, that I cannot figure out why I do not have the 'sites bar' at the bottom that you need to mark groups of characters.

Well, nothing is perfect in real life; there are only trade-offs. If you are interested in trying this alignment software, you may want to check it out at the program website. I will definitely get some use out of it in the future, not least because it allows me to have the same tool on both operating systems I am using.
