Date Published: November 28, 2012
Publisher: BioMed Central
Author(s): Marta Casanellas, Jesús Fernández-Sánchez, Anna M Kedzierska.
The selection of an evolutionary model to best fit given molecular data is usually a heuristic choice. In his seminal book, J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. It remained an open question, however, whether these equations were sufficient to fully characterize the evolutionary model under consideration.
Here we prove that, for most equivariant models of evolution, the space of distributions satisfying these linear equations coincides with the space of distributions arising from mixtures of trees. In other words, we prove that the evolution of an observed multiple sequence alignment can be modeled by a mixture of phylogenetic trees under an equivariant evolutionary model if and only if the distribution of patterns at its columns satisfies the linear equations mentioned above. Moreover, we provide a set of linearly independent equations defining this space of phylogenetic mixtures for each equivariant model and for any number of taxa. Lastly, we use these results to study the identifiability of phylogenetic mixtures.
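To make the "linear equations on pattern probabilities" concrete, here is a minimal sketch, assuming the Jukes-Cantor (JC) model on a three-leaf star tree with uniform root distribution. Because JC transition matrices commute with every relabelling of the nucleotides, two leaf patterns related by such a relabelling (e.g. AAC and GGT) have the same expected probability, and each equality p(x) = p(y) is a linear equation. The function names (`jc_matrix`, `star_tree_distribution`, `orbit_key`, `symmetry_equations_hold`, `mixture`) are hypothetical illustrations, not the authors' implementation, and the sketch checks only these symmetry equalities, not the full defining system derived in the paper.

```python
import itertools
import math

# Hypothetical helpers: a sketch under the Jukes-Cantor (JC) model only.

def jc_matrix(t):
    """JC transition matrix for a branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def star_tree_distribution(branch_lengths):
    """Expected probabilities of all leaf patterns on a star tree."""
    mats = [jc_matrix(t) for t in branch_lengths]
    dist = {}
    for pattern in itertools.product(range(4), repeat=len(mats)):
        total = 0.0
        for root in range(4):
            prob = 0.25  # uniform root distribution under JC
            for mat, state in zip(mats, pattern):
                prob *= mat[root][state]
            total += prob
        dist[pattern] = total
    return dist

def orbit_key(pattern):
    """Canonical form of a pattern under nucleotide relabelling (first-occurrence order)."""
    relabel = {}
    return tuple(relabel.setdefault(s, len(relabel)) for s in pattern)

def symmetry_equations_hold(dist, tol=1e-12):
    """Check p(x) = p(y) for every pair of patterns x, y in the same orbit."""
    orbits = {}
    for pattern, p in dist.items():
        orbits.setdefault(orbit_key(pattern), []).append(p)
    return all(max(ps) - min(ps) <= tol for ps in orbits.values())

def mixture(dists, weights):
    """Convex combination of pattern distributions (a phylogenetic mixture)."""
    return {pat: sum(w * d[pat] for d, w in zip(dists, weights)) for pat in dists[0]}

d1 = star_tree_distribution([0.1, 0.2, 0.3])
d2 = star_tree_distribution([0.5, 0.05, 0.9])
mix = mixture([d1, d2], [0.3, 0.7])
print(len({orbit_key(p) for p in d1}))                            # 5
print(symmetry_equations_hold(d1), symmetry_equations_hold(mix))  # True True
```

With three taxa the star is the only unrooted topology, so the mixture here only varies branch lengths. Since the equations are linear, any mixture of distributions that satisfies them also satisfies them; the nontrivial direction proved in the paper is the converse, which this sketch does not attempt to verify.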
The space of phylogenetic mixtures under equivariant models is a linear space that fully characterizes the evolutionary model. We provide an explicit algorithm to obtain the equations defining these spaces for several models and any number of taxa. Its implementation has proved to be a powerful tool for model selection.
The principal goal of phylogenetics is to reconstruct the ancestral relationships among organisms. Most popular phylogenetic reconstruction methods are based on mathematical models describing the molecular evolution of DNA. Nevertheless, there is no unified framework for model selection, and the results are highly dependent on the models and methods used in the analysis (cf. ).
In this paper, we have studied the space of phylogenetic mixtures for equivariant evolutionary models. We have shown that for the Jukes-Cantor model, the Kimura models with two or three parameters, the strand symmetric model and the general Markov model, this linear space is defined by the set of linear equations satisfied by the distributions of the patterns at the leaves of a tree that evolves under that model. It follows that this space completely characterizes the model. Tools from group theory and group representation theory played a major role and allowed us to design a procedure that produces minimal systems of equations for these spaces for any number of taxa. This procedure has been implemented successfully in a new method for model selection in phylogenetics based on linear invariants (see ), which is available online at http://genome.crg.es/cgi-bin/phylo_mod_sel/AlgModelSelection.pl.
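As a small illustration of how a group action yields a linearly independent system of equations, the following sketch (an assumption-laden toy, not the authors' representation-theoretic procedure) uses the Jukes-Cantor symmetry group, i.e. all relabellings of A, C, G, T. It groups the 4^n leaf patterns into orbits and, for each orbit, equates every non-representative pattern with the orbit's lexicographically smallest member. Each equation contains one pattern that appears in no other equation, so the system is linearly independent, with 4^n minus (number of orbits) equations. The function names `canonical` and `symmetry_equations` are hypothetical, and for other equivariant models the group, and hence the system, would differ.

```python
import itertools

def canonical(pattern):
    """Smallest pattern in the orbit under nucleotide relabelling (first-occurrence order)."""
    relabel = {}
    return tuple(relabel.setdefault(s, len(relabel)) for s in pattern)

def symmetry_equations(n_taxa, alphabet="ACGT"):
    """Pairs (x, r) encoding the linear equation p[x] - p[r] = 0, with r the orbit representative.

    Every non-representative pattern x occurs in exactly one equation, so the
    equations are linearly independent.
    """
    equations = []
    for pattern in itertools.product(range(len(alphabet)), repeat=n_taxa):
        rep = canonical(pattern)
        if pattern != rep:
            to_str = lambda pat: "".join(alphabet[s] for s in pat)
            equations.append((to_str(pattern), to_str(rep)))
    return equations

print(symmetry_equations(2)[:3])  # [('AG', 'AC'), ('AT', 'AC'), ('CA', 'AC')]
for n in (2, 3, 4):
    eqs = symmetry_equations(n)
    print(n, len(eqs), 4 ** n - len(eqs))  # taxa, equations, orbits
# 2 14 2
# 3 59 5
# 4 241 15
```

The number of orbits equals the number of ways to partition the n leaves into at most four blocks, which is why it grows much more slowly than 4^n. Whether these symmetry equalities alone define the space of mixtures for a given model and number of taxa is exactly the kind of question the paper settles; the sketch makes no claim about it.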
The authors declare that they have no competing interests.
All authors contributed equally and the author names order is alphabetical. All authors read and approved the final manuscript.