Date Published: May 21, 2012
Publisher: BioMed Central
Author(s): Frederick A Matsen, Steven N Evans
There are several common ways to encode a tree as a matrix, such as the adjacency matrix, the Laplacian matrix (that is, the infinitesimal generator of the natural random walk), and the matrix of pairwise distances between leaves. Such representations involve a specific labeling of the vertices or at least the leaves, and so it is natural to attempt to identify trees by some feature of the associated matrices that is invariant under relabeling. An obvious candidate is the spectrum of eigenvalues (or, equivalently, the characteristic polynomial).
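The three encodings named above, and the relabeling invariance of the characteristic polynomial, can be sketched on a toy example. This is a minimal illustration, not the authors' code: the five-vertex tree, the names `EDGES`, `LEAVES`, `PERM`, and all function names are hypothetical, the Laplacian uses the D - A convention (the generator of the random walk differs by sign and normalization conventions), and the characteristic polynomial is computed exactly with the Faddeev-LeVerrier recursion.

```python
from collections import deque
from fractions import Fraction

# Hypothetical toy tree: vertex 0 is the root, 1 is internal, 2, 3, 4 are leaves.
EDGES = [(0, 1), (0, 2), (1, 3), (1, 4)]
N = 5
LEAVES = [2, 3, 4]
PERM = [3, 0, 4, 1, 2]  # an arbitrary relabeling: old label i becomes PERM[i]


def adjacency(edges, n):
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def laplacian(edges, n):
    """Combinatorial Laplacian D - A (one common sign convention)."""
    a = adjacency(edges, n)
    return [[sum(a[i]) if i == j else -a[i][j] for j in range(n)]
            for i in range(n)]


def leaf_distances(edges, n, leaves):
    """Matrix of path lengths (in edges) between every pair of leaves."""
    a = adjacency(edges, n)

    def bfs(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if a[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    rows = [bfs(s) for s in leaves]
    return [[row[t] for t in leaves] for row in rows]


def charpoly(m):
    """Integer coefficients of det(xI - m), highest degree first."""
    n = len(m)
    coeffs = [Fraction(1)]
    aux = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # Faddeev-LeVerrier step: aux <- m @ aux + (latest coefficient) * I
        prod = [[sum(m[i][t] * aux[t][j] for t in range(n)) for j in range(n)]
                for i in range(n)]
        aux = [[prod[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
               for i in range(n)]
        trace = sum(m[i][t] * aux[t][i] for i in range(n) for t in range(n))
        coeffs.append(-trace / k)
    return [int(c) for c in coeffs]


def relabel(edges, perm):
    return [(perm[u], perm[v]) for u, v in edges]


if __name__ == "__main__":
    print("adjacency charpoly:", charpoly(adjacency(EDGES, N)))
    print("Laplacian charpoly:", charpoly(laplacian(EDGES, N)))
    print("leaf distances:    ", leaf_distances(EDGES, N, LEAVES))
    same = charpoly(adjacency(relabel(EDGES, PERM), N)) == charpoly(adjacency(EDGES, N))
    print("invariant under relabeling:", same)
```

The matrices themselves change when the vertices are relabeled, but the list of coefficients does not, which is what makes the spectrum a candidate invariant in the first place.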
We show that, for any of these choices of matrix, the fraction of binary trees with a unique spectrum goes to zero as the number of leaves goes to infinity. We investigate the rate of this convergence using numerical methods. For the adjacency and Laplacian matrices, we show that the a priori more informative immanantal polynomials have no greater power to distinguish between trees.
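To make the failure of uniqueness concrete, the sketch below checks one small pair of non-isomorphic trees that share an adjacency characteristic polynomial. The pair is an illustrative assumption and is not taken from the paper: the trees are general eight-vertex trees, not rooted binary ones, and the names `TREE_A`, `TREE_B`, `matching_counts`, and `forest_charpoly` are hypothetical. The code relies on the standard fact that, for a forest on n vertices, det(xI - A) equals the sum over k of (-1)^k m_k x^(n-2k), where m_k is the number of sets of k pairwise disjoint edges.

```python
from itertools import combinations

# Tree A: two adjacent hubs 0 and 1, each carrying three leaves.
TREE_A = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]
# Tree B: hub 0 with four leaves plus a pendant path 0-5-6-7.
TREE_B = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (5, 6), (6, 7)]
N = 8


def matching_counts(edges):
    """counts[k] = number of ways to choose k pairwise vertex-disjoint edges."""
    counts = []
    for k in range(len(edges) + 1):
        m_k = 0
        for subset in combinations(edges, k):
            endpoints = [v for edge in subset for v in edge]
            if len(endpoints) == len(set(endpoints)):
                m_k += 1
        counts.append(m_k)
    return counts


def forest_charpoly(edges, n):
    """Coefficients of det(xI - A), highest degree first; valid for forests only."""
    m = matching_counts(edges)
    coeffs = [0] * (n + 1)
    for k in range(n // 2 + 1):
        coeffs[2 * k] = (-1) ** k * m[k]
    return coeffs


def degree_sequence(edges, n):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg)


if __name__ == "__main__":
    print(forest_charpoly(TREE_A, N))
    print(forest_charpoly(TREE_B, N))
    # Different degree sequences, so the two trees cannot be isomorphic.
    print(degree_sequence(TREE_A, N), degree_sequence(TREE_B, N))
```

Both trees give x^8 - 7x^6 + 9x^4, so their adjacency spectra coincide even though their degree sequences differ.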
Our results show that a generic large binary tree is highly unlikely to be identified uniquely by common spectral invariants.
Tree shape theory furnishes numerical statistics about the structure of a tree [1,2]. (Because we are interested in applications of tree statistics to trees that describe the structure of branching events in evolutionary histories, we will, for convenience, always take the term tree without any qualifiers to mean a rooted, binary tree without any branch length information or labeling of the vertices.) Such statistics have two related uses. Firstly, they can be used in an attempt to tell whether two trees are actually the same and, secondly, they can be used to indicate the degree of similarity between two trees with respect to some criterion.
Spectral invariants of matrix formulations of trees are a natural way to quantify the shape of phylogenetic trees. However, in this paper we show that a complete classification of tree shapes using common spectral invariants of generalized Laplacian and distance matrices is not possible. For either of these choices of matrix we show that the fraction of binary trees with a unique spectrum goes to zero as the number of leaves goes to infinity, although the convergence appears to be slow. For the adjacency and Laplacian matrices, we show that the a priori more informative immanantal polynomials have no greater power to distinguish between trees.
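For readers unfamiliar with the term, the standard definition of an immanant is recalled here. The notation is an assumption and may differ from the paper's: A = (a_{ij}) is an n x n matrix, S_n is the symmetric group, and \chi is an irreducible character of S_n.

```latex
d_\chi(A) \;=\; \sum_{\sigma \in S_n} \chi(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)},
\qquad
\text{immanantal polynomial: } d_\chi(xI - A).
```

Taking \chi to be the sign character gives the determinant, so d_\chi(xI - A) is then the characteristic polynomial; the trivial character gives the permanent. The characteristic polynomial is therefore one member of this family.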
The authors declare that they have no competing interests.
FAM conceived of the project, proved the theorems, performed the numerical experiments, and wrote the paper. SNE applied the immanant, proved the theorems, and wrote the paper.