Date Published: March 5, 2013
Publisher: BioMed Central
Author(s): Alberto Apostolico, Matteo Comin, Andres Dress, Laxmi Parida.
The large majority of optimization problems related to the inference of distance‐based trees used in phylogenetic analysis and classification is known to be intractable. One noted exception is found within the realm of ultrametric distances. The introduction of ultrametric trees in phylogeny was inspired by a model of evolution driven by the postulate of a molecular clock, now dismissed, whereby phylogeny could be represented by a weighted tree in which the sum of the weights of the edges separating any given leaf from the root is the same for all leaves. Both, molecular clocks and rooted ultrametric trees, fell out of fashion as credible representations of evolutionary change. At the same time, ultrametric dendrograms have shown good potential for purposes of classification in so far as they have proven to provide good approximations for additive trees. Most of these approximations are still intractable, but the problem of finding the nearest ultrametric distance matrix to a given distance matrix with respect to the L∞ distance has been long known to be solvable in polynomial time, the solution being incarnated in any minimum spanning tree for the weighted graph subtending to the matrix.
This paper expands this subdominant ultrametric perspective by studying ultrametric networks, consisting of the collection of all edges involved in some minimum spanning tree. It is shown that, for a graph with n vertices, the construction of such a network can be carried out by a simple algorithm in optimal time O(n2) which is faster by a factor of n than the direct adaptation of the classical O(n3) paradigm by Warshall for computing the transitive closure of a graph. This algorithm, called UltraNet, will be shown to be easily adapted to compute relaxed networks and to support the introduction of artificial points to reduce the maximum distance between vertices in a pair. Finally, a few experiments will be discussed to demonstrate the applicability of subdominant ultrametric networks.
As is well known, most optimization problems related to the inference of distance‐based trees used in phylogenetic analysis and classification are intractable (see [1,2] for a pertinent discussion). One notable exception is found within the realm of ultrametric distances (cf. ). The introduction of such distances in phylogeny was inspired by a model of evolution, now largely abandoned, driven by the postulate of a molecular clock whereby the amount of phylogenetic change observable between any two extant species is directly related to the amount of time that elapsed since their last common ancestor roamed this planet, implying that phylogenetic distances could simply be represented by a weighted tree in which the sum of the weights of the edges separating any given leaf from the root is the same for all leaves.
We study the following, rather abstract, conceptual frame work: We start with a finite set V representing the sequences and an arbitrary weighting
Clearly, given V and W as above, the ultrametric network can be produced in time O(|V|3) by a straightforward adaptation of the Floyd‐Warshall all‐pairs shortest‐path algorithm . In view of Theorem 1, this network could be produced in time O(|V|3) also by first computing one MST by, say, Prim’s algorithm, and then computing W∗ using the paths in this tree. We present here an algorithm to compute the entire network in time O(|V|2). This is optimal since any algorithm must produce Θ(|V|2) values at the outset.
The algorithm of the preceding section lends itself naturally to variants that accommodate some tolerance in the ultrametric distance and relax the notion of ultrametric network. We outline here these two variants, respectively leading to Δ‐ultrametric networks and to the introduction of new artificial vertices.
To conclude our presentation, we report two examples of inference of Human Y‐chromosome phylogeny from Short Tandem Repeats. This can be based on the study of Human migration and the associated relationships among different populations. In typical experiments, we are interested in constructing a network from the STRs information of various individuals and in comparing the results with known paths of migrations. An interesting example of such a phylogeny reconstruction can be found in , which discusses the significance of STRs data as markers for human evolution, but also highlights the difficulties that the analysis of this data derives from the lack of an appropriate methodology.
In conclusion, this paper expands the subdominant ultrametric perspective by studying ultrametric networks. We shown that, for a graph with n vertices, the construction of such a network can be carried out by a simple algorithm in optimal time O(n2). This algorithm can be easily adapted to compute relaxed networks, such as Δ‐ultrametric networks and to support the introduction of artificial points to reduce the maximum distance between vertices in a pair. Finally, we discussed a few experiments to demonstrate the applicability of subdominant ultrametric networks.
The authors declare that they have no competing interests.
All authors contributed equally to this work. All authors read and approved the final manuscript.