**Date Published:** August 22, 2017

**Publisher:** Springer US

**Author(s):** Leo van Iersel, Steven Kelk, Georgios Stamoulis, Leen Stougie, Olivier Boes.

http://doi.org/10.1007/s00453-017-0366-5

**Abstract**

**The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree, is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the tree bisection and reconnect distance of the two unrooted trees. In the third part of the paper we consider the “root uncertain” variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.**

**Partial Text**

Within the field of phylogenetics the evolutionary history of a set of contemporary species X, known as taxa, is usually modelled as a tree where the leaves are bijectively labelled by X. One of the central challenges in phylogenetics is to accurately infer this history given only measurements on X (e.g. one string of DNA per species in X) and to this end many different optimality criteria have been proposed [12, 29]. One issue is that algorithms which construct evolutionary trees (henceforth: phylogenetic trees) usually produce unrooted phylogenetic trees as output, i.e. trees in which the direction of evolution is not specified and thus the notion of “common ancestor” is not well-defined. Nevertheless, biologists are primarily interested in rooted trees [23], where the root, and thus the direction of evolution, is specified. In practice this problem is often addressed by solving the tree-inference and root-inference problems simultaneously, using a so-called “outgroup” [27]. However, this process is prone to error (see [38] for a recent case-study) and disputes over rooting location are prominent in the literature (see e.g. [11]).

A rooted binary phylogenetic network $N=(V,E)$ on a set of leaf-labels (also known as taxa) X (where $|X| \ge 2$) is a directed acyclic graph (DAG) in which the leaves (nodes with indegree 1 and outdegree 0) are bijectively labelled by X, there is exactly one root (a node with indegree 0 and outdegree 2), and all other nodes are either tree nodes (indegree 1, outdegree 2) or reticulation nodes (indegree 2, outdegree 1). As mentioned in the introduction, the reticulation number $r(N)$ of N is defined as $|E|-(|V|-1)$, which is equal to the number of reticulation nodes in N. In other words, the reticulation number of a network is the number of edges we need to delete in order for the underlying undirected graph to be acyclic (i.e., a spanning tree). A rooted binary phylogenetic network N which has $r(N)=0$ is simply called a rooted binary phylogenetic tree. Two rooted binary phylogenetic networks $N_1$ and $N_2$ on X are said to be isomorphic if there exists an isomorphism between $N_1$ and $N_2$ which is the identity on X.
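
The definition of the reticulation number can be checked on a small example. The following is a minimal sketch, not code from the paper: the edge-list representation, the function names and the example network `EXAMPLE_NETWORK` are all assumptions made for illustration. It computes $|E|-(|V|-1)$ and, separately, counts the nodes with indegree 2, so that the two quantities can be compared.

```python
# A network is assumed to be given as a list of directed (parent, child) edges.
# This hypothetical example has root "r", tree nodes "u" and "v",
# one reticulation node "h", and leaves "a", "b", "c".
EXAMPLE_NETWORK = [
    ("r", "u"), ("r", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

def reticulation_number(edges):
    """Return |E| - (|V| - 1) for a network given as directed edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - (len(nodes) - 1)

def count_reticulation_nodes(edges):
    """Count the nodes with indegree exactly 2."""
    indegree = {}
    for _, child in edges:
        indegree[child] = indegree.get(child, 0) + 1
    return sum(1 for degree in indegree.values() if degree == 2)

print(reticulation_number(EXAMPLE_NETWORK))
print(count_reticulation_nodes(EXAMPLE_NETWORK))
```

For a rooted binary phylogenetic network the two functions should agree, as stated in the definition above; for a tree both return 0.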

Given a rooted binary phylogenetic network $N=(V,E)$ on X and a rooted binary phylogenetic tree T also on X it is trivial to determine in time $O(2^k \cdot \text{poly}(n))$ whether N displays T, where $k = r(N) = |E|-(|V|-1)$ and $n=|V|$. This is because, for each of the k reticulation nodes, we can simply guess which of its two incoming edges are on the image of T. Here we consider the natural unrooted analogue of the problem where both N and T are unrooted.
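
The guessing argument for the rooted case can be sketched as a brute-force search over all $2^k$ choices of incoming edge. This is an illustrative sketch under stated assumptions, not the authors' implementation: networks and trees are assumed to be lists of directed (parent, child) edges, leaf nodes are assumed to be named by their labels, and the helper names `clusters` and `displays` are hypothetical. After one incoming edge per reticulation node is deleted, the remaining graph is a rooted tree, and it is compared with T through the set of non-empty leaf clusters, which is unaffected by suppressing nodes of indegree 1 and outdegree 1 or by unlabelled dead ends.

```python
from itertools import product

def clusters(edges, root, leaves):
    """Return the set of non-empty leaf clusters of a rooted tree."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    found = set()

    def visit(node):
        if node in leaves:
            cluster = frozenset([node])
        else:
            # An unlabelled node without children yields the empty cluster.
            cluster = frozenset().union(*(visit(c) for c in children.get(node, [])))
        if cluster:
            found.add(cluster)
        return cluster

    visit(root)
    return found

def displays(net_edges, net_root, tree_edges, tree_root, leaves):
    """Try all 2^k ways of keeping one incoming edge per reticulation node."""
    parents = {}
    for parent, child in net_edges:
        parents.setdefault(child, []).append(parent)
    reticulations = [v for v, ps in parents.items() if len(ps) == 2]
    target = clusters(tree_edges, tree_root, leaves)
    for choice in product((0, 1), repeat=len(reticulations)):
        # For each reticulation node, drop the incoming edge that was not chosen.
        dropped = {(parents[v][1 - c], v) for v, c in zip(reticulations, choice)}
        kept = [edge for edge in net_edges if edge not in dropped]
        if clusters(kept, net_root, leaves) == target:
            return True
    return False

# Hypothetical example: one reticulation node "h" above leaf "b".
NETWORK = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
           ("v", "h"), ("v", "c"), ("h", "b")]
TREE_AB = [("t", "x"), ("t", "c"), ("x", "a"), ("x", "b")]  # ((a,b),c)
TREE_AC = [("t", "x"), ("t", "b"), ("x", "a"), ("x", "c")]  # ((a,c),b)
LEAVES = {"a", "b", "c"}

print(displays(NETWORK, "r", TREE_AB, "t", LEAVES))
print(displays(NETWORK, "r", TREE_AC, "t", LEAVES))
```

The loop runs at most $2^k$ times and each iteration takes time polynomial in n, which matches the stated bound. The sketch uses recursion and is only intended for small examples.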

In this section we study the unrooted hybridization number problem in case the input consists of two trees $T_1, T_2$ and we show equivalence to a well-known problem that has been studied before in the literature, namely the tree bisection and reconnect problem.

In this section we turn our attention to the Root Uncertain Hybridization Number (RUHN) problem. We remind the reader that in this problem the input consists of a set of unrooted binary trees and we are asked to choose the root location of each tree, such that the hybridization number is minimized. In the first part of this section we show that RUHN is already NP-hard and APX-hard even when the input consists of two trees. On the other hand, in the next subsection we show that the problem is FPT in the hybridization number for any number of trees by providing a quadratic-sized kernel. We conclude the section by discussing how an exponential-time algorithm can be obtained for solving the kernel.
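
To make the notion of choosing a root location concrete, the following sketch enumerates the possible rootings of a single unrooted binary tree. It is an illustration only and not part of the kernelization or of any algorithm from the paper: the undirected edge-list representation, the function names and the new root label `"rho"` (assumed not to occur in the tree already) are all assumptions. Each rooting subdivides one edge with a new root node and directs all edges away from it, so a tree with m edges has m candidate root locations.

```python
def root_on_edge(edges, edge, root="rho"):
    """Subdivide `edge` with a new node `root` and direct all edges away from it."""
    u, v = edge
    adjacency = {}
    for a, b in edges:
        if {a, b} == {u, v}:
            continue  # this edge is replaced by root-u and root-v
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    adjacency[root] = [u, v]
    adjacency.setdefault(u, []).append(root)
    adjacency.setdefault(v, []).append(root)

    # Depth-first traversal from the new root fixes the direction of each edge.
    directed, stack, seen = [], [root], {root}
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                directed.append((node, neighbour))
                stack.append(neighbour)
    return directed

def all_rootings(edges):
    """Return one rooted tree (as directed edges) per edge of the unrooted tree."""
    return [root_on_edge(edges, edge) for edge in edges]

# Hypothetical unrooted quartet tree ab|cd with internal nodes "x" and "y".
QUARTET = [("a", "x"), ("b", "x"), ("x", "y"), ("y", "c"), ("y", "d")]

for rooted in all_rootings(QUARTET):
    print(sorted(rooted))
```

A naive approach to RUHN would try every combination of rootings across the input trees and evaluate the hybridization number of each combination; the sketch only covers the enumeration step for one tree.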

In this article we have studied two variations of the classical hybridization number (HN) problem: the root-uncertain variant RUHN and the unrooted variant UHN. We have also studied the natural unrooted variant of the tree containment (TC) decision problem, UTC.
