Research Article: Exploiting bounded signal flow for graph orientation based on cause-effect pairs

Date Published: August 25, 2011

Publisher: BioMed Central

Author(s): Britta Dorn, Falk Hüffner, Dominikus Krüger, Rolf Niedermeier, Johannes Uhlmann.

http://doi.org/10.1186/1748-7188-6-21

Abstract

We consider the following problem: Given an undirected network and a set of sender-receiver pairs, direct all edges such that the maximum number of “signal flows” defined by the pairs can be routed respecting edge directions. This problem has applications in understanding protein interaction based cell regulation mechanisms. Since this problem is NP-hard, research so far concentrated on polynomial-time approximation algorithms and tractable special cases.

We take the viewpoint of parameterized algorithmics and examine several parameters related to the maximum signal flow over vertices or edges. We provide several fixed-parameter tractability results, and in one case a sharp complexity dichotomy between a linear-time solvable case and a slightly more general NP-hard case. We examine the value of these parameters for several real-world network instances.

Several biologically relevant special cases of the NP-hard problem can be solved to optimality. In this way, parameterized analysis yields both deeper insight into the computational complexity and practical solving strategies.

Partial Text

Current technologies [1] like two-hybrid screening can find protein interactions, leading to protein-protein interaction (PPI) networks, but cannot decide the direction of the interaction. This can be complemented by gene knock-out experiments which constitute a way to determine causal relations in these networks, thus providing additional information on possible directions of information flow in them [2]. Given a list of so-called cause-effect pairs, the challenge consists in deducing an orientation of the PPI network which takes into account the causal relations of as many of these pairs as possible. Medvedovsky et al. [3] formalize this in terms of a graph theoretical problem as follows.

MTO was introduced by Medvedovsky et al. [3]; they showed that the problem is NP-complete even when the underlying tree is a star (that is, a diameter-two tree) or a tree with maximum vertex degree three. Moreover, they provided a cubic-time algorithm for MTO restricted to paths. Seeing MTO as the task to maximize the number of satisfied pairs, Medvedovsky et al. also provided polynomial-time approximation algorithms with approximation factor 1/4 in the case of stars and O(1/log n) in the case of general n-vertex trees. The latter approximation factor was recently improved to O(log log n/log n) by Gamzu et al. [8], who furthermore extended the studies of MTO to “mixed graphs” where some of the edges are already oriented based on causal relations known in advance. Besides these theoretical investigations, Medvedovsky et al. [3] also provided some experimental results based on a yeast PPI network and some synthetic data. Silverbush et al. [9] recently formulated a polynomial-size integer linear program for the generalization of mixed graphs and did some experiments with it. Also recently, Gitter et al. [10] considered graph orientation with the objective of maximizing the weight of all satisfied paths between sources and targets with length at most some constant k. They used approximation algorithms to discover pathways in biological networks. In an earlier work, Hakimi et al. [11] studied the special case of MTO where the list of pairs to be satisfied contains all possible pairs; they developed a quadratic-time algorithm for this case.

We mainly continue and complement previous work on MTO [3,8] by starting a parameterized and multivariate complexity analysis of MTO. That is, we try to better understand the border between tractable and intractable cases of MTO while sticking to optimal (instead of approximate) solutions. In particular, our focus is on the “amount of signal flow” over vertices and edges, respectively, and how this influences the computational complexity of MTO.

For ease of presentation, for a W-MTO instance (T, P, ω), we always assume that ω([s, t]) = 0 for all pairs s, t ∈ V with [s, t] ∉ P. Moreover, subsequently mostly referring to MTO, the presented concepts and definitions clearly apply to W-MTO as well. Note that in a tree T = (V, E), for each ordered pair [a, b] of vertices, there exists a uniquely determined path connecting these vertices. We will therefore often write the path defined by the pair [a, b] when we refer to the unique path in the tree starting in vertex a and ending in vertex b, or talk about pairs and paths interchangeably. Sometimes, we also talk about paths in the tree which do not necessarily correspond to pairs. We denote the undirected path connecting vertices v and w in T by pathT (v, w). Moreover, Pv: = {[s, t] ∈ P | v ∈ V (pathT (s, t))} denotes the set of paths passing through a vertex v (note that this includes paths of which v is an endpoint). An MTO instance is called rooted if the underlying tree T is rooted. In a rooted tree T = (V, E), if vertex a ∈ V is an ancestor of vertex b ∈ V, then we use the notation a ≺ b. The subtree of T rooted at v ∈ V is denoted Tv.

For the correctness of the algorithm note the following. For a leaf w and an ancestor v of w, the tree Twv is identical to the path pathT (v, w). Hence, the sum of the weights of pairs that can be satisfied by orienting the path either from v to w or from w to v is A(v, w) and A(w, v), respectively. Next, consider the case that w is an inner vertex and let v be an ancestor of w. Moreover, let u1,…, ul denote the children of w. We argue that the maximum weight of an orientation of (Twv,Pwv) orienting the edges on pathT (v, w) towards w equals

WEIGHTED MAXIMUM TREE ORIENTATIONon n-vertex paths can be solved in O(n2) time.

Next, we show that W-MTO is fixed-parameter tractable with respect to the parameter qv by extending the dynamic programming algorithm for cross-pair-free instances. Formally, qv is defined as follows. For a rooted W-MTO instance (T = (V, E), P) with root r, let Q denote the set of cross pairs. Moreover, for v ∈ V let Qv := Pv ∩ Q be the set of cross pairs passing through v. With respect to the root r the maximum number qv(r) of cross pairs passing through a vertex is given by maxv∈V |Qv|. Then, qv is the minimum value of qv(r) over all possible choices r to root T.

Let me be the maximum number of paths that pass through an edge. We consider MTO instances where me is limited. We show that the problem is linear-time solvable for me ≤ 2, but NP-hard for me ≥ 3, thereby establishing a dichotomy on the complexity of MTO with respect to me.

The goal in this section is to explore the space of practically meaningful parameterizations, here focusing on biological applications. We first performed experiments based on the same data as used by Medvedovsky et al. [3]. The network is a yeast protein-protein interaction network from the Database of Interacting Proteins (DIP) [24], containing 4,737 vertices and 15,147 edges. The cause-effect pairs were obtained from gene knockout experiments by Yeang et al. [2] and contain 14,502 pairs. After discarding small connected components and contracting cycles, we obtained a tree with 1,278 vertices and 5,569 pairs. (These numbers differ slightly from the ones stated by Medvedovsky et al. [3]. We do not use the additional kinase-substrate data, which is only meaningful to evaluate the orientations obtained, and which requires an arbitrary parameter choice not documented by Medvedovsky et al. [3].)

We started a parameterized complexity analysis of (WEIGHTED) MAXIMUM TREE ORIENTATION, obtaining a more fine-grained view on the computational complexity of this NP-hard problem. In this line, there are still several challenges for future investigations. For instance, it is open whether MTO is fixed-parameter tractable with respect to the parameter “number of satisfied pairs” (n – k). Further, in the spirit of “distance-from-triviality parameterization” [19,20] it would be interesting to study the parameterized complexity of MTO with respect to the parameter “number of all possible pairs minus the number of input pairs”–recall that for parameter value zero MTO is polynomial-time solvable [11]. MTO restricted to stars is still NP-hard, but then at least one quarter of all input pairs can always be satisfied [3]. Hence, it would be interesting to study above guarantee parameterization [15,20] with respect to the number of satisfied pairs. MTO can be translated into a vertex covering problem (see Proposition 1) on a graph class that is K4-free–this motivates to study whether vertex covering on this graph class can be done faster than on general graphs. Clearly, MTO brings along numerous further parameters and parameter combinations which can make a more comprehensive multivariate complexity analysis [20] very attractive. Often, it is desirable to not only list a single solution, but to enumerate all optimal solutions. Our dynamic-programming-based algorithms seem suitable for this. Following Gamzu et al. [8] and extending the studies for MTO as pursued here to the more general case of mixed graphs with partially already oriented edges is of high interest. First steps in this direction have very recently been undertaken by Silverbush et al. [9] and Elberfeld et al. [26]. Finally, it seems promising to examine the parameters based on cross pairs in other networks such as communication networks, and to try to exploit these parameters for other hard network problems.

The authors declare that they have no competing interests.

All authors contributed more or less equally, RN initiating the study of MTO under the viewpoint of multivariate complexity analysis and JU coming up with the major algorithmic ideas which have been worked out in more detail by DK. All authors read and approved the final manuscript.

 

Source:

http://doi.org/10.1186/1748-7188-6-21

 

Leave a Reply

Your email address will not be published.