Hüffner Falk, Komusiewicz Christian, Liebtrau Adrian, Niedermeier Rolf
IEEE/ACM Trans Comput Biol Bioinform. 2014 May-Jun;11(3):455-67. doi: 10.1109/TCBB.2013.177.
A popular clustering algorithm for biological networks which was proposed by Hartuv and Shamir identifies nonoverlapping highly connected components. We extend the approach taken by this algorithm by introducing the combinatorial optimization problem Highly Connected Deletion, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies, based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 75 percent of the edges that are deleted for an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 6,000 vertices and 13,500 edges within five hours. Additionally, we present a new heuristic that finds more clusters than the method by Hartuv and Shamir.
一种由哈图夫和沙米尔提出的用于生物网络的流行聚类算法可识别不重叠的高度连通组件。我们通过引入组合优化问题“高度连通删除”来扩展该算法所采用的方法,该问题要求从图中删除尽可能少的边,以使所得图由高度连通组件组成。我们证明“高度连通删除”是NP难问题,并提供了一种固定参数算法和一种内核化方法。我们基于多项式时间数据约简规则和带列生成的整数线性规划,提出了精确和启发式的求解策略。数据约简通常能识别出最优解中75%被删除的边;然后,列生成方法可以在五小时内最优地求解具有多达6000个顶点和13500条边的蛋白质相互作用网络。此外,我们提出了一种新的启发式方法,它比哈图夫和沙米尔的方法能找到更多的聚类。