Xu Andrew Wei
School of Computer and Communication Sciences, Swiss Federal Institute of Technology, Lausanne, Switzerland.
J Comput Biol. 2010 Sep;17(9):1195-211. doi: 10.1089/cmb.2010.0106.
In genome rearrangement, given a set of genomes G and a distance measure d, the median problem asks for another genome q that minimizes the total distance [Formula: see text]. This is a key problem in genome rearrangement based phylogenetic analysis. Although this problem is known to be NP-hard, we have shown in a previous article, on circular genomes and under the DCJ distance measure, that a family of patterns in the given genomes--represented by adequate subgraphs--allow us to rapidly find exact solutions to the median problem in a decomposition approach. In this article, we extend this result to the case of linear multichromosomal genomes, in order to solve more interesting problems on eukaryotic nuclear genomes. A multi-way capping problem in the linear multichromosomal case imposes an extra computational challenge on top of the difficulty in the circular case, and this difficulty has been underestimated in our previous study and is addressed in this article. We represent the median problem by the capped multiple breakpoint graph, extend the adequate subgraphs into the capped adequate subgraphs, and prove optimality-preserving decomposition theorems, which give us the tools to solve the median problem and the multi-way capping optimization problem together. We also develop an exact algorithm ASMedian-linear, which iteratively detects instances of (capped) adequate subgraphs and decomposes problems into subproblems. Tested on simulated data, ASMedian-linear can rapidly solve most problems with up to several thousand genes, and it also can provide optimal or near-optimal solutions to the median problem under the reversal/HP distance measures. ASMedian-linear is available at http://sites.google.com/site/andrewweixu .
在基因组重排中,给定一组基因组G和一个距离度量d,中位数问题是寻找另一个基因组q,使总距离[公式:见原文]最小化。这是基于基因组重排的系统发育分析中的一个关键问题。尽管已知该问题是NP难问题,但我们在之前的一篇文章中表明,对于环状基因组且在DCJ距离度量下,给定基因组中的一类模式(由适当子图表示)使我们能够通过分解方法快速找到中位数问题的精确解。在本文中,我们将这一结果扩展到线性多染色体基因组的情况,以便解决关于真核细胞核基因组的更有趣的问题。线性多染色体情况下的多路封顶问题在环状情况的难度之上又带来了额外的计算挑战,而这一难度在我们之前的研究中被低估了,本文将对此进行探讨。我们用封顶的多重中位数问题来表示中位数问题,将适当子图扩展为封顶的适当子图,并证明保持最优性的分解定理,这些定理为我们提供了同时解决中位数问题和多路封顶优化问题的工具。我们还开发了一种精确算法ASMedian-linear,它迭代地检测(封顶的)适当子图的实例,并将问题分解为子问题。在模拟数据上进行测试时,ASMedian-linear能够快速解决大多数包含数千个基因的问题,并且在反转/HP距离度量下,它也能为中位数问题提供最优或接近最优的解。ASMedian-linear可在http://sites.google.com/site/andrewweixu获取。