Shandong University, Jinan.
Montana State University, Bozeman.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):905-13. doi: 10.1109/TCBB.2013.100.
Scaffold filling is a new combinatorial optimization problem in genome sequencing. The one-sided scaffold filling problem can be described as given an incomplete genome I and a complete (reference) genome G, fill the missing genes into I such that the number of common (string) adjacencies between the resulting genome I' and G is maximized. This problem is NP-complete for genome with duplicated genes and the best known approximation factor is 1.33, which uses a greedy strategy. In this paper, we prove a better lower bound of the optimal solution, and devise a new algorithm by exploiting the maximum matching method and a local improvement technique, which improves the approximation factor to 1.25. For genome with gene repetitions, this is the only known NP-complete problem which admits an approximation with a small constant factor (less than 1.5).
支架填充是基因组测序中的一个新的组合优化问题。单边支架填充问题可以描述为:给定一个不完整的基因组 I 和一个完整的(参考)基因组 G,将缺失的基因填充到 I 中,使得生成的基因组 I'和 G 之间的公共(字符串)邻接数最大化。对于具有重复基因的基因组,这个问题是 NP 完全的,最好的已知近似因子是 1.33,它使用了一种贪婪策略。在本文中,我们证明了最优解的一个更好的下界,并通过利用最大匹配方法和局部改进技术设计了一种新算法,将近似因子提高到 1.25。对于具有基因重复的基因组,这是唯一一个已知的 NP 完全问题,它允许使用小常数因子(小于 1.5)进行近似。