Chauve Cedric, Hausd Utz-Uwe, Stephen Tamon, You Vivija P
Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada.
J Comput Biol. 2010 Sep;17(9):1167-81. doi: 10.1089/cmb.2010.0113.
A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1's on each row are consecutive. A Minimal Conflicting Set is a set of rows that does not have the C1P, but every proper subset has the C1P. Such submatrices have been considered in comparative genomics applications, but very little is known about their combinatorial structure and efficient algorithms to compute them. We first describe an algorithm that detects rows that belong to Minimal Conflicting Sets. This algorithm has a polynomial time complexity when the number of 1s in each row of the considered matrix is bounded by a constant. Next, we show that the problem of computing all Minimal Conflicting Sets can be reduced to the joint generation of all minimal true clauses and maximal false clauses for some monotone boolean function. We use these methods on simulated data related to ancestral genome reconstruction to show that computing Minimal Conflicting Set is useful in discriminating between true positive and false positive ancestral syntenies. We also study a dataset of yeast genomes and address the reliability of an ancestral genome proposal of the Saccharomycetaceae yeasts.
如果一个二元矩阵的列可以按如下方式排序,使得每行中的所有1都是连续的,那么该二元矩阵具有连续1属性(C1P)。最小冲突集是一组不具有C1P的行,但每个真子集都具有C1P。这样的子矩阵在比较基因组学应用中已被考虑,但关于它们的组合结构和计算它们的高效算法却知之甚少。我们首先描述一种算法,该算法可检测属于最小冲突集的行。当所考虑矩阵的每行中1的数量由一个常数界定时,此算法具有多项式时间复杂度。接下来,我们表明计算所有最小冲突集的问题可以简化为为某个单调布尔函数联合生成所有最小真子句和最大假子句。我们将这些方法应用于与祖先基因组重建相关的模拟数据,以表明计算最小冲突集在区分真阳性和假阳性祖先共线性方面是有用的。我们还研究了酵母基因组数据集,并探讨了酿酒酵母科酵母祖先基因组提议的可靠性。