Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, V.le Sarca, 336, Milan 20126, Italy.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1582-94. doi: 10.1109/TCBB.2012.100.
The MINIMUM-RECOMBINANT HAPLOTYPE CONFIGURATION problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances that have improved the efficiency, its applicability to real data sets has been limited since it does not take into account some important phenomena such as mutations, genotyping errors, and missing data. In this work, we propose the MINIMUM-RECOMBINANT HAPLOTYPE CONFIGURATION WITH BOUNDED ERRORS problem (MRHCE), which extends the original MRHC formulation by incorporating the two most common characteristics of real data: errors and missing genotypes (including untyped individuals). We describe a practical algorithm for MRHCE that is based on a reduction to the well-known Satisfiability problem (SAT) and exploits recent advances in the constraint programming literature. An experimental analysis demonstrates the biological soundness of the phasing model and the effectiveness (on both accuracy and performance) of the algorithm under several scenarios. The analysis on real data and the comparison with state-of-the-art programs reveals that our approach couples better scalability to large and complex pedigrees with the explicit inclusion of genotyping errors into the model.
最小重组单倍型配置问题(MRHC)在为系谱上的基因型相位重要问题提供合理的组合公式方面非常成功。尽管已经有几项算法改进提高了效率,但由于它没有考虑到一些重要的现象,如突变、基因分型错误和缺失数据,因此其在实际数据集上的应用受到限制。在这项工作中,我们提出了具有边界错误的最小重组单倍型配置问题(MRHCE),该问题通过将真实数据的两个最常见特征(错误和缺失基因型(包括未分型个体))纳入原始 MRHC 公式来扩展。我们描述了一种基于著名可满足性问题(SAT)的实用算法,并利用约束编程文献中的最新进展。实验分析证明了相位模型的生物学合理性,以及在几种情况下算法的有效性(准确性和性能)。对真实数据的分析以及与最先进程序的比较表明,我们的方法将更好的可扩展性与对模型中基因分型错误的明确纳入相结合,适用于大型和复杂的系谱。