Climer Sharlee, Jäger Gerold, Templeton Alan R, Zhang Weixiong
Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA.
Bioinformatics. 2009 Jan 1;25(1):68-74. doi: 10.1093/bioinformatics/btn572. Epub 2008 Nov 4.
Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first and most critical step is the ascertainment of a biologically sound model to be optimized. Many of the models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution.
This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that, for the datasets with known solutions, there are relatively few unique haplotypes, though not always the fewest possible. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, and most of them are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
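To make the Pure Parsimony idea concrete, the following is a minimal brute-force sketch, not the authors' method. It assumes a common genotype encoding that the abstract does not specify: each site is "0" or "1" when homozygous and "2" when heterozygous. The function names `resolutions` and `parsimonious_solutions` are hypothetical, and exhaustive enumeration is only feasible for very small inputs.

```python
from itertools import product


def resolutions(genotype):
    """Return all unordered haplotype pairs that explain one genotype.

    Assumed encoding: '0'/'1' = homozygous site, '2' = heterozygous site.
    """
    het = [i for i, site in enumerate(genotype) if site == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1 = list(genotype)
        h2 = list(genotype)
        for i, b in zip(het, bits):
            # At a heterozygous site the two haplotypes carry opposite alleles.
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def parsimonious_solutions(genotypes):
    """Enumerate every combination of resolutions and keep those that use
    the smallest number of distinct haplotypes."""
    best, solutions = None, []
    for choice in product(*(resolutions(g) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < best:
            best, solutions = len(used), [choice]
        elif len(used) == best:
            solutions.append(choice)
    return best, solutions


if __name__ == "__main__":
    best, solutions = parsimonious_solutions(["22", "20"])
    print(best, len(solutions))  # prints "3 2"
```

Even in this two-genotype toy case there are two distinct solutions that both reach the minimum of three haplotypes, which is a small-scale instance of the ambiguity the abstract describes: the parsimony criterion alone cannot pick between them.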