Department of Electrical and Computer Engineering, Clarkson University, Potsdam, NY 13676, USA.
J Biomed Inform. 2010 Feb;43(1):51-9. doi: 10.1016/j.jbi.2009.08.009. Epub 2009 Aug 20.
Haplotyping and block partitioning have been studied extensively for regular genotype data, but the more cost-efficient XOR-genotype data remain under-investigated. Previous studies developed methods for haplotyping short-sequence partial XOR-genotypes. In this paper we propose a new algorithm that haplotypes long-range partial XOR-genotype data, possibly with missing entries, and simultaneously finds the block structure of the given data. Our method is implemented as a fast and practical algorithm. We also investigate how the percentage of fully genotyped individuals in a sample affects the accuracy of the results, with and without missing data. The algorithm is validated on HapMap data. The results show good prediction rates for samples both with and without missing data. The accuracy of prediction of XOR sites is not significantly affected by the presence of 10% or less missing data.
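To make the term concrete: an XOR-genotype records, for each site, only whether an individual's two haplotypes differ there (heterozygous, 1) or agree (homozygous, 0), without saying which allele a homozygous site carries. The following is a minimal illustrative sketch, not the algorithm proposed in the paper. The function name `xor_genotype`, the 0/1 allele encoding, and the use of `None` for a missing entry are assumptions made for this example only.

```python
from typing import List, Optional, Sequence

Allele = Optional[int]  # assumed encoding: 0 or 1, None for a missing entry


def xor_genotype(hap_a: Sequence[Allele], hap_b: Sequence[Allele]) -> List[Allele]:
    """Return the per-site XOR of two binary haplotypes.

    A site is 1 if the two alleles differ, 0 if they agree,
    and None if either allele is missing.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of sites")
    result: List[Allele] = []
    for a, b in zip(hap_a, hap_b):
        if a is None or b is None:
            result.append(None)  # a missing allele makes the site unknown
        else:
            result.append(a ^ b)
    return result


# Two different haplotype pairs that yield the same XOR-genotype.
pair_1 = ([0, 1, 1, 0], [0, 0, 1, 1])
pair_2 = ([1, 0, 0, 1], [1, 1, 0, 0])  # every allele of pair_1 flipped

xor_1 = xor_genotype(*pair_1)
xor_2 = xor_genotype(*pair_2)
with_missing = xor_genotype([0, None, 1], [1, 1, 1])
```

Here `xor_1` and `xor_2` are identical even though the underlying haplotype pairs differ. This ambiguity at homozygous sites is one reason the fraction of fully genotyped individuals in a sample can matter for accuracy.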