Department of Electrical Engineering, Columbia University, 500 W 120th St, New York, 10027 NY, USA.
BMC Genomics. 2013 Sep 23;14:645. doi: 10.1186/1471-2164-14-645.
The xor-genotype is a cost-effective alternative to the full genotype sequence of an individual. Recent haplotype inference methods have aimed to find solutions from xor-genotype data. Given the xor-genotypes of a group of unrelated individuals, it is possible to infer the haplotype pair of each individual with the aid of a small number of regular genotypes.
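As a minimal illustration of the two data types, the sketch below assumes biallelic SNPs coded as 0/1 and the common convention in which a regular genotype uses 2 for a heterozygous site. The function names `xor_genotype` and `regular_genotype` and the example haplotypes are hypothetical and not taken from the paper.

```python
from typing import List

def xor_genotype(h1: List[int], h2: List[int]) -> List[int]:
    """Per-SNP XOR of two haplotypes: 1 marks a heterozygous site, 0 a homozygous one."""
    return [a ^ b for a, b in zip(h1, h2)]

def regular_genotype(h1: List[int], h2: List[int]) -> List[int]:
    """Assumed coding: 0 or 1 for a homozygous site (the shared allele), 2 for heterozygous."""
    return [2 if a != b else a for a, b in zip(h1, h2)]

# Hypothetical haplotype pair over four SNPs.
h1 = [0, 1, 1, 0]
h2 = [0, 0, 1, 1]

print(xor_genotype(h1, h2))      # heterozygous sites only
print(regular_genotype(h1, h2))  # also reveals which allele is carried at homozygous sites

# Complementing both haplotypes leaves the xor-genotype unchanged, which is one
# reason a few regular genotypes are needed to pin down the actual haplotypes.
c1 = [1 - a for a in h1]
c2 = [1 - a for a in h2]
print(xor_genotype(c1, c2) == xor_genotype(h1, h2))
```

The xor-genotype discards the allele identity at homozygous sites, so it carries less information than the regular genotype.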
We propose a framework for maximum parsimony inference of haplotypes based on the search for a sparse dictionary, and we present a greedy method that can effectively infer the haplotype pairs given a set of xor-genotypes augmented by a small number of regular genotypes. We test the performance of the proposed approach on synthetic data sets with different numbers of individuals and SNPs, and compare it with the state-of-the-art xor-haplotyping methods PPXH and XOR-HAPLOGEN.
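The maximum parsimony objective can be illustrated with a small sketch: a candidate set of haplotypes (a "dictionary") is feasible if every observed xor-genotype equals the XOR of some pair of its members, and parsimony prefers the smallest feasible set. This is only a feasibility check under that reading of the objective, not the greedy algorithm proposed in the paper. The function name `explains` and the toy data are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Iterable, Sequence, Tuple

Haplotype = Tuple[int, ...]

def explains(dictionary: Sequence[Haplotype], xor_genotypes: Iterable[Haplotype]) -> bool:
    """Return True if every xor-genotype is the XOR of some pair from the dictionary.

    Pairs are taken with replacement, so an individual may carry two copies
    of the same haplotype (giving an all-zero xor-genotype).
    """
    pair_xors = {
        tuple(a ^ b for a, b in zip(h1, h2))
        for h1, h2 in combinations_with_replacement(dictionary, 2)
    }
    return all(tuple(x) in pair_xors for x in xor_genotypes)

# Hypothetical dictionary of three haplotypes over three SNPs.
dictionary = [(0, 0, 0), (0, 1, 1), (1, 1, 0)]
observed = [(0, 1, 1), (1, 0, 1), (0, 0, 0)]

print(explains(dictionary, observed))                  # all three are covered by some pair
print(explains(dictionary, observed + [(1, 1, 1)]))    # (1, 1, 1) is not a pairwise XOR here
print(len(dictionary))                                 # the quantity parsimony tries to minimize
```

Because many different dictionaries can pass this check for the same xor-genotypes, the regular genotypes mentioned above serve to disambiguate between them.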
Experimental results show good inference quality for the proposed method under all tested conditions, especially on large data sets. Results on a real database, CFTR, also demonstrate significantly better performance. The proposed algorithm is also capable of finding accurate solutions in the presence of missing data and/or genotyping errors.