Nadezhda Sazonova, E. James Harner
Department of Mathematics and Computer Science, Clarkson University, Potsdam, NY 13676, USA.
J Bioinform Comput Biol. 2008 Dec;6(6):1177-92. doi: 10.1142/s0219720008003898.
Multi-population haplotype inference and block partitioning are difficult tasks for mixed genotype samples. A number of studies have shown that haplotype block structures, as well as the collections of common haplotypes and their frequencies, vary significantly among world populations. These differences are more pronounced between geographically distant populations. Some previous studies performed haplotype inference on multi-population samples with known population assignments. Others developed algorithms for clustering mixed haplotype or genotype samples whose components differ in block structure or genetic marker profile. We present a new algorithm that performs haplotype inference and block partitioning on a mixed sample of genotypes from two populations when the population assignments are unknown. Given a mixed genotype sample, the proposed algorithm, HAPLOCLUST, extracts two clusters of genotypes with different block structures and performs haplotype inference on each cluster. When tested on a set of unrelated individuals, our algorithm produced correct population assignments at a rate comparable to that of two state-of-the-art population stratification algorithms. The contribution of HAPLOCLUST is that it performs haplotype/block-based population stratification while simultaneously finding the haplotype resolution and block partitioning for each extracted cluster.
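To see why haplotype inference is hard in the first place, note that an unphased genotype with k heterozygous sites is consistent with 2^(k-1) distinct haplotype pairs (one pair when k = 0). Inference methods resolve this ambiguity using haplotype frequencies estimated from the sample. When a sample pools two populations with different common haplotypes, those estimates become a blend of both. The minimal sketch below only enumerates the candidate resolutions of a single genotype. It is an illustration of the underlying problem, not the HAPLOCLUST algorithm. The 0/1/2 genotype encoding and the function name `compatible_haplotype_pairs` are hypothetical choices made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    Hypothetical encoding assumed for this sketch (not taken from the paper):
    0 = homozygous for allele 0, 1 = homozygous for allele 1, 2 = heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    # Homozygous sites carry the same allele on both haplotypes.
    base = [g if g != 2 else 0 for g in genotype]
    pairs = set()
    # Try every allele assignment at heterozygous sites for the first haplotype;
    # the second haplotype receives the complementary allele at those sites.
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, a in zip(het_sites, alleles):
            h1[site] = a
            h2[site] = 1 - a
        # Sort the pair so (h1, h2) and (h2, h1) count as the same resolution.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs([0, 2, 1, 2]):
        print(pair)
```

With two heterozygous sites the example genotype has two possible resolutions. Each additional heterozygous site doubles the count, which is why frequency information, and therefore knowing which population a genotype came from, matters for choosing among them.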