Zhu Wen-Sheng, Fung Wing-Kam, Guo Jianhua
Key Laboratory for Applied Statistics of MOE and School of Mathematics and Statistics, Northeast Normal University, Changchun, China.
Hum Hered. 2007;64(3):172-81. doi: 10.1159/000102990. Epub 2007 May 25.
Haplotype frequency estimation is indispensable in haplotype-based studies of human genetics, since such studies are likely to yield more information than those based on a single SNP marker. However, most existing algorithms estimate haplotype frequencies under the assumption that all genotype data are correct. To date, nearly all large genotype data sets contain errors, and studies have demonstrated that even a small number of genotyping errors can have an enormous impact on haplotype frequency estimation.
Although the GenoSpectrum (GS)-EM algorithm, which estimates haplotype frequencies while incorporating genotyping uncertainty, was presented recently [1], it is suitable only for independent individuals, not for dependent pedigree data. In this paper, we describe a new EM algorithm, called GS-PEM, that calculates maximum likelihood estimates (MLEs) of haplotype frequencies from all possible multilocus genotypes (the GenoSpectrum) of each pedigree member by making use of the dependence information among relatives.
We evaluate the performance of GS-PEM in simulation studies and find that it can reduce the impact of genotyping errors on haplotype frequency estimation.
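To make the idea of estimating haplotype frequencies from a spectrum of possible genotypes concrete, the following is a minimal sketch of an EM iteration for independent individuals with biallelic markers. It is not the authors' GS-PEM and does not use pedigree information. The function names (`compatible_pairs`, `em_haplotype_freqs`), the coding of genotypes as per-locus allele counts, and the use of the spectrum weights as simple multiplicative weights in the E-step are all assumptions made for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return ordered haplotype pairs (h1, h2) consistent with a genotype.

    A genotype is a tuple of per-locus counts (0, 1 or 2) of allele "1";
    a haplotype is a tuple of 0/1 alleles, one per locus.
    """
    haps = list(product((0, 1), repeat=len(genotype)))
    return [
        (h1, h2)
        for h1 in haps
        for h2 in haps
        if all(a + b == g for a, b, g in zip(h1, h2, genotype))
    ]


def em_haplotype_freqs(spectra, n_loci, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM from per-individual spectra.

    `spectra` is a list with one dict per individual, mapping each possible
    multilocus genotype to a non-negative weight (hypothetical input format).
    """
    haps = list(product((0, 1), repeat=n_loci))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for spectrum in spectra:
            # E-step: weight every (genotype, haplotype pair) configuration
            # by spectrum weight times the product of current frequencies.
            terms = [
                (h1, h2, w * freqs[h1] * freqs[h2])
                for g, w in spectrum.items()
                for h1, h2 in compatible_pairs(g)
            ]
            total = sum(p for _, _, p in terms)
            if total == 0.0:
                continue  # individual carries no information at this point
            for h1, h2, p in terms:
                counts[h1] += p / total
                counts[h2] += p / total
        n = sum(counts.values())
        if n == 0.0:
            raise ValueError("no individual is compatible with current frequencies")
        # M-step: expected haplotype counts divided by the total count.
        new_freqs = {h: c / n for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Illustrative input: the third individual's genotype at the second locus is
# uncertain, so two candidate genotypes are listed with made-up weights.
example_spectra = [
    {(0, 0): 1.0},
    {(2, 2): 1.0},
    {(1, 1): 0.9, (1, 2): 0.1},
]
example_freqs = em_haplotype_freqs(example_spectra, n_loci=2)
```

Enumerating every haplotype pair is exponential in the number of loci, so a sketch like this is only practical for a handful of markers. Extending it to pedigrees would also require summing over configurations that respect Mendelian transmission among relatives, which is not attempted here.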