Liu W, Yang T, Zhao W, Chase G A
Global Discovery and Development Stats, Eli Lilly and Company, Indianapolis, IN 46285, USA.
Ann Hum Genet. 2007 Jul;71(Pt 4):467-79. doi: 10.1111/j.1469-1809.2007.00354.x. Epub 2007 Mar 7.
One limitation of existing tagging SNP selection algorithms is that they assume the reported genotypes are error free. In practice, however, genotyping errors are often unavoidable. Many tagging SNP selection methods depend heavily on estimated haplotype frequencies, and recent studies have demonstrated that even slight genotyping errors can have serious consequences for haplotype reconstruction and frequency estimation. Here we present a tagging SNP selection method that allows for genotyping errors. Our method is a modification of the pairwise r² tagging SNP selection algorithm proposed by Carlson et al. (2004). We replaced the standard EM algorithm in Carlson's method with an EM algorithm that accounts for genotyping errors, in an attempt to obtain better estimates of the haplotype frequencies and the r² measure. Through simulation studies we compared the performance of our modified algorithm with that of the original algorithm. We found that the number of tags selected by both methods increased as genotyping errors increased, though our method led to a smaller increase. The power of haplotype association tests using the selected tags decreased dramatically with increasing genotyping errors. The power of single marker tests also decreased, but the reduction was smaller than that for haplotype tests. When the mean number of tags selected by both methods was restricted to be similar to the baseline number, Carlson's method and our method led to similar power for the subsequent haplotype and single marker tests. Our results showed that, by accounting for random genotyping errors, our method can select tagging SNPs more efficiently than Carlson's method. The computer program that implements our modified tagging SNP selection algorithm is available at our web site: http://www.personal.psu.edu/tuy104/.
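For readers unfamiliar with pairwise r² tagging, the following is a minimal Python sketch of the general idea, not the authors' program. It makes several assumptions that are not in the abstract: haplotypes are already phased and error free (the paper instead estimates haplotype frequencies with an EM algorithm), alleles are coded 0/1, the r² threshold of 0.8 is an illustrative default, and the greedy rule is a simplified version of threshold-based binning. All function names are hypothetical.

```python
from itertools import combinations


def r_squared(p_ab, p_a, p_b):
    """Pairwise r^2 from a two-locus haplotype frequency and two allele frequencies."""
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def pairwise_r2(haplotypes):
    """Compute r^2 for every SNP pair from phased 0/1 haplotypes (assumed error free)."""
    n = len(haplotypes)
    m = len(haplotypes[0])
    freq = [sum(h[j] for h in haplotypes) / n for j in range(m)]
    r2 = {}
    for i, j in combinations(range(m), 2):
        p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
        r2[(i, j)] = r2[(j, i)] = r_squared(p_ab, freq[i], freq[j])
    return r2


def greedy_tags(r2, n_snps, threshold=0.8):
    """Repeatedly pick the SNP that covers the most untagged SNPs at r^2 >= threshold."""
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        def covered(s):
            return {t for t in untagged if t != s and r2[(s, t)] >= threshold}
        # sorted() makes tie-breaking deterministic: the lowest index wins
        best = max(sorted(untagged), key=lambda s: len(covered(s)))
        tags.append(best)
        untagged -= covered(best) | {best}
    return tags


# Toy data: SNP 0 and SNP 1 are perfectly correlated, SNP 2 is independent of both.
haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(greedy_tags(pairwise_r2(haps), n_snps=3))
```

In this toy data one tag suffices for SNPs 0 and 1, while SNP 2 needs its own tag. Genotyping errors would typically lower the estimated r² between truly correlated SNPs, which is one way to see why more tags are selected as the error rate rises.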