Xu Hongyan, Wu Xifeng, Spitz Margaret R, Shete Sanjay
Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA.
Hum Hered. 2004;58(2):63-8. doi: 10.1159/000083026.
Haplotypes are gaining popularity in studies of human genetics because they contain more information than does a single gene locus. However, current high-throughput genotyping techniques cannot produce haplotype information. Several statistical methods have recently been proposed to infer haplotypes based on unphased genotypes at several loci. The accuracy, efficiency, and computational time of these methods have been under intense scrutiny. In this report, our aim was to evaluate haplotype inference methods for genotypic data from unrelated individuals.
We compared the performance of three haplotype inference methods that are currently in use (HAPLOTYPER, hap, and PHASE) by applying them to a large data set from unrelated individuals with known haplotypes. We also applied these methods to data from coalescent-based simulations under both constant-size and exponential-growth population models. The performance of these three methods, along with that of the expectation-maximization algorithm, was further compared in the context of an association study.
While the algorithm implemented in the software PHASE was found to be the most accurate in both real and simulated data comparisons, all four methods produced good results in the association study.
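To make the inference problem concrete, the following is a minimal illustrative sketch of expectation-maximization haplotype frequency estimation from unphased genotypes. It is not the implementation used in any of the programs compared here. It assumes biallelic loci coded as counts of one allele (0, 1, or 2), Hardy-Weinberg equilibrium, and no missing data; the function names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts (0, 1, or 2) at each biallelic locus.
    """
    options = []
    for g in genotype:
        if g == 0:
            options.append([(0, 0)])
        elif g == 2:
            options.append([(1, 1)])
        else:
            # A heterozygous locus can be phased in two ways.
            options.append([(0, 1), (1, 0)])
    pairs = set()
    for combo in product(*options):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies by EM (illustrative assumptions only)."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    # Start from equal frequencies for every haplotype that could occur.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: probability of each phase resolution under HWE.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2N chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq
```

For example, a double heterozygote `(1, 1)` has two possible phase resolutions, and `em_haplotype_frequencies` weights them by the current frequency estimates, so unambiguous individuals in the sample inform how ambiguous individuals are resolved.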