Zhang Kui, Zhao Hongyu
Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama 35294-0022, USA.
Genet Epidemiol. 2006 Jul;30(5):423-37. doi: 10.1002/gepi.20154.
Haplotype inference for tightly linked markers from general pedigrees remains a challenging problem. Only a few methods can efficiently and accurately estimate haplotype frequencies and reconstruct haplotypes for a large number of tightly linked markers from general pedigrees in the presence of missing data, and their performance has not been carefully and extensively evaluated. In this paper, we compare four published methods for haplotype reconstruction and frequency estimation for tightly linked markers from general pedigrees: HAPLORE, GENEHUNTER, PedPhase, and MERLIN. We review these methods and discuss how they differ in the models and computational strategies they employ. We assess their performance through simulations that use pedigrees and haplotypes of tightly linked single nucleotide polymorphisms from real studies. We investigate the effect of several factors, including the missing rate, the departure from Hardy-Weinberg equilibrium, and the sample size, on the accuracy of haplotype inference. We also compare these methods with PHASE, a widely used method for haplotype inference from unrelated individuals, by treating the individuals within a pedigree as unrelated samples. This comparison allows us to investigate the relative efficiency of haplotype inference using pedigree data. Our results indicate that incorporating pedigree information can improve the precision of haplotype frequency estimation and the accuracy of haplotype reconstruction. Among the four haplotyping methods capable of analyzing general pedigrees, HAPLORE and MERLIN have comparable performance and outperform the other two methods in almost all situations.