Ding Xiangdong, Zhang Qin, Flury Christine, Simianer Henner
Institute of Animal Breeding and Genetics, University of Goettingen, Goettingen, Germany.
Hum Hered. 2006;62(1):12-9. doi: 10.1159/000095598. Epub 2006 Sep 5.
Recent literature has suggested that haplotype inference through close relatives, especially from nuclear families, can be an alternative strategy for determining linkage phase. In this paper, we propose haplotype reconstruction and estimation of haplotype frequencies via an expectation-maximization (EM) algorithm that includes nuclear families with only one parent available. A parent and his or her child are treated as a parent-child pair with one shared haplotype. This reduces the number of potential haplotype pairs for both the parent and the child, resulting in higher estimation accuracy. In a series of simulations, we compare PHASE, GENEHUNTER, an EM-based approach for complete nuclear families, and our approach. In all situations, the EM-based approach for trio data is comparable to PHASE, with a slightly higher error rate. For incomplete trios, our approach is slightly more accurate and much faster than PHASE. GENEHUNTER performs very poorly even in simple nuclear family settings, and its performance deteriorates dramatically as the number of markers increases. In addition, a comparison of different sampling designs demonstrates that, at the same genotyping cost, sampling trios is the most efficient design for estimating haplotype frequencies in a population.
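To illustrate why treating a parent and child as a pair with one shared haplotype narrows the search space, the following minimal Python sketch enumerates the haplotype pairs compatible with each unphased genotype and keeps only the joint configurations in which the two individuals have a haplotype in common. It is not the authors' implementation: the function names (`compatible_pairs`, `shared_haplotype_configs`), the 0/1/2 genotype coding for biallelic markers, and the assumptions of no recombination, no mutation, and no genotyping error are all illustrative choices, and the EM frequency estimation itself is not shown.

```python
from itertools import product


def compatible_pairs(genotype):
    """List all unordered haplotype pairs consistent with an unphased genotype.

    Assumed coding (hypothetical, for illustration only): each marker is
    biallelic and the genotype gives the count of allele 1 (0, 1 or 2).
    """
    options = []
    for g in genotype:
        if g == 0:
            options.append([(0, 0)])
        elif g == 2:
            options.append([(1, 1)])
        elif g == 1:
            # Heterozygous marker: phase is unknown, so both orders are possible.
            options.append([(0, 1), (1, 0)])
        else:
            raise ValueError(f"unexpected genotype code: {g!r}")

    pairs = set()
    for combo in product(*options):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def shared_haplotype_configs(parent, child):
    """Keep only (parent pair, child pair) combinations sharing a haplotype.

    Assumes the transmitted haplotype is passed on unchanged
    (no recombination, mutation, or genotyping error).
    """
    configs = []
    for parent_pair in compatible_pairs(parent):
        for child_pair in compatible_pairs(child):
            if set(parent_pair) & set(child_pair):
                configs.append((parent_pair, child_pair))
    return configs
```

For example, a parent who is heterozygous at two markers, `(1, 1)`, has two compatible haplotype pairs when considered alone. If the child has genotype `(0, 0)`, `shared_haplotype_configs((1, 1), (0, 0))` leaves a single configuration, so the parent's phase is fully resolved. In an EM setting, only the surviving configurations would need to be weighted in the expectation step.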