Quade Shannon R E, Elston Robert C, Goddard Katrina A B
Department of Epidemiology and Biostatistics, Case Western Reserve University, 2103 Cornell Rd, Cleveland, Ohio 44106-7281, USA.
BMC Genet. 2005 May 19;6:25. doi: 10.1186/1471-2156-6-25.
Maximum likelihood estimates of haplotype frequencies can be obtained from pooled DNA using the expectation maximization (EM) algorithm. Through simulation, we investigate the effect of genotyping error on the accuracy of haplotype frequency estimates obtained using this algorithm. We explore model parameters including allele frequency, inter-marker linkage disequilibrium (LD), genotyping error rate, and pool size.
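As a minimal illustration of the estimation step described above, the sketch below runs EM for two biallelic markers. It assumes each pool reports only the number of copies of allele 1 at each marker among its 2 x pool_size chromosomes, and that haplotypes within a pool are independent draws from the population. The function names (`compatible_configs`, `multinomial_prob`, `em_haplotype_freqs`) and the two-marker restriction are assumptions made for illustration, not the authors' implementation.

```python
from math import factorial


def compatible_configs(a, b, m):
    """Yield haplotype counts (c00, c01, c10, c11) among m chromosomes that
    match a copies of allele 1 at marker A and b copies at marker B."""
    for k in range(max(0, a + b - m), min(a, b) + 1):
        yield (m - a - b + k, b - k, a - k, k)


def multinomial_prob(counts, freqs):
    """Multinomial probability of the haplotype counts given frequencies."""
    coef = factorial(sum(counts))
    prob = 1.0
    for c, p in zip(counts, freqs):
        coef //= factorial(c)
        prob *= p ** c
    return coef * prob


def em_haplotype_freqs(pools, pool_size, n_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 from pooled counts.

    pools is a list of (a, b) allele-1 counts, one pair per pool
    (hypothetical input format)."""
    m = 2 * pool_size               # chromosomes per pool
    freqs = [0.25] * 4              # uniform starting values
    for _ in range(n_iter):
        expected = [0.0] * 4
        for a, b in pools:
            # E-step: weight each compatible configuration by its
            # probability under the current frequency estimates.
            configs = list(compatible_configs(a, b, m))
            weights = [multinomial_prob(c, freqs) for c in configs]
            total = sum(weights)
            for c, w in zip(configs, weights):
                for h in range(4):
                    expected[h] += c[h] * w / total
        # M-step: expected haplotype counts divided by total chromosomes.
        new = [e / (m * len(pools)) for e in expected]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


if __name__ == "__main__":
    # Three pools of two individuals each (four chromosomes per pool).
    print(em_haplotype_freqs([(1, 1), (2, 3), (0, 1)], pool_size=2))
```

Enumerating every compatible configuration is practical only for small pools and few markers, since the number of configurations grows quickly with both.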
Pool sizes of 2, 5, and 10 individuals achieved comparable levels of accuracy in the estimation procedure. Estimates were less accurate when marker alleles were common and when there was no inter-marker LD. This pattern was observed regardless of the amount of genotyping error simulated.
Genotyping error slightly decreases the accuracy of haplotype frequency estimates. However, the EM algorithm performs well even in the presence of genotyping error. Overall, pools of 2, 5, and 10 individuals yield haplotype frequency estimates of similar accuracy while reducing genotyping costs.
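To make the simulation setting concrete, the sketch below generates pooled allele counts for two biallelic markers with genotyping error. The error model is an assumption chosen for simplicity: each allele call is flipped independently with probability `error_rate`. The abstract does not state the error model actually used, and `simulate_pool` is a hypothetical helper.

```python
import random


def simulate_pool(freqs, pool_size, error_rate, rng):
    """Return (a, b): observed allele-1 counts at markers A and B for one pool.

    freqs gives the frequencies of haplotypes 00, 01, 10, 11.
    Assumed error model: each allele call is flipped independently with
    probability error_rate."""
    haplotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    draws = rng.choices(haplotypes, weights=freqs, k=2 * pool_size)
    a = b = 0
    for allele_a, allele_b in draws:
        if rng.random() < error_rate:
            allele_a = 1 - allele_a
        if rng.random() < error_rate:
            allele_b = 1 - allele_b
        a += allele_a
        b += allele_b
    return a, b


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    pools = [simulate_pool([0.4, 0.1, 0.1, 0.4], 5, 0.01, rng) for _ in range(3)]
    print(pools)
```

Repeating this over a grid of allele frequencies, LD levels, error rates, and pool sizes, and comparing the estimated frequencies with the generating ones, would mirror the kind of comparison the abstract reports.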