Miller Craig R, Joyce Paul, Waits Lisette P
Department of Fish and Wildlife, College of Natural Resources, University of Idaho, Moscow, Idaho 83844, USA.
Genetics. 2002 Jan;160(1):357-66. doi: 10.1093/genetics/160.1.357.
A growing number of population genetic studies utilize nuclear DNA microsatellite data from museum specimens and noninvasive sources. Genotyping errors are elevated in these low-quantity DNA sources, potentially compromising the power and accuracy of the data. The most conservative method for addressing this problem is effective but requires extensive replication of individual genotypes. In search of a more efficient method, we developed a maximum-likelihood approach that minimizes errors by estimating genotype reliability and strategically directing replication at loci most likely to harbor errors. The model assumes that false and contaminant alleles can be removed from the dataset and that the allelic dropout rate is even across loci. Simulations demonstrate that the proposed method marks a vast improvement in efficiency while maintaining accuracy. When allelic dropout rates are low (0-30%), the reduction in the number of PCR replicates is typically 40-50%. The model is robust to moderate violations of the even dropout rate assumption. For datasets that contain false and contaminant alleles, a replication strategy is proposed. Our current model addresses only allelic dropout, the most prevalent source of genotyping error. However, the developed likelihood framework can incorporate additional error-generating processes as they become more clearly understood.
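The idea of estimating genotype reliability under allelic dropout can be illustrated with a minimal sketch. This is not the authors' model. It is a simplified single-locus calculation under assumptions introduced here for illustration: Hardy-Weinberg genotype priors, a known per-replicate dropout rate, dropout equally likely to hit either allele of a heterozygote, homozygotes always typed correctly, and no false or contaminant alleles. The function and parameter names are hypothetical.

```python
def homozygote_reliability(allele_freq, dropout_rate, n_replicates):
    """Posterior probability that a locus typed as homozygous in every
    replicate is truly homozygous (simplified illustration only)."""
    if not 0 < allele_freq <= 1:
        raise ValueError("allele_freq must be in (0, 1]")
    if not 0 <= dropout_rate <= 1:
        raise ValueError("dropout_rate must be in [0, 1]")

    # Hardy-Weinberg priors (an assumption of this sketch).
    prior_hom = allele_freq ** 2
    prior_het = 2 * allele_freq * (1 - allele_freq)

    # A true heterozygote looks like this homozygote in one replicate only if
    # the other allele drops out: probability dropout_rate / 2 per replicate.
    like_het = (dropout_rate / 2) ** n_replicates

    # A true homozygote is assumed to be observed correctly (likelihood 1).
    return prior_hom / (prior_hom + prior_het * like_het)


def replicates_needed(allele_freq, dropout_rate, threshold=0.99, max_replicates=20):
    """Smallest number of concordant homozygous replicates that reaches the
    reliability threshold, or None if max_replicates is not enough."""
    for n in range(1, max_replicates + 1):
        if homozygote_reliability(allele_freq, dropout_rate, n) >= threshold:
            return n
    return None


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(homozygote_reliability(0.5, 0.3, n), 4))
    print("replicates needed:", replicates_needed(0.5, 0.3))
```

Under these assumptions, a locus whose reliability already exceeds the threshold needs no further replication, so effort can be directed at the loci where reliability is still low.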