Institute of Zoology, Zoological Society of London, London, NW1 4RY, UK.
Heredity (Edinb). 2019 Jun;122(6):719-728. doi: 10.1038/s41437-018-0178-7. Epub 2019 Jan 10.
Marker genotype data can suffer from high rates of errors, such as false alleles and allelic dropouts (null alleles), in situations such as SNPs from low-coverage next-generation sequencing and microsatellites from noninvasive samples. Using such data without properly accounting for mistyping can lead to inaccurate or incorrect inferences of family relationships such as parentage and sibship. This study shows that markers with a high error rate are still informative. Simply discarding them could cause a substantial loss of valuable information, and is impractical in situations where virtually all markers (e.g. SNPs from low-coverage next-generation sequencing, microsatellites from noninvasive samples) suffer from a similarly high error rate. This study also shows that some previous error models are valid for markers with low error rates but fail for markers with high error rates. It proposes an improved error model and demonstrates, using simulated and empirical data with a high error rate (e.g. >0.5), that this model leads to more accurate sibship and parentage inferences than previous models. It suggests that, in practice, markers with high error rates should be used rather than discarded in pedigree reconstruction, as long as the error rates can be estimated and properly incorporated into the analyses.
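The claim that high-error markers remain informative can be illustrated with a deliberately simple, hypothetical error model. This is NOT the improved model proposed in the study. The sketch assumes a single biallelic SNP in Hardy-Weinberg equilibrium, genotypes coded as copies of allele A (0, 1, 2), and independent errors in which, with probability `e`, the typed genotype is replaced by a random draw from the population genotype frequencies. All function names are illustrative. The expected log-likelihood ratio of "parent-offspring" versus "unrelated" for true parent-offspring pairs (a Kullback-Leibler divergence) measures how much information the marker carries; it shrinks as `e` grows but stays positive for any `e < 1`.

```python
import math


def genotype_freqs(p):
    """HWE frequencies of genotypes coded as copies of allele A (0, 1, 2)."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def transmission(o, x, p):
    """P(offspring genotype o | one parent has genotype x); the other parent is random."""
    t = x / 2.0  # probability that the known parent transmits allele A
    probs = [(1 - t) * (1 - p), t * (1 - p) + (1 - t) * p, t * p]
    return probs[o]


def observed_given_true(y, z, e, g):
    """Hypothetical error model: with probability e the typed genotype is a random draw from g."""
    return (1 - e) * (y == z) + e * g[y]


def expected_llr(p, e):
    """Expected log-likelihood ratio (parent-offspring vs unrelated), in nats,
    averaged over pairs that truly are parent-offspring at one SNP."""
    g = genotype_freqs(p)
    total = 0.0
    for y1 in range(3):
        for y2 in range(3):
            # Probability of the observed pair under the parent-offspring hypothesis.
            p_po = sum(
                g[x] * transmission(o, x, p)
                * observed_given_true(y1, x, e, g)
                * observed_given_true(y2, o, e, g)
                for x in range(3)
                for o in range(3)
            )
            # Under this error model observed genotypes keep HWE frequencies,
            # so unrelated individuals' observations are independent draws from g.
            p_ur = g[y1] * g[y2]
            if p_po > 0:
                total += p_po * math.log(p_po / p_ur)
    return total


if __name__ == "__main__":
    for e in (0.0, 0.5, 0.9):
        print(f"e = {e}: expected LLR per SNP = {expected_llr(0.5, e):.5f} nats")
```

Under these assumptions the information per marker falls with the error rate but reaches zero only when `e = 1`, so many high-error markers can still add up to useful evidence, provided the likelihood uses the correct error rate.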