Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia.
Mol Ecol Resour. 2018 May;18(3):381-390. doi: 10.1111/1755-0998.12739. Epub 2018 Jan 29.
Remarkably little attention has been paid to using the high resolution provided by genotyping-by-sequencing (i.e., RADseq and similar methods) to assess relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, that is common in this type of data and could lead to downward-biased, yet precise, estimates of relatedness. Here, we assess the applicability of genotyping-by-sequencing for relatedness inference given its relatively high genotyping error rate. Individuals of known relatedness were simulated under genotyping error, allelic dropout and missing data scenarios based on an empirical ddRAD data set, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error: the estimator of Ritland (Genetics Research, 67, 175) was unaffected by allelic dropout and was the most accurate when there was genotyping error. We also found that the choice of estimator should not rely solely on the strength of the correlation between estimated and true relatedness, because a strong correlation does not necessarily mean that estimates are close to true relatedness. Finally, we demonstrated that a large SNP data set, even with genotyping error (allelic dropout or otherwise) or missing data, still performs better than a perfectly genotyped microsatellite data set of tens of markers. The simulation-based approach used here can be easily implemented by others on their own genotyping-by-sequencing data sets to confirm the most appropriate and powerful estimator for their data.
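The simulation-based approach described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, allele frequencies are drawn at random rather than taken from an empirical ddRAD data set, the estimator is a generic ratio-of-sums moment estimator (not the Ritland estimator or any of the seven estimators evaluated in the paper), population allele frequencies are assumed known, and allelic dropout is modelled very simply as a heterozygote being miscalled as either homozygote with equal probability. The sketch reports both the correlation between true and estimated relatedness and the mean bias, since the abstract notes that a strong correlation alone does not show that estimates are close to the truth.

```python
import random
from statistics import mean

# True relatedness (r) for each simulated dyad type, assuming non-inbred founders.
TRUE_R = {"unrelated": 0.0, "half_sibs": 0.25, "full_sibs": 0.5, "parent_offspring": 0.5}


def founder(freqs, rng):
    """Draw an unrelated individual: one (allele, allele) pair per biallelic locus."""
    return [(int(rng.random() < p), int(rng.random() < p)) for p in freqs]


def child(mother, father, rng):
    """Mendelian offspring: one randomly chosen allele from each parent per locus."""
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]


def simulate_dyad(kind, freqs, rng):
    """Return two individuals with the pedigree relationship named by `kind`."""
    if kind == "unrelated":
        return founder(freqs, rng), founder(freqs, rng)
    mother, father = founder(freqs, rng), founder(freqs, rng)
    if kind == "parent_offspring":
        return mother, child(mother, father, rng)
    if kind == "full_sibs":
        return child(mother, father, rng), child(mother, father, rng)
    if kind == "half_sibs":
        other_father = founder(freqs, rng)
        return child(mother, father, rng), child(mother, other_father, rng)
    raise ValueError(f"unknown dyad type: {kind}")


def observe(individual, dropout, rng):
    """Convert to allele counts (0/1/2), applying a simplified dropout model."""
    calls = []
    for a, b in individual:
        count = a + b
        if count == 1 and rng.random() < dropout:
            count = rng.choice((0, 2))  # heterozygote miscalled as a homozygote
        calls.append(count)
    return calls


def estimate_r(x, y, freqs):
    """Generic moment estimator: sum((x - 2p)(y - 2p)) / sum(2p(1 - p))."""
    num = sum((xi - 2 * p) * (yi - 2 * p) for xi, yi, p in zip(x, y, freqs))
    den = sum(2 * p * (1 - p) for p in freqs)
    return num / den


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def evaluate(dropout, n_loci=1000, n_dyads=30, seed=1):
    """Simulate dyads of known relatedness; return (correlation, mean bias)."""
    rng = random.Random(seed)
    # Assumption: frequencies are random, not taken from an empirical data set.
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    true, est = [], []
    for kind, r in TRUE_R.items():
        for _ in range(n_dyads):
            a, b = simulate_dyad(kind, freqs, rng)
            x, y = observe(a, dropout, rng), observe(b, dropout, rng)
            est.append(estimate_r(x, y, freqs))
            true.append(r)
    bias = mean(e - t for e, t in zip(est, true))
    return pearson(true, est), bias


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.3):
        corr, bias = evaluate(dropout=d)
        print(f"dropout={d:.1f}  correlation={corr:.3f}  mean bias={bias:+.3f}")
```

To compare estimators in the way the abstract describes, one would swap other estimator functions in place of `estimate_r` and compare correlation and bias side by side. A more realistic error model and a missing-data step would each need their own function alongside `observe`.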