Centre for Ecology, Evolution and Conservation, School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
Mol Ecol Resour. 2015 Jan;15(1):28-41. doi: 10.1111/1755-0998.12291. Epub 2014 Jul 3.
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
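The replicate-based idea described here can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not call Stacks. The function name `replicate_error_rates`, the genotype encoding, and the exact definitions are assumptions made for illustration: locus error is taken as the share of loci recovered in only one replicate of a pair, and allele error as the share of mismatching genotypes among loci recovered in both.

```python
from typing import Dict, Optional, Tuple

# A genotype is an unordered pair of alleles; None marks a locus that was
# not recovered in that replicate (hypothetical encoding for this sketch).
Genotype = Optional[Tuple[str, str]]


def replicate_error_rates(
    rep_a: Dict[str, Genotype], rep_b: Dict[str, Genotype]
) -> Dict[str, float]:
    """Compare two replicates of the same individual.

    Assumed definitions (not taken from the abstract):
      locus_error  = loci called in exactly one replicate / loci called in either
      allele_error = loci with differing genotypes / loci called in both
    """
    loci = set(rep_a) | set(rep_b)
    called_a = {loc for loc in loci if rep_a.get(loc) is not None}
    called_b = {loc for loc in loci if rep_b.get(loc) is not None}
    either = called_a | called_b
    both = called_a & called_b

    locus_error = len(either - both) / len(either) if either else float("nan")

    # Sort alleles so that ("C", "T") and ("T", "C") count as the same genotype.
    mismatches = sum(1 for loc in both if sorted(rep_a[loc]) != sorted(rep_b[loc]))
    allele_error = mismatches / len(both) if both else float("nan")

    return {"locus_error": locus_error, "allele_error": allele_error}


# Toy replicate pair (made-up data, for illustration only).
rep_a = {"loc1": ("A", "A"), "loc2": ("A", "G"), "loc3": ("C", "T"), "loc4": None}
rep_b = {"loc1": ("A", "A"), "loc2": ("G", "G"), "loc3": ("T", "C"), "loc4": ("G", "G")}

rates = replicate_error_rates(rep_a, rep_b)
print(rates)
```

In this toy pair, one of four loci is recovered in only one replicate and one of the three shared loci has a mismatching genotype. Repeating such a comparison across assembly parameter settings would be one way to look for the settings that minimize error while retaining informative loci, as the abstract describes.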