Arantes Larissa S, Caccavo Jilda A, Sullivan James K, Sparmann Sarah, Mbedi Susan, Höner Oliver P, Mazzoni Camila J
Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany.
Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.
Mol Ecol Resour. 2025 Jul;25(5):e13859. doi: 10.1111/1755-0998.13859. Epub 2023 Aug 30.
Genetic non-invasive sampling (gNIS) is a critical tool for population genetics studies, supporting conservation efforts while imposing minimal impacts on wildlife. However, gNIS samples often show variable levels of DNA degradation and non-endogenous contamination, which can incur considerable processing costs. Furthermore, the use of restriction-site-associated DNA sequencing methods (RADseq) for assessing thousands of genetic markers introduces the challenge of obtaining large sets of shared loci with similar coverage across multiple individuals. Here, we present an approach to handling large-scale gNIS-based datasets using data from the spotted hyena population inhabiting the Ngorongoro Crater in Tanzania. We generated 3RADseq data for more than a thousand individuals, mostly from faecal mucus samples collected non-invasively and varying in DNA degradation and contamination levels. Using small-scale sequencing, we screened samples for endogenous DNA content, removed highly contaminated samples, confirmed that fragment lengths overlapped between libraries, and balanced individual representation in a sequencing pool. We evaluated the impact of (1) DNA degradation and contamination of non-invasive samples, (2) PCR duplicates and (3) different SNP filters on genotype accuracy, measured as the Mendelian error estimated for parent-offspring trio datasets. Our results showed that, when balanced for sequencing depth, contaminated samples presented genotype error rates similar to those of non-contaminated samples. We also showed that PCR duplicates and different SNP filters affect genotype accuracy. In summary, we showed the potential of using gNIS for large-scale genetic monitoring based on SNPs and demonstrated how to improve control over library preparation by using a weighted re-pooling strategy that considers the endogenous DNA content.
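Two ideas in the abstract can be made concrete with a minimal Python sketch: Mendelian error in parent-offspring trios, and re-pooling weighted by endogenous DNA content. This is an illustration under stated assumptions, not the authors' pipeline. Genotypes are assumed to be biallelic SNPs coded as alternate-allele counts (0, 1, 2), with `None` for missing calls. The re-pooling rule (weights inversely proportional to the endogenous fraction, so that each sample is expected to yield a similar number of endogenous reads) is one plausible reading of "weighted re-pooling"; the abstract does not specify the formula. All function names are hypothetical.

```python
from itertools import product

# Alleles (0 = reference, 1 = alternate) a parent can transmit, keyed by the
# parent's genotype coded as an alternate-allele count.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(mother, father, child):
    """True if the child's genotype can arise from one allele per parent."""
    possible = {m + f for m, f in product(TRANSMITTABLE[mother],
                                          TRANSMITTABLE[father])}
    return child in possible


def mendelian_error_rate(trio_genotypes):
    """Fraction of fully genotyped SNPs that violate Mendelian inheritance.

    trio_genotypes: iterable of (mother, father, child) tuples, one per SNP.
    SNPs with any missing call (None) are skipped.
    """
    checked = errors = 0
    for mother, father, child in trio_genotypes:
        if None in (mother, father, child):
            continue
        checked += 1
        if not is_mendelian_consistent(mother, father, child):
            errors += 1
    return errors / checked if checked else float("nan")


def repooling_weights(endogenous_fraction):
    """Relative pooling proportions per sample (ASSUMED rule, see lead-in).

    endogenous_fraction: dict mapping sample ID to the fraction of reads
    from the target species, as estimated from small-scale sequencing.
    Returns proportions summing to 1, inversely proportional to that fraction.
    """
    for sample, fraction in endogenous_fraction.items():
        if not 0 < fraction <= 1:
            raise ValueError(f"endogenous fraction out of range for {sample}")
    raw = {s: 1.0 / f for s, f in endogenous_fraction.items()}
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}
```

For example, `repooling_weights({"a": 0.5, "b": 0.25})` gives sample `b` twice the share of sample `a`, because only half as much of its DNA is endogenous. In practice, samples below some endogenous-content threshold would be dropped before weighting, as the abstract describes for highly contaminated samples.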