Scaling-up RADseq methods for large datasets of non-invasive samples: Lessons for library construction and data preprocessing.

Authors

Arantes Larissa S, Caccavo Jilda A, Sullivan James K, Sparmann Sarah, Mbedi Susan, Höner Oliver P, Mazzoni Camila J

Affiliations

Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany.

Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.

Publication information

Mol Ecol Resour. 2025 Jul;25(5):e13859. doi: 10.1111/1755-0998.13859. Epub 2023 Aug 30.

Abstract

Genetic non-invasive sampling (gNIS) is a critical tool for population genetics studies, supporting conservation efforts while imposing minimal impacts on wildlife. However, gNIS often presents variable levels of DNA degradation and non-endogenous contamination, which can incur considerable processing costs. Furthermore, the use of restriction-site-associated DNA sequencing methods (RADseq) for assessing thousands of genetic markers introduces the challenge of obtaining large sets of shared loci with similar coverage across multiple individuals. Here, we present an approach to handling large-scale gNIS-based datasets using data from the spotted hyena population inhabiting the Ngorongoro Crater in Tanzania. We generated 3RADseq data for more than a thousand individuals, mostly from faecal mucus samples collected non-invasively and varying in DNA degradation and contamination level. Using small-scale sequencing, we screened samples for endogenous DNA content, removed highly contaminated samples, confirmed overlap fragment length between libraries, and balanced individual representation in a sequencing pool. We evaluated the impact of (1) DNA degradation and contamination of non-invasive samples, (2) PCR duplicates and (3) different SNP filters on genotype accuracy based on Mendelian error estimated for parent-offspring trio datasets. Our results showed that when balanced for sequencing depth, contaminated samples presented similar genotype error rates to those of non-contaminated samples. We also showed that PCR duplicates and different SNP filters impact genotype accuracy. In summary, we showed the potential of using gNIS for large-scale genetic monitoring based on SNPs and demonstrated how to improve control over library preparation by using a weighted re-pooling strategy that considers the endogenous DNA content.
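The abstract measures genotype accuracy through Mendelian error in parent-offspring trios. As a rough illustration of that idea only, the sketch below counts Mendelian inconsistencies at biallelic SNPs. It is not the authors' pipeline: the 0/1/2 alternate-allele-count encoding, the use of `None` for missing calls, and all function names are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical encoding: a genotype is the count of alternate alleles
# (0 = hom-ref, 1 = het, 2 = hom-alt); None marks a missing call.
Genotype = Optional[int]

# Alternate-allele counts (0 or 1) that a parent with a given genotype can transmit.
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(mother: Genotype, father: Genotype, child: Genotype) -> Optional[bool]:
    """Return True if the child genotype cannot arise from the two parents,
    False if it can, and None if any of the three calls is missing."""
    if mother is None or father is None or child is None:
        return None
    possible = {m + f for m in TRANSMISSIBLE[mother] for f in TRANSMISSIBLE[father]}
    return child not in possible


def mendelian_error_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of fully genotyped (mother, father, child) calls that are
    Mendelian errors; NaN when no call can be scored."""
    scored = [r for r in (is_mendelian_error(*t) for t in trios) if r is not None]
    return sum(scored) / len(scored) if scored else float("nan")


if __name__ == "__main__":
    # Toy genotypes for one trio at six SNPs (made-up values, not study data).
    calls = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1), (0, 2, 1), (None, 1, 1)]
    print(mendelian_error_rate(calls))
```

A rate of this kind could be compared across filter settings or sample classes, which is the sort of comparison the abstract describes. Note that the check only flags calls that are impossible under Mendelian inheritance, so it gives a lower bound on the true genotyping error.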

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/359a/12142721/6dc69c3344d1/MEN-25-e13859-g005.jpg