Scaling-up RADseq methods for large datasets of non-invasive samples: Lessons for library construction and data preprocessing.

Authors

Arantes Larissa S, Caccavo Jilda A, Sullivan James K, Sparmann Sarah, Mbedi Susan, Höner Oliver P, Mazzoni Camila J

Affiliations

Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Berlin, Germany.

Leibniz-Institut für Zoo- und Wildtierforschung (IZW), Berlin, Germany.

Publication information

Mol Ecol Resour. 2025 Jul;25(5):e13859. doi: 10.1111/1755-0998.13859. Epub 2023 Aug 30.

DOI:10.1111/1755-0998.13859

PMID:37646753

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12142721/

Abstract

Genetic non-invasive sampling (gNIS) is a critical tool for population genetics studies, supporting conservation efforts while imposing minimal impacts on wildlife. However, gNIS often presents variable levels of DNA degradation and non-endogenous contamination, which can incur considerable processing costs. Furthermore, the use of restriction-site-associated DNA sequencing methods (RADseq) for assessing thousands of genetic markers introduces the challenge of obtaining large sets of shared loci with similar coverage across multiple individuals. Here, we present an approach to handling large-scale gNIS-based datasets using data from the spotted hyena population inhabiting the Ngorongoro Crater in Tanzania. We generated 3RADseq data for more than a thousand individuals, mostly from faecal mucus samples collected non-invasively and varying in DNA degradation and contamination level. Using small-scale sequencing, we screened samples for endogenous DNA content, removed highly contaminated samples, confirmed overlap fragment length between libraries, and balanced individual representation in a sequencing pool. We evaluated the impact of (1) DNA degradation and contamination of non-invasive samples, (2) PCR duplicates and (3) different SNP filters on genotype accuracy based on Mendelian error estimated for parent-offspring trio datasets. Our results showed that when balanced for sequencing depth, contaminated samples presented similar genotype error rates to those of non-contaminated samples. We also showed that PCR duplicates and different SNP filters impact genotype accuracy. In summary, we showed the potential of using gNIS for large-scale genetic monitoring based on SNPs and demonstrated how to improve control over library preparation by using a weighted re-pooling strategy that considers the endogenous DNA content.
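The two quantitative ideas in the abstract, weighted re-pooling by endogenous DNA content and genotype error estimated from Mendelian inconsistencies in parent-offspring trios, can be illustrated with a minimal sketch. This is not the authors' pipeline. The names `PilotResult`, `repooling_volumes` and `mendelian_error_rate`, the 5% endogenous-content cutoff, the inverse-yield weighting rule, and the 0/1/2 genotype coding are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class PilotResult:
    """Per-sample summary from a small-scale (pilot) sequencing run (hypothetical structure)."""
    sample_id: str
    total_reads: int            # reads obtained for this sample in the pilot run
    endogenous_fraction: float  # share of reads mapping to the target genome


def repooling_volumes(
    pilot: List[PilotResult],
    total_volume_ul: float = 100.0,
    min_endogenous: float = 0.05,  # assumed cutoff, not taken from the paper
) -> Dict[str, float]:
    """Split a pool volume so each kept sample is expected to yield similar endogenous reads.

    Assumes the pilot pool used equal volumes per sample, so a sample's
    endogenous yield per unit volume is proportional to
    total_reads * endogenous_fraction. Volumes are weighted by the inverse.
    """
    kept = [
        p for p in pilot
        if p.total_reads > 0 and p.endogenous_fraction >= min_endogenous
    ]
    weights = {p.sample_id: 1.0 / (p.total_reads * p.endogenous_fraction) for p in kept}
    weight_sum = sum(weights.values())
    return {sid: total_volume_ul * w / weight_sum for sid, w in weights.items()}


Genotype = Optional[int]  # alternate-allele count (0, 1, 2) or None if missing


def _gametes(genotype: int) -> set:
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(mother: int, father: int, child: int) -> bool:
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(a + b == child for a in _gametes(mother) for b in _gametes(father))


def mendelian_error_rate(
    trios: Iterable[Tuple[Genotype, Genotype, Genotype]]
) -> Optional[float]:
    """Fraction of fully called (mother, father, child) genotypes that violate Mendelian inheritance."""
    called = [t for t in trios if None not in t]
    if not called:
        return None  # no informative genotypes
    errors = sum(not is_mendelian_consistent(*t) for t in called)
    return errors / len(called)
```

In this sketch a sample with low endogenous yield in the pilot receives a proportionally larger share of the final pool, while samples below the cutoff are dropped. The error rate counts only trios in which all three genotypes were called.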

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/359a/12142721/6dc69c3344d1/MEN-25-e13859-g005.jpg
