

On the causes, consequences, and avoidance of PCR duplicates: Towards a theory of library complexity.

Author information

Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA.

Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

Publication information

Mol Ecol Resour. 2023 Aug;23(6):1299-1318. doi: 10.1111/1755-0998.13800. Epub 2023 Apr 16.

Abstract

Library preparation protocols for most sequencing technologies involve PCR amplification of the template DNA, which opens the possibility that a given template DNA molecule is sequenced multiple times. Reads arising from this phenomenon, known as PCR duplicates, inflate the cost of sequencing and can jeopardize the reliability of affected experiments. Despite the pervasiveness of this artefact, our understanding of its causes and of its impact on downstream statistical analyses remains essentially empirical. Here, we develop a general quantitative model of amplification distortions in sequencing data sets, which we leverage to investigate the factors controlling the occurrence of PCR duplicates. We show that the PCR duplicate rate is determined primarily by the ratio between library complexity and sequencing depth, and that amplification noise (including in its dependence on the number of PCR cycles) only plays a secondary role for this artefact. We confirm our predictions using new and published RAD-seq libraries and provide a method to estimate library complexity and amplification noise in any data set containing PCR duplicates. We discuss how amplification-related artefacts impact downstream analyses, and in particular genotyping accuracy. The proposed framework unites the numerous observations made on PCR duplicates and will be useful to experimenters of all sequencing technologies where DNA availability is a concern.
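The dependence of the duplicate rate on the ratio between library complexity and sequencing depth can be illustrated with a deliberately simplified sketch. This is not the authors' model: it assumes every unique template molecule is amplified equally (no amplification noise) and that reads are drawn uniformly with replacement from the amplified pool. The function names and parameters (`expected_duplicate_rate`, `simulated_duplicate_rate`, `complexity`, `depth`) are hypothetical and chosen only for illustration.

```python
import random


def expected_duplicate_rate(complexity: int, depth: int) -> float:
    """Expected duplicate fraction in the noise-free toy model.

    Assumption: `depth` reads are drawn uniformly, with replacement,
    from `complexity` equally abundant unique molecules.
    """
    # Probability that a given molecule is missed by every read.
    p_missed = (1.0 - 1.0 / complexity) ** depth
    expected_unique = complexity * (1.0 - p_missed)
    # Reads beyond the first observation of a molecule are duplicates.
    return 1.0 - expected_unique / depth


def simulated_duplicate_rate(complexity: int, depth: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for comparison."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(depth):
        seen.add(rng.randrange(complexity))
    return 1.0 - len(seen) / depth


if __name__ == "__main__":
    # Same depth-to-complexity ratio at two scales, then a deeper run.
    for complexity, depth in [(1_000, 1_000), (10_000, 10_000), (1_000, 5_000)]:
        print(
            complexity,
            depth,
            round(expected_duplicate_rate(complexity, depth), 3),
            round(simulated_duplicate_rate(complexity, depth), 3),
        )
```

In this toy setting, two libraries with the same depth-to-complexity ratio give nearly the same duplicate rate regardless of absolute size, and sequencing more deeply relative to complexity raises the rate. The amplification noise that the abstract describes as a secondary factor is omitted here entirely.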

