
A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

Author information

Bansal Vikas

Affiliation

Department of Pediatrics, School of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Publication information

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):43. doi: 10.1186/s12859-017-1471-9.

DOI:10.1186/s12859-017-1471-9
PMID:28361665
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374682/
Abstract

BACKGROUND

PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads into the sequence data, and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and, therefore, overestimate the PCR duplication rate for DNA-seq and RNA-seq experiments.
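The conventional estimate criticized above counts every read that shares alignment coordinates with another read as a duplicate. A minimal Python sketch of that naive rate follows, assuming each read has already been reduced to a hypothetical alignment key such as (chromosome, 5' position, strand), plus the mate's 5' position for paired-end data. `naive_duplicate_rate` is an illustrative name, not a function from any published tool. Because natural duplicates share the same key as PCR duplicates, this rate tends to overestimate PCR duplication.

```python
from collections import Counter
from typing import Hashable, Iterable


def naive_duplicate_rate(read_keys: Iterable[Hashable]) -> float:
    """Fraction of reads whose alignment key was already seen.

    Assumed key: (chrom, 5' position, strand) for single-end reads; add the
    mate's 5' position for paired-end reads. Natural and PCR duplicates are
    not distinguished, which is why this rate tends to be inflated.
    """
    counts = Counter(read_keys)  # number of reads per distinct alignment key
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # In a group of n reads with the same key, n - 1 are flagged as duplicates.
    duplicates = total - len(counts)
    return duplicates / total


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 200, "-"), ("chr2", 50, "+")]
print(naive_duplicate_rate(reads))  # 0.25
```

Here four reads yield three distinct keys, so one read is flagged and the naive rate is 1/4.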

RESULTS

In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes Project demonstrated that our method can accurately estimate the PCR duplication rate for both paired-end and single-end read datasets that contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes Project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from highly expressed genes, and identified outlier samples with a PCR duplication rate 2-fold greater than that of other samples.
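The core idea of using heterozygous variants can be illustrated with a deliberately simplified model. Two reads from independent fragments sample the two haplotypes independently, so at a heterozygous site they carry different alleles with probability 1/2. PCR copies of a single fragment always carry the same allele. If a fraction f of duplicate pairs are natural, the expected fraction of allele-discordant pairs is f/2, so f can be estimated as twice the observed discordant fraction. The sketch below rests on several assumptions. It uses only duplicate pairs (exactly two reads with the same alignment key) that overlap a heterozygous site. It ignores sequencing errors and allele-specific mapping bias. It also assumes that the pair-level f applies to duplicate reads overall. The function names are hypothetical, and this is not the model implemented in PCRduplicates.

```python
from typing import Sequence, Tuple


def natural_duplicate_fraction(pairs: Sequence[Tuple[str, str]]) -> float:
    """Estimate the fraction f of duplicate pairs that are natural duplicates.

    Each element holds the alleles that the two reads of one duplicate pair
    carry at the same heterozygous site (simplified model, see lead-in).
    """
    if not pairs:
        raise ValueError("no duplicate pairs overlap a heterozygous site")
    discordant = sum(1 for a, b in pairs if a != b)
    # Natural pairs are discordant with probability 1/2; PCR pairs never are
    # (sequencing errors ignored), so E[discordant / n] = f / 2.
    estimate = 2.0 * discordant / len(pairs)
    # Sampling noise can push the estimate above 1; clip to a valid fraction.
    return min(estimate, 1.0)


def pcr_duplicate_rate(naive_rate: float, natural_fraction: float) -> float:
    """Remove the natural-duplicate share from a coordinate-based duplicate rate."""
    return naive_rate * (1.0 - natural_fraction)


pairs = [("A", "G"), ("A", "A"), ("G", "G"), ("A", "A")]
f = natural_duplicate_fraction(pairs)  # 1 of 4 pairs discordant, so f = 0.5
print(f, pcr_duplicate_rate(0.2, f))
```

Under this model, if roughly half of the duplicates are natural (the abstract reports 45-50% for the Nextera exomes), the corrected PCR duplication rate is about half the naive, coordinate-based rate.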

CONCLUSIONS

The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
