The impact of amplification on differential expression analyses by RNA-seq.

Author information

Parekh Swati, Ziegenhain Christoph, Vieth Beate, Enard Wolfgang, Hellmann Ines

Affiliations

Anthropology & Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, 82152 Martinsried, Germany.

Publication information

Sci Rep. 2016 May 9;6:25533. doi: 10.1038/srep25533.

DOI:10.1038/srep25533

PMID:27156886

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4860583/

Abstract

Currently, quantitative RNA-seq methods are pushed to work with increasingly small starting amounts of RNA that require amplification. However, it is unclear how much noise or bias amplification introduces and how this affects precision and accuracy of RNA quantification. To assess the effects of amplification, reads that originated from the same RNA molecule (PCR-duplicates) need to be identified. Computationally, read duplicates are defined by their mapping position, which does not distinguish PCR- from natural duplicates and hence it is unclear how to treat duplicated reads. Here, we generate and analyse RNA-seq data sets prepared using three different protocols (Smart-Seq, TruSeq and UMI-seq). We find that a large fraction of computationally identified read duplicates are not PCR duplicates and can be explained by sampling and fragmentation bias. Consequently, the computational removal of duplicates improves neither accuracy nor precision and can actually worsen the power and the False Discovery Rate (FDR) for differential gene expression. Even when duplicates are experimentally identified by unique molecular identifiers (UMIs), power and FDR are only mildly improved. However, the pooling of samples as made possible by the early barcoding of the UMI-protocol leads to an appreciable increase in the power to detect differentially expressed genes.
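
The distinction the abstract draws between computationally defined duplicates (same mapping position) and experimentally identified PCR duplicates (same position and same UMI) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Read` record, the `count_unique` helper, and the toy reads are hypothetical and serve only to show why position-only collapsing can discard natural duplicates.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record, for illustration only."""
    chrom: str
    pos: int      # mapping position of the read
    strand: str
    umi: str      # unique molecular identifier attached before amplification


def count_unique(reads, use_umi):
    """Count reads that remain after collapsing duplicates.

    use_umi=False: duplicates share chromosome, position and strand.
    use_umi=True:  duplicates must additionally share the UMI.
    """
    keys = set()
    for r in reads:
        if use_umi:
            keys.add((r.chrom, r.pos, r.strand, r.umi))
        else:
            keys.add((r.chrom, r.pos, r.strand))
    return len(keys)


# Toy data (invented): three reads map to the same position.
reads = [
    Read("chr1", 100, "+", "AAGT"),
    Read("chr1", 100, "+", "AAGT"),  # same position, same UMI: PCR duplicate
    Read("chr1", 100, "+", "CCTA"),  # same position, other UMI: natural duplicate
    Read("chr1", 250, "+", "AAGT"),
]

by_position = count_unique(reads, use_umi=False)
by_umi = count_unique(reads, use_umi=True)
print(by_position, by_umi)
```

In this toy case, position-only collapsing keeps fewer reads than UMI-aware collapsing, because it also removes the read that came from a distinct molecule.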

Similar articles

Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers.

BMC Genomics. 2018 Jul 13;19(1):531. doi: 10.1186/s12864-018-4933-1.

A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):43. doi: 10.1186/s12859-017-1471-9.

Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

Biotechniques. 2017 Nov 1;63(5):221-226. doi: 10.2144/000114608.

Quantile normalization of single-cell RNA-seq read counts without unique molecular identifiers.

Genome Biol. 2020 Jul 3;21(1):160. doi: 10.1186/s13059-020-02078-0.

BUTTERFLY: addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq.

Genome Biol. 2021 Jun 8;22(1):174. doi: 10.1186/s13059-021-02386-z.

Gene length and detection bias in single cell RNA sequencing protocols.

F1000Res. 2017 Apr 28;6:595. doi: 10.12688/f1000research.11290.1. eCollection 2017.

dupRadar: a Bioconductor package for the assessment of PCR artifacts in RNA-Seq data.

BMC Bioinformatics. 2016 Oct 21;17(1):428. doi: 10.1186/s12859-016-1276-2.

Full-Length Single-Cell RNA-Sequencing with FLASH-seq.

Methods Mol Biol. 2023;2584:123-164. doi: 10.1007/978-1-0716-2756-3_5.

Application of the 3' mRNA-Seq using unique molecular identifiers in highly degraded RNA derived from formalin-fixed, paraffin-embedded tissue.

BMC Genomics. 2021 Oct 24;22(1):759. doi: 10.1186/s12864-021-08068-1.

Cited by

Genomic Insights into Tumorigenesis in Newly Diagnosed Multiple Myeloma.

Diagnostics (Basel). 2025 Aug 23;15(17):2130. doi: 10.3390/diagnostics15172130.

DeepQR: single-molecule QR codes for optical gene-expression analysis.

Nanophotonics. 2024 Jul 30;14(15):2549-2561. doi: 10.1515/nanoph-2024-0236. eCollection 2025 Aug.

Efficient profiling of total RNA in single cells with STORM-seq.

bioRxiv. 2025 May 20:2022.03.14.484332. doi: 10.1101/2022.03.14.484332.

The impact of PCR duplication on RNAseq data generated using NovaSeq 6000, NovaSeq X, AVITI, and G4 sequencers.

Genome Biol. 2025 May 28;26(1):145. doi: 10.1186/s13059-025-03613-7.

Label-free single-cell phenotyping to determine tumor cell heterogeneity in pancreatic cancer in real time.

JCI Insight. 2025 May 27;10(13). doi: 10.1172/jci.insight.169105. eCollection 2025 Jul 8.

Model of metabolism and gene expression predicts proteome allocation in Pseudomonas putida.

NPJ Syst Biol Appl. 2025 May 24;11(1):55. doi: 10.1038/s41540-025-00521-1.

Utilizing Nanopore direct RNA sequencing of blood from patients with sepsis for discovery of co- and post-transcriptional disease biomarkers.

BMC Infect Dis. 2025 May 13;25(1):692. doi: 10.1186/s12879-025-11078-z.

Mitochondrial fatty acid synthesis and MECR regulate CD4+ T cell function and oxidative metabolism.

J Immunol. 2025 May 1;214(5):958-976. doi: 10.1093/jimmun/vkaf034.

Microenvironmental arginine restriction sensitizes pancreatic cancers to polyunsaturated fatty acids by suppression of lipid synthesis.

bioRxiv. 2025 Mar 13:2025.03.10.642426. doi: 10.1101/2025.03.10.642426.

Detection of mRNA Transcript Variants.

Genes (Basel). 2025 Mar 16;16(3):343. doi: 10.3390/genes16030343.
