Differential expression in RNA-seq: a matter of depth.

Author affiliation

Bioinformatics and Genomics Department, Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain.

Publication information

Genome Res. 2011 Dec;21(12):2213-23. doi: 10.1101/gr.124321.111. Epub 2011 Sep 8.

DOI: 10.1101/gr.124321.111

PMID: 21903743

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3227109/

Abstract

Next-generation sequencing (NGS) technologies are revolutionizing genome research, and in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established, and additional research is needed to understand how these data respond to differential expression analysis. In this work, we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level, and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach, NOISeq, that differs from existing methods in that it is data-adaptive and nonparametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the data set, and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data, and the issue of replication.
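The idea of modeling the noise distribution from the data itself can be illustrated with a small sketch. This is not the NOISeq implementation. It assumes a simplified scoring scheme in which each gene is summarized by an absolute log2 ratio (M) and an absolute difference (D), and the between-condition values are ranked against the values seen between replicates of the same condition. The function names, the pseudocount, and the toy counts are all hypothetical and are not taken from the abstract.

```python
import math

def md(a, b, pseudo=0.5):
    """Return (|M|, D) for two expression values.

    |M| is the absolute log2 ratio (with a pseudocount so zeros are allowed),
    D is the absolute difference. The pseudocount value is an assumption.
    """
    m = abs(math.log2((a + pseudo) / (b + pseudo)))
    d = abs(a - b)
    return m, d

def noise_scores(rep_a, rep_b):
    """(|M|, D) pairs between two replicates of the SAME condition.

    Any difference here is treated as noise, so these pairs form an
    empirical (nonparametric) noise distribution.
    """
    return [md(a, b) for a, b in zip(rep_a, rep_b)]

def prob_above_noise(signal, noise):
    """Fraction of noise pairs that the signal exceeds in both |M| and D."""
    m, d = signal
    return sum(1 for nm, nd in noise if nm < m and nd < d) / len(noise)

# Toy counts for five genes (hypothetical data).
cond1_rep1 = [10, 200, 50, 0, 1000]
cond1_rep2 = [12, 190, 55, 1, 980]
cond2 = [11, 800, 52, 0, 1010]

# Empirical noise from the two replicates of condition 1.
noise = noise_scores(cond1_rep1, cond1_rep2)

# Compare the condition-1 mean against condition 2, gene by gene.
cond1_mean = [(a + b) / 2 for a, b in zip(cond1_rep1, cond1_rep2)]
probs = [prob_above_noise(md(x, y), noise) for x, y in zip(cond1_mean, cond2)]

for gene, p in enumerate(probs):
    print(f"gene {gene}: fraction of noise exceeded = {p:.2f}")
```

In this toy example only the second gene, whose count rises from about 195 to 800, exceeds every replicate-to-replicate difference on both measures. Because the reference distribution is rebuilt from the replicates at hand, the threshold for calling a change moves with the data rather than being fixed by a parametric model.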
