Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.

Affiliations

Department of Statistics, Baker Hall, Carnegie Mellon University, Pittsburgh, PA, USA.

Pomona College.

Publication information

Brief Bioinform. 2018 Sep 28;19(5):776-792. doi: 10.1093/bib/bbx008.

DOI:10.1093/bib/bbx008

PMID:28334202

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6171491/

Abstract

RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.
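The abstract's central claim, that a normalization method misleads when its assumption fails, can be illustrated with a small sketch. The example below is not taken from the article: it assumes total-count (library size) scaling as the illustrative method, and the gene names, expression values, and helper functions (`simulate_counts`, `total_count_normalize`) are hypothetical. Total-count scaling relies on the assumption that total expression is the same in every sample; here that assumption is deliberately violated by letting one highly expressed gene increase tenfold.

```python
# Hypothetical toy data: absolute expression of 4 genes in two conditions.
# Only gene "D" truly changes (10x higher in condition B).
true_a = {"A": 100, "B": 200, "C": 300, "D": 400}
true_b = {"A": 100, "B": 200, "C": 300, "D": 4000}


def simulate_counts(true_expr, depth):
    """Idealized sequencing: reads are allotted in proportion to expression."""
    total = sum(true_expr.values())
    return {gene: depth * value / total for gene, value in true_expr.items()}


def total_count_normalize(counts, target=1_000_000):
    """Scale counts so every sample sums to the same target total."""
    total = sum(counts.values())
    return {gene: count * target / total for gene, count in counts.items()}


# Both samples are sequenced to the same depth, so the raw counts only
# carry information about relative (not absolute) expression.
counts_a = simulate_counts(true_a, depth=1_000_000)
counts_b = simulate_counts(true_b, depth=1_000_000)

norm_a = total_count_normalize(counts_a)
norm_b = total_count_normalize(counts_b)

# Apparent fold change (B over A) after normalization.
fold_changes = {gene: norm_b[gene] / norm_a[gene] for gene in true_a}

for gene, fc in fold_changes.items():
    print(f"{gene}: apparent fold change = {fc:.3f}")
```

In this toy setting the unchanged genes A, B, and C all appear down-regulated (apparent fold change of 1000/4600, roughly 0.22), and the true tenfold increase in D is shrunk to roughly 2.2. This is the kind of false positive the abstract warns about when a method's assumption does not hold for the experiment at hand.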
