How to do quantile normalization correctly for gene expression data analyses.

Affiliations

School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China.

Department of Computer Science, National University of Singapore, Singapore, Singapore.

Publication information

Sci Rep. 2020 Sep 23;10(1):15534. doi: 10.1038/s41598-020-72664-6.

DOI:10.1038/s41598-020-72664-6
PMID:32968196
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7511327/
Abstract

Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, when applied blindly to whole datasets, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation), resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization and demonstrate that good performance in batch-effect correction and statistical feature selection can be readily achieved by first splitting the data by sample class labels and then performing quantile normalization independently on each split ("Class-specific"). Via simulations with both real and simulated batch effects, we demonstrate that the "Class-specific" strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied to whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, we recommend the "Class-specific" strategy.
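To make the contrast between whole-data and "Class-specific" quantile normalization concrete, the following is a minimal NumPy sketch and not the authors' code. It assumes a features-by-samples matrix and one class label per sample. The function names `quantile_normalize` and `class_specific_qn` are hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
import numpy as np

def quantile_normalize(x):
    """Whole-matrix quantile normalization of a features x samples array.

    Every sample (column) is forced onto the same reference distribution:
    the mean of the k-th smallest values across samples.
    Simplification: ties are broken by position, not averaged.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")   # per-sample sort order
    reference = np.sort(x, axis=0).mean(axis=1)    # shared reference quantiles
    out = np.empty_like(x)
    # The feature holding the k-th smallest value in a sample gets reference[k].
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

def class_specific_qn(x, labels):
    """Quantile-normalize each class's samples independently (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(x)
    for cls in np.unique(labels):
        cols = labels == cls                       # samples belonging to this class
        out[:, cols] = quantile_normalize(x[:, cols])
    return out

# Toy data (an assumption for illustration): 50 features, 3 samples per class,
# with the first 10 features shifted upward in class B.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 6))
data[:10, 3:] += 2.0
classes = ["A", "A", "A", "B", "B", "B"]

whole = quantile_normalize(data)
per_class = class_specific_qn(data, classes)
```

In this sketch, whole-data normalization gives all six samples an identical value distribution, whereas the class-specific variant only equalizes distributions within each class, so any distribution-level difference between classes is left in place for downstream feature selection.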


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fb1/7511327/8b4f79ba30ba/41598_2020_72664_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fb1/7511327/9c824112fab0/41598_2020_72664_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fb1/7511327/ea6238f0fd70/41598_2020_72664_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fb1/7511327/ebb6a579d82f/41598_2020_72664_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fb1/7511327/f7c058c954ec/41598_2020_72664_Fig5_HTML.jpg
