RobNorm: model-based robust normalization method for labeled quantitative mass spectrometry proteomics data.

Affiliation

Department of Genetics, Stanford University, Stanford, CA 94305, USA.

Publication information

Bioinformatics. 2021 May 5;37(6):815-821. doi: 10.1093/bioinformatics/btaa904.

DOI: 10.1093/bioinformatics/btaa904

PMID: 33098413

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8098025/

Abstract

MOTIVATION

Data normalization is an important step in processing proteomics data generated in mass spectrometry experiments; it aims to reduce sample-level variation and facilitate comparisons between samples. Previously published normalization methods primarily rely on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data are generated from heterogeneous samples, such as different tissue types. This led us to develop a novel data-driven normalization method that corrects systematic bias while preserving the underlying biological heterogeneity.
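To see why the similar-distribution assumption matters, here is a minimal sketch of per-sample median normalization, one common method of this kind, chosen here only for illustration. The function name `median_normalize` and the toy intensities are invented for the example; the data contain no technical bias at all, only a genuine tissue difference.

```python
import statistics


def median_normalize(samples):
    """Centre each sample on its median log intensity.

    This implicitly assumes the protein expression distribution is the same
    in every sample, so any shift in a sample's median is treated as bias.
    """
    return [[v - statistics.median(s) for v in s] for s in samples]


# Invented toy log2 intensities for 6 proteins in 2 samples with NO technical
# bias; sample B is another tissue where proteins 3-6 are truly 3 units higher.
sample_a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
sample_b = [10.0, 11.0, 15.0, 16.0, 17.0, 18.0]

norm_a, norm_b = median_normalize([sample_a, sample_b])
diff = [b - a for a, b in zip(norm_a, norm_b)]
print(diff)  # [-3.0, -3.0, 0.0, 0.0, 0.0, 0.0]
```

The true B-minus-A difference is [0, 0, 3, 3, 3, 3]; after median normalization it becomes [-3, -3, 0, 0, 0, 0]. The real tissue difference is erased and two unchanged proteins appear down-regulated, which is the failure mode the motivation describes for heterogeneous samples.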

RESULTS

To robustly correct the systematic bias, we used the density-power-weight method to down-weight outliers and extended the one-dimensional robust fitting method described in previous work to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. In simulation studies and in analysis of real data from the Genotype-Tissue Expression (GTEx) project, we compared the performance of RobNorm against that of other normalization methods. We found that RobNorm achieves the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples.
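The idea of robust fitting on structured data can be illustrated with a minimal sketch. It assumes an additive model on log intensities, x[i][j] = mu[i] + beta[j] + e[i][j] with e[i][j] ~ N(0, sigma[i]^2), where beta[j] is the sample-level bias to remove. Each cell is weighted by the fitted normal density raised to a power gamma (density power weights), and mu, beta and sigma are updated by weighted averages; the (1 + gamma) factor undoes the shrinkage that such weights induce in the variance estimate under the normal model. The model form, the names `robust_two_way` and `normalize`, the fixed gamma = 0.5, the starting values and the simulated data are all assumptions for illustration. This is not the RobNorm implementation (see the GitHub repository listed under Availability), and the paper's robustness criterion is not reproduced.

```python
import math
import random
import statistics


def robust_two_way(x, gamma=0.5, n_iter=50, min_sigma=1e-3):
    """Sketch: density-power-weighted fit of x[i][j] = mu[i] + beta[j] + e[i][j].

    x[i][j] is the log intensity of protein i in sample j. Cells far from the
    fit (e.g. tissue-specific proteins) receive weights close to zero.
    """
    n_prot, n_samp = len(x), len(x[0])
    rows, cols = range(n_prot), range(n_samp)

    # Robust starting values: row medians, then column medians of centred data.
    mu = [statistics.median(row) for row in x]
    beta = [statistics.median(x[i][j] - mu[i] for i in rows) for j in cols]
    sigma = [max(min_sigma, 1.4826 * statistics.median(
        abs(x[i][j] - mu[i] - beta[j]) for j in cols)) for i in rows]

    for _ in range(n_iter):
        # Density power weights phi(r / sigma)^gamma; the per-protein constant
        # (2*pi*sigma^2)^(-gamma/2) is dropped here (a simplifying assumption).
        w = [[math.exp(-gamma * ((x[i][j] - mu[i] - beta[j]) / sigma[i]) ** 2 / 2)
              for j in cols] for i in rows]
        # Weighted sample effects, centred so that mean(beta) = 0.
        beta = [sum(w[i][j] * (x[i][j] - mu[i]) for i in rows) /
                sum(w[i][j] for i in rows) for j in cols]
        shift = sum(beta) / n_samp
        beta = [b - shift for b in beta]
        # Weighted protein effects and scales with the (1 + gamma) correction.
        for i in rows:
            tot = sum(w[i])
            mu[i] = sum(w[i][j] * (x[i][j] - beta[j]) for j in cols) / tot
            sigma[i] = max(min_sigma, math.sqrt((1 + gamma) * sum(
                w[i][j] * (x[i][j] - mu[i] - beta[j]) ** 2 for j in cols) / tot))
    return mu, beta, sigma


def normalize(x, beta):
    """Remove the estimated sample effects from every protein."""
    return [[v - b for v, b in zip(row, beta)] for row in x]


# Simulated toy data: 200 proteins x 10 samples with known sample effects;
# 40% of proteins are "tissue-specific", 3 log units higher in the last 4 samples.
random.seed(1)
true_beta = [-1.0, -0.8, -0.4, -0.2, 0.0, 0.0, 0.2, 0.4, 0.8, 1.0]
x = []
for _ in range(200):
    base, specific = random.uniform(15, 25), random.random() < 0.4
    x.append([base + b + random.gauss(0, 0.2) + (3.0 if specific and j >= 6 else 0.0)
              for j, b in enumerate(true_beta)])

mu, beta, sigma = robust_two_way(x)
print([round(b, 2) for b in beta])  # estimates should lie close to true_beta
x_norm = normalize(x, beta)
```

Because the shifted cells of tissue-specific proteins get near-zero weight, they do not pull the estimated biases of the last four samples upward, whereas a per-sample median would be pulled toward them. Only beta is subtracted, so the simulated tissue difference survives normalization.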

AVAILABILITY AND IMPLEMENTATION

https://github.com/mwgrassgreen/RobNorm.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.




