Evaluating Proteomics Imputation Methods with Improved Criteria.

Affiliations

Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States.

Talus Biosciences, Seattle, Washington 98112, United States.

Publication information

J Proteome Res. 2023 Nov 3;22(11):3427-3438. doi: 10.1021/acs.jproteome.3c00205. Epub 2023 Oct 20.

DOI: 10.1021/acs.jproteome.3c00205

PMID: 37861703

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10949645/

Abstract

Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
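The "relatively simple" criterion that the abstract attributes to earlier benchmarks, the mean-squared error between imputed and held-out values, can be sketched as follows. This is only an illustration of that baseline criterion under stated assumptions, not the authors' pipeline and not one of their three downstream-centric criteria: the data are synthetic log-intensities, and the function names (`holdout_mse`, `impute_global_min`, `impute_peptide_mean`) are hypothetical.

```python
import numpy as np

def holdout_mse(X, impute, frac=0.1, seed=0):
    """Hide a random fraction of observed entries, impute, and score MSE on them."""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))          # (row, col) of observed values
    n_hold = max(1, int(frac * len(observed)))
    held = observed[rng.choice(len(observed), size=n_hold, replace=False)]
    rows, cols = held[:, 0], held[:, 1]
    X_masked = X.copy()
    X_masked[rows, cols] = np.nan                 # held-out values become missing
    X_imputed = impute(X_masked)
    return float(np.mean((X_imputed[rows, cols] - X[rows, cols]) ** 2))

def impute_global_min(X):
    """Replace every missing value with the smallest observed value."""
    return np.where(np.isnan(X), np.nanmin(X), X)

def impute_peptide_mean(X):
    """Replace missing values with the mean of the same peptide (row)."""
    counts = (~np.isnan(X)).sum(axis=1, keepdims=True)
    sums = np.nansum(X, axis=1, keepdims=True)
    global_mean = np.nansum(X) / max(int(counts.sum()), 1)
    # Fall back to the global mean for peptides with no observed value.
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(np.isnan(X), row_means, X)

# Synthetic peptides x samples matrix of log-intensities (assumed, for illustration).
rng = np.random.default_rng(42)
baseline = rng.normal(20.0, 2.0, size=200)        # per-peptide abundance
X = baseline[:, None] + rng.normal(0.0, 0.5, size=(200, 12))
X[rng.random(X.shape) < 0.2] = np.nan             # 20% missing completely at random

mse_min = holdout_mse(X, impute_global_min)
mse_mean = holdout_mse(X, impute_peptide_mean)
print(f"global-min imputation MSE:   {mse_min:.3f}")
print(f"peptide-mean imputation MSE: {mse_mean:.3f}")
```

A score of this kind says only how close imputed values are to hidden ones. It does not say whether downstream analyses improve, which is the gap the abstract's downstream-centric criteria are meant to address.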

Similar articles

Evaluation of linear models and missing value imputation for the analysis of peptide-centric proteomics.

BMC Bioinformatics. 2019 Mar 14;20(Suppl 2):102. doi: 10.1186/s12859-019-2619-6.

Missing value imputation in proximity extension assay-based targeted proteomics data.

PLoS One. 2020 Dec 14;15(12):e0243487. doi: 10.1371/journal.pone.0243487. eCollection 2020.

DEqMS: A Method for Accurate Variance Estimation in Differential Protein Expression Analysis.

Mol Cell Proteomics. 2020 Jun;19(6):1047-1057. doi: 10.1074/mcp.TIR119.001646. Epub 2020 Mar 23.

Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

J Proteome Res. 2016 Apr 1;15(4):1116-25. doi: 10.1021/acs.jproteome.5b00981. Epub 2016 Mar 1.

Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics.

PLoS Comput Biol. 2022 Aug 29;18(8):e1010420. doi: 10.1371/journal.pcbi.1010420. eCollection 2022 Aug.

Proper imputation of missing values in proteomics datasets for differential expression analysis.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa112.

Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics.

J Proteome Res. 2015 May 1;14(5):1993-2001. doi: 10.1021/pr501138h. Epub 2015 Apr 22.

Identification of differentially expressed peptides in high-throughput proteomics data.

Brief Bioinform. 2018 Sep 28;19(5):971-981. doi: 10.1093/bib/bbx031.

Assessment of label-free quantification and missing value imputation for proteomics in non-human primates.

BMC Genomics. 2022 Jul 8;23(1):496. doi: 10.1186/s12864-022-08723-1.

Cited by

Plasma proteomic signatures of social support and their association with cardiovascular disease and mortality.

medRxiv. 2025 Aug 11:2025.08.07.25333199. doi: 10.1101/2025.08.07.25333199.

Laser capture proteomics reveals new candidates for sperm-interacting proteins in the bovine oviduct epithelium.

Reproduction. 2025 Jul 17;170(2). doi: 10.1530/REP-24-0390. Print 2025 Aug 1.

Optimizing imputation strategies for mass spectrometry-based proteomics considering intensity and missing value rates.

Comput Struct Biotechnol J. 2025 May 3;27:1818-1826. doi: 10.1016/j.csbj.2025.04.041. eCollection 2025.

DNA-damage-associated protein co-expression network in cardiomyocytes informs on tolerance to genetic variation and disease.

iScience. 2025 Apr 18;28(5):112474. doi: 10.1016/j.isci.2025.112474. eCollection 2025 May 16.

Analysis of FAIMS for the study of affinity-purified protein complexes using the orbitrap ascend tribrid mass spectrometer.

Mol Omics. 2025 Jul 7;21(4):303-314. doi: 10.1039/d5mo00038f.

Augmented Doubly Robust Post-Imputation Inference for Proteomic Data.

bioRxiv. 2025 Jan 19:2024.03.23.586387. doi: 10.1101/2024.03.23.586387.

Comprehensive Evaluation of Advanced Imputation Methods for Proteomic Data Acquired via the Label-Free Approach.

Int J Mol Sci. 2024 Dec 17;25(24):13491. doi: 10.3390/ijms252413491.

Affinity-Enriched Plasma Proteomics for Biomarker Discovery in Abdominal Aortic Aneurysms.

Proteomes. 2024 Dec 9;12(4):37. doi: 10.3390/proteomes12040037.

Analysis of FAIMS for the Study of Affinity-Purified Protein Complexes Using the Orbitrap Ascend Tribrid Mass Spectrometer.

bioRxiv. 2024 Dec 2:2024.12.02.626431. doi: 10.1101/2024.12.02.626431.

Metabolic status is a key factor influencing proteomic changes in ewe granulosa cells induced by chronic BPS exposure.

BMC Genomics. 2024 Nov 16;25(1):1095. doi: 10.1186/s12864-024-11034-2.
