A Review of Imputation Strategies for Isobaric Labeling-Based Shotgun Proteomics.

Affiliations

Computing & Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.

Boeing, Seattle, Washington 98055, United States.

Publication information

J Proteome Res. 2021 Jan 1;20(1):1-13. doi: 10.1021/acs.jproteome.0c00123. Epub 2020 Sep 25.

DOI:10.1021/acs.jproteome.0c00123

PMID:32929967

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8996546/

Abstract

The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased usage of these techniques. However, the structure of missing data differs from that in unlabeled studies, which prompts this review comparing the efficacy of nine imputation methods on large isobaric-labeled proteomics data sets to guide researchers on the appropriateness of various imputation methods. Imputation methods were evaluated by accuracy, statistical hypothesis test inference, and run time. In general, expectation maximization and random forest imputation methods yielded the best performance, and constant-based methods consistently performed poorly across all data set sizes and percentages of missing values. For data sets with small sample sizes and higher percentages of missing data, results indicate that statistical inference with no imputation may be preferable. On the basis of the findings in this review, there are core imputation methods that perform better for isobaric-labeled proteomics data, but great care should be taken in deciding whether imputation is the optimal strategy for data sets composed of a small number of samples.
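The accuracy criterion mentioned in the abstract can be illustrated with a small hold-out experiment: hide known values, impute them, and score the imputed entries against the truth. The sketch below is only an illustration under stated assumptions, not the review's protocol. The data are synthetic, the missingness is completely at random (a simplification), and the two methods shown (a global-minimum constant fill and a per-protein mean fill) are stand-ins chosen for brevity rather than the nine methods the review compares. All names (`impute_constant`, `impute_row_mean`, `rmse`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic log2 abundances: 200 proteins x 20 samples (illustrative only).
n_proteins, n_samples = 200, 20
protein_means = rng.normal(20.0, 2.0, size=(n_proteins, 1))
truth = protein_means + rng.normal(0.0, 0.5, size=(n_proteins, n_samples))

# Assumption: hide 20% of entries completely at random. The first sample is
# kept fully observed so every protein has at least one value to average.
mask = rng.random(truth.shape) < 0.20
mask[:, 0] = False
observed = np.where(mask, np.nan, truth)


def impute_constant(x):
    """Fill every missing entry with the global minimum observed value."""
    return np.where(np.isnan(x), np.nanmin(x), x)


def impute_row_mean(x):
    """Fill each missing entry with the mean of its protein's observed values."""
    row_means = np.nanmean(x, axis=1, keepdims=True)
    return np.where(np.isnan(x), row_means, x)


def rmse(imputed, reference, held_out):
    """Root-mean-square error over the held-out (masked) entries only."""
    diff = imputed[held_out] - reference[held_out]
    return float(np.sqrt(np.mean(diff ** 2)))


rmse_const = rmse(impute_constant(observed), truth, mask)
rmse_mean = rmse(impute_row_mean(observed), truth, mask)
print(f"constant fill RMSE:     {rmse_const:.2f}")
print(f"per-protein mean RMSE:  {rmse_mean:.2f}")
```

On synthetic data of this shape the constant fill scores much worse, because a single value cannot track protein-specific abundance levels. The same masking harness could in principle wrap any imputation function, and run time could be measured around each call.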
