Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics.

Affiliations

Institut de Recherche Mathématique Avancée, UMR 7501, CNRS-Université de Strasbourg, Strasbourg, France.

Laboratoire de Spectrométrie de Masse Bio-Organique, Institut Pluridisciplinaire Hubert Curien, UMR 7178, CNRS-Université de Strasbourg, Strasbourg, France.

Publication information

PLoS Comput Biol. 2022 Aug 29;18(8):e1010420. doi: 10.1371/journal.pcbi.1010420. eCollection 2022 Aug.

DOI:10.1371/journal.pcbi.1010420

PMID:36037245

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9462777/

Abstract

Imputing missing values is common practice in label-free quantitative proteomics. Imputation replaces a missing value with a user-defined one. However, downstream analyses often do not properly account for the imputation step, as imputed datasets are treated as if they had always been complete. Hence, the uncertainty due to imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy that, thanks to Rubin's rules, leads to a less biased estimation of the parameters' variability. The imputation-based variance estimator of peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. This workflow can be used at both the peptide and protein levels in quantification datasets: an aggregation step is included to obtain protein-level results from peptide-level quantification data. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperforms DAPAR overall in terms of F-score.
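
The role of Rubin's rules in the abstract can be illustrated with a minimal sketch. It pools one scalar parameter (for instance, a log-intensity difference between two conditions for a single peptide) across several imputed datasets, and shows how the between-imputation spread inflates the variance that a single-imputation analysis would report. This is the textbook form of Rubin's rules, not the mi4p implementation: the function name `pool_rubin` and all numbers are hypothetical, and the subsequent Bayesian moderation of the variance is not shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool D per-imputation estimates of one scalar with Rubin's rules.

    estimates: one point estimate per imputed dataset.
    within_variances: squared standard error of each estimate.
    Returns (pooled_estimate, total_variance).
    """
    d = len(estimates)
    if d < 2 or d != len(within_variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)         # combined point estimate
    within = mean(within_variances)  # average within-imputation variance
    between = variance(estimates)    # sample variance across imputations
    total = within + (1 + 1 / d) * between
    return pooled, total


# Hypothetical values for a single peptide, D = 5 imputed datasets.
estimates = [1.10, 0.95, 1.20, 1.05, 0.90]
within_variances = [0.040, 0.050, 0.045, 0.050, 0.040]

pooled, total = pool_rubin(estimates, within_variances)
naive = mean(within_variances)  # variance if imputed data were treated as observed

print(f"pooled estimate              = {pooled:.3f}")
print(f"variance ignoring imputation = {naive:.4f}")
print(f"Rubin total variance         = {total:.4f}")
```

The total variance is never smaller than the average within-imputation variance, which is why treating an imputed dataset as if it had always been complete tends to understate uncertainty.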

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1194/9462777/2a7a26d1419d/pcbi.1010420.g001.jpg

Similar articles

Towards a More Accurate Differential Analysis of Multiple Imputed Proteomics Data with mi4limma.

Methods Mol Biol. 2023;2426:131-140. doi: 10.1007/978-1-0716-1967-4_7.

Evaluation of linear models and missing value imputation for the analysis of peptide-centric proteomics.

BMC Bioinformatics. 2019 Mar 14;20(Suppl 2):102. doi: 10.1186/s12859-019-2619-6.

Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

J Proteome Res. 2016 Apr 1;15(4):1116-25. doi: 10.1021/acs.jproteome.5b00981. Epub 2016 Mar 1.

Baldur: Bayesian Hierarchical Modeling for Label-Free Proteomics with Gamma Regressing Mean-Variance Trends.

Mol Cell Proteomics. 2023 Dec;22(12):100658. doi: 10.1016/j.mcpro.2023.100658. Epub 2023 Oct 7.

Proper imputation of missing values in proteomics datasets for differential expression analysis.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa112.

PEPerMINT: peptide abundance imputation in mass spectrometry-based proteomics using graph neural networks.

Bioinformatics. 2024 Sep 1;40(Suppl 2):ii70-ii78. doi: 10.1093/bioinformatics/btae389.

Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning.

Nat Commun. 2024 Jun 26;15(1):5405. doi: 10.1038/s41467-024-48711-5.

Evaluating Proteomics Imputation Methods with Improved Criteria.

J Proteome Res. 2023 Nov 3;22(11):3427-3438. doi: 10.1021/acs.jproteome.3c00205. Epub 2023 Oct 20.

Propensity score analysis with partially observed covariates: How should multiple imputation be used?

Stat Methods Med Res. 2019 Jan;28(1):3-19. doi: 10.1177/0962280217713032. Epub 2017 Jun 2.

Cited by

Optimizing imputation strategies for mass spectrometry-based proteomics considering intensity and missing value rates.

Comput Struct Biotechnol J. 2025 May 3;27:1818-1826. doi: 10.1016/j.csbj.2025.04.041. eCollection 2025.

Augmented doubly robust post-imputation inference for proteomic data.

bioRxiv. 2025 Jan 19:2024.03.23.586387. doi: 10.1101/2024.03.23.586387.

Challenges and Opportunities for Single-cell Computational Proteomics.

Mol Cell Proteomics. 2023 Apr;22(4):100518. doi: 10.1016/j.mcpro.2023.100518. Epub 2023 Feb 23.

Proximity Mapping of CCP6 Reveals Its Association with Centrosome Organization and Cilium Assembly.

Int J Mol Sci. 2023 Jan 9;24(2):1273. doi: 10.3390/ijms24021273.

References

How Many Imputations Do You Need? A Two-stage Calculation Using a Quadratic Rule.

Sociol Methods Res. 2020 Aug;49(3):699-718. doi: 10.1177/0049124117747303. Epub 2018 Jan 18.

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics.

Int J Mol Sci. 2021 Sep 6;22(17):9650. doi: 10.3390/ijms22179650.

RobNorm: model-based robust normalization method for labeled quantitative mass spectrometry proteomics data.

Bioinformatics. 2021 May 5;37(6):815-821. doi: 10.1093/bioinformatics/btaa904.

MSqRob Takes the Missing Hurdle: Uniting Intensity- and Count-Based Proteomics.

Anal Chem. 2020 May 5;92(9):6278-6287. doi: 10.1021/acs.analchem.9b04375. Epub 2020 Apr 15.

Improved methods for estimating fraction of missing information in multiple imputation.

Cogent Math Stat. 2018;5:1551504. doi: 10.1080/25742558.2018.1551504. Epub 2018 Nov 23.

Protein-Level Statistical Analysis of Quantitative Label-Free Proteomics Data with ProStaR.

Methods Mol Biol. 2019;1959:225-246. doi: 10.1007/978-1-4939-9164-8_15.

PANDA-view: an easy-to-use tool for statistical analysis and visualization of quantitative proteomics data.

Bioinformatics. 2018 Oct 15;34(20):3594-3596. doi: 10.1093/bioinformatics/bty408.

Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression.

Ann Appl Stat. 2016 Jun;10(2):946-963. doi: 10.1214/16-AOAS920. Epub 2016 Jul 22.

Benchmarking sample preparation/digestion protocols reveals tube-gel being a fast and repeatable method for quantitative proteomics.

Proteomics. 2016 Dec;16(23):2953-2961. doi: 10.1002/pmic.201600288.

DAPAR & ProStaR: software to perform statistical analyses in quantitative discovery proteomics.

Bioinformatics. 2017 Jan 1;33(1):135-136. doi: 10.1093/bioinformatics/btw580. Epub 2016 Sep 6.
