A comparative study of evaluating missing value imputation methods in label-free proteomics.

Affiliations

Drug Metabolism and Pharmacokinetics, AbbVie Bioresearch Center, Worcester, MA, 01605, USA.

Discovery and Exploratory Statistics, AbbVie Bioresearch Center, Worcester, MA, 01605, USA.

Publication information

Sci Rep. 2021 Jan 19;11(1):1760. doi: 10.1038/s41598-021-81279-4.

DOI:10.1038/s41598-021-81279-4
PMID:33469060
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7815892/
Abstract

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods with a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with different combinations of MV rates and missing not at random (MNAR) rates. Normalized root mean square error (NRMSE) was applied to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. Detection of true positives (TPs) and false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that the accuracy of imputation is primarily affected by the MNAR rate rather than the MV rate, and downstream analysis can be largely impacted by the selection of imputation methods. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, a high number of TPs with an average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics.
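The evaluation loop described in the abstract (insert simulated MVs into complete data at a chosen MV rate and MNAR rate, impute, then score with NRMSE) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, MNAR is modelled as removal of the lowest intensities, the remaining MVs are removed uniformly at random, NRMSE is normalized by the variance of the true values at the masked positions, and row-mean imputation is only a placeholder rather than one of the seven methods the study compares.

```python
import numpy as np


def simulate_missing(data, mv_rate, mnar_rate, rng):
    """Return a copy of `data` with NaNs inserted (illustrative scheme only).

    mv_rate:   fraction of all entries to remove.
    mnar_rate: fraction of the removed entries taken from the lowest values
               (assumed MNAR model); the rest are removed at random.
    """
    flat = np.array(data, dtype=float).ravel()  # fresh copy, safe to modify
    n_missing = int(round(mv_rate * flat.size))
    n_mnar = int(round(mnar_rate * n_missing))
    order = np.argsort(flat)  # indices sorted from lowest to highest value
    mnar_idx = order[:n_mnar]
    # Random removal is drawn from the entries not already removed as MNAR.
    mcar_idx = rng.choice(order[n_mnar:], size=n_missing - n_mnar, replace=False)
    flat[mnar_idx] = np.nan
    flat[mcar_idx] = np.nan
    return flat.reshape(np.shape(data))


def impute_row_mean(data):
    """Placeholder imputation: fill each protein (row) with its observed mean."""
    observed = ~np.isnan(data)
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.nansum(data, axis=1, keepdims=True)
    global_mean = np.nanmean(data)
    # Rows with no observed value fall back to the global mean.
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, data, row_means)


def nrmse(imputed, truth, mask):
    """RMSE over the masked entries, normalized by the variance of the truth."""
    diff = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(truth[mask])))


# Synthetic log2-like intensities: 200 proteins x 6 samples (made-up data).
rng = np.random.default_rng(0)
truth = rng.normal(loc=20.0, scale=2.0, size=(200, 6))

masked = simulate_missing(truth, mv_rate=0.2, mnar_rate=0.5, rng=rng)
mask = np.isnan(masked)
score = nrmse(impute_row_mean(masked), truth, mask)
print(f"NRMSE of row-mean imputation: {score:.3f}")
```

Repeating the last few lines over a grid of `mv_rate` and `mnar_rate` values, and swapping in other imputation functions, would give the kind of comparison the abstract describes; the normalization and the MNAR model should be replaced with the paper's own definitions when reproducing its numbers.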


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3112/7815892/41b6e6c1192f/41598_2021_81279_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3112/7815892/8272d7c88805/41598_2021_81279_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3112/7815892/4932313a1b3b/41598_2021_81279_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3112/7815892/21b21fbd50b0/41598_2021_81279_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3112/7815892/e0c608eb05e4/41598_2021_81279_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3112/7815892/03d1e2acb4ca/41598_2021_81279_Fig6_HTML.jpg
