omicsGMF: a multi-tool for dimensionality reduction, batch correction and imputation applied to bulk- and single cell proteomics data.

Authors

Segers Alexandre, Castiglione Cristian, Vanderaa Christophe, De Baere Elfride, Martens Lennart, Risso Davide, Clement Lieven

Affiliations

Department of Applied Mathematics, Computer Science and Statistics, Ghent University. Ghent, Belgium.

Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital. Ghent, Belgium.

Publication information

bioRxiv. 2025 Mar 28:2025.03.24.644996. doi: 10.1101/2025.03.24.644996.

Abstract

The unprecedented speed and sensitivity of mass spectrometry (MS) have unlocked large-scale applications of proteomics and even enabled proteome profiling of single cells. However, this fast-evolving field is hindered by a lack of scalable dimensionality reduction tools that can compensate for substantial batch effects and missingness across MS runs. Therefore, we present omicsGMF, a fast, scalable, and interpretable matrix factorization method tailored for bulk and single-cell proteomics data. Unlike current workflows that sequentially apply imputation, batch correction, and principal component analysis, omicsGMF integrates these steps into a unified framework, dramatically enhancing data processing and dimensionality reduction. Additionally, omicsGMF provides robust imputation of missing values, outperforming bespoke state-of-the-art imputation tools. We further demonstrate how this integrated approach increases statistical power to detect differentially abundant proteins in downstream data analysis. Hence, omicsGMF is a highly scalable approach to dimensionality reduction in proteomics that dramatically improves many important steps in proteomics data analysis.
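To make the idea of a unified framework more concrete, the Python sketch below fits a single low-rank model to an intensity matrix that contains missing values and a known batch covariate. It is only an illustration under stated assumptions and is not the omicsGMF implementation: the function name `masked_factorization`, the Gaussian alternating least squares fit, the ridge penalty, and the simulated data are all hypothetical choices made for this example. The model is Y ≈ X Bᵀ + U Vᵀ, where X holds an intercept and a batch indicator, U gives the low-dimensional sample scores, and the fitted values fill in the missing entries.

```python
import numpy as np


def masked_factorization(Y, X, k=2, n_iter=30, ridge=1e-2, seed=0):
    """Fit Y ~ X @ B.T + U @ V.T using only the observed (non-NaN) entries.

    Hypothetical helper for illustration; not part of any published package.
    Y: samples x proteins matrix with NaN for missing values.
    X: samples x covariates design matrix (e.g. intercept and batch).
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    p = X.shape[1]
    observed = ~np.isnan(Y)
    U = 0.1 * rng.standard_normal((n, k))  # sample scores
    V = np.zeros((m, k))                   # protein loadings
    B = np.zeros((m, p))                   # protein-wise covariate effects
    for _ in range(n_iter):
        # Protein step: ridge-regress each protein's observed values on [X, U].
        Z = np.hstack([X, U])
        for j in range(m):
            rows = observed[:, j]
            Zj = Z[rows]
            A = Zj.T @ Zj + ridge * np.eye(p + k)
            coef = np.linalg.solve(A, Zj.T @ Y[rows, j])
            B[j], V[j] = coef[:p], coef[p:]
        # Sample step: ridge-regress each sample's covariate-adjusted
        # residuals on the loadings, again using observed entries only.
        R = Y - X @ B.T
        for i in range(n):
            cols = observed[i]
            Vi = V[cols]
            A = Vi.T @ Vi + ridge * np.eye(k)
            U[i] = np.linalg.solve(A, Vi.T @ R[i, cols])
    fitted = X @ B.T + U @ V.T
    imputed = np.where(observed, Y, fitted)  # keep observed values as they are
    return U, V, B, imputed


# Simulated example (assumed data): two batches, rank-2 signal, 20% missing.
rng = np.random.default_rng(1)
n, m, k = 60, 40, 2
batch = np.repeat([0.0, 1.0], n // 2)
X = np.column_stack([np.ones(n), batch])
B_true = rng.standard_normal((m, 2))
U_true = rng.standard_normal((n, k))
V_true = rng.standard_normal((m, k))
Y_full = X @ B_true.T + U_true @ V_true.T + 0.1 * rng.standard_normal((n, m))
mask = rng.random((n, m)) < 0.2
Y = np.where(mask, np.nan, Y_full)

U, V, B, Y_imputed = masked_factorization(Y, X, k=k)

# Compare imputation error against a naive per-protein mean baseline.
rmse_model = np.sqrt(np.mean((Y_imputed[mask] - Y_full[mask]) ** 2))
col_means = np.broadcast_to(np.nanmean(Y, axis=0), Y.shape)
rmse_mean = np.sqrt(np.mean((col_means[mask] - Y_full[mask]) ** 2))
print(f"RMSE factorization: {rmse_model:.3f}, RMSE column mean: {rmse_mean:.3f}")
```

In this sketch the batch effect is absorbed by `B`, the rows of `U` play the role of batch-adjusted principal component scores, and the missing values are imputed from the same fit, so no separate imputation or batch-correction pass is needed before dimensionality reduction.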

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888c/12128835/55252f8a2398/nihpp-2025.03.24.644996v2-f0001.jpg
