Segers Alexandre, Castiglione Cristian, Vanderaa Christophe, De Baere Elfride, Martens Lennart, Risso Davide, Clement Lieven
Department of Applied Mathematics, Computer Science and Statistics, Ghent University. Ghent, Belgium.
Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital. Ghent, Belgium.
bioRxiv. 2025 Mar 28:2025.03.24.644996. doi: 10.1101/2025.03.24.644996.
The unprecedented speed and sensitivity of mass spectrometry (MS) have unlocked large-scale applications of proteomics and even enabled proteome profiling of single cells. However, this fast-evolving field is hindered by a lack of scalable dimensionality reduction tools that can compensate for the substantial batch effects and missingness across MS runs. Therefore, we present omicsGMF, a fast, scalable, and interpretable matrix factorization method tailored to bulk and single-cell proteomics data. Unlike current workflows that sequentially apply imputation, batch correction, and principal component analysis, omicsGMF integrates these steps into a unified framework, dramatically enhancing data processing and dimensionality reduction. Additionally, omicsGMF provides robust imputation of missing values, outperforming bespoke state-of-the-art imputation tools. We further demonstrate how this integrated approach increases statistical power to detect differentially abundant proteins in the downstream data analysis. Hence, omicsGMF is a highly scalable approach to dimensionality reduction in proteomics that dramatically improves many important steps in proteomics data analysis.
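The abstract does not spell out the model, so the following is only a generic, hypothetical sketch of the idea it describes. It fits known batch effects plus a low-rank factorization on the observed entries only, so that imputation, batch adjustment, and dimensionality reduction all come out of a single fit instead of three sequential steps. It is not omicsGMF's estimator or API. The Gaussian squared-error loss, per-batch protein means as the only covariates, full-batch gradient descent, the L2 penalty, and the function names `fit_masked_factorization` and `impute` are all illustrative assumptions.

```python
import random

# Hypothetical sketch of a masked low-rank factorization with batch effects;
# NOT the omicsGMF implementation (loss, covariates and optimizer are assumptions).


def fit_masked_factorization(Y, batch, rank=2, lr=0.05, lam=0.01, n_iter=1000, seed=0):
    """Fit Y[i][j] ~ M[batch[i]][j] + sum_k U[i][k] * V[j][k] on observed entries only.

    Y: n x p nested list (samples x proteins); None marks a missing value.
    batch: length-n list of integer batch labels 0..n_batches-1.
    Returns (M, U, V, history); history is the mean squared error on observed
    entries before fitting and after every gradient step.
    """
    rng = random.Random(seed)
    n, p, n_batches = len(Y), len(Y[0]), max(batch) + 1
    observed = [(i, j) for i in range(n) for j in range(p) if Y[i][j] is not None]

    # Start the batch effects at per-batch protein means of the observed values.
    M = [[0.0] * p for _ in range(n_batches)]
    for b in range(n_batches):
        for j in range(p):
            vals = [Y[i][j] for i in range(n) if batch[i] == b and Y[i][j] is not None]
            M[b][j] = sum(vals) / len(vals) if vals else 0.0
    # Small random start for sample scores U and protein loadings V.
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(p)]

    def predict(i, j):
        return M[batch[i]][j] + sum(U[i][k] * V[j][k] for k in range(rank))

    def mse():
        return sum((Y[i][j] - predict(i, j)) ** 2 for i, j in observed) / len(observed)

    history = [mse()]
    for _ in range(n_iter):
        # Full-batch gradients of 0.5 * SSE(observed) + 0.5 * lam * (|U|^2 + |V|^2).
        gM = [[0.0] * p for _ in range(n_batches)]
        gU = [[lam * u for u in row] for row in U]
        gV = [[lam * v for v in row] for row in V]
        for i, j in observed:  # missing cells never contribute to the gradient
            r = predict(i, j) - Y[i][j]
            gM[batch[i]][j] += r
            for k in range(rank):
                gU[i][k] += r * V[j][k]
                gV[j][k] += r * U[i][k]
        for b in range(n_batches):
            for j in range(p):
                M[b][j] -= lr * gM[b][j]
        for i in range(n):
            for k in range(rank):
                U[i][k] -= lr * gU[i][k]
        for j in range(p):
            for k in range(rank):
                V[j][k] -= lr * gV[j][k]
        history.append(mse())
    return M, U, V, history


def impute(Y, batch, M, U, V):
    """Return a copy of Y with each missing entry replaced by its fitted value."""
    rank = len(U[0])
    return [
        [y if y is not None else M[batch[i]][j] + sum(U[i][k] * V[j][k] for k in range(rank))
         for j, y in enumerate(row)]
        for i, row in enumerate(Y)
    ]


# Toy data: 6 samples in 2 batches, 4 proteins, 2 missing values.
Y = [
    [10.1, 11.9, None, 9.8],
    [10.3, 12.2, 8.1, 10.0],
    [9.9, 12.0, 7.9, None],
    [12.0, 14.1, 10.2, 11.9],
    [12.2, 13.8, 9.9, 12.1],
    [11.8, 14.0, 10.1, 12.0],
]
batch = [0, 0, 0, 1, 1, 1]
M, U, V, history = fit_masked_factorization(Y, batch)
Y_filled = impute(Y, batch, M, U, V)
```

Because missing cells never enter the loss, nothing has to be imputed before fitting. Afterwards, the batch offsets sit in `M`, the rows of `U` can be read as low-dimensional sample scores for the remaining within-batch structure, and the missing cells are filled from the fitted `M + U V^T`, all from the same fit.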