Deciphering Cell Type Abundance in Proteomics Data Through Graph Neural Networks.

Authors

Dai Zhiming, Song Yujie, Qi Tuoshi, Zhang Hongyu, Zhao Huiying, Wang Zheng, Yang Yuedong, Zeng Yuansong

Affiliations

School of Big Data and Software Engineering, Chongqing University, Chongqing, 400000, China.

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China.

Publication information

Adv Sci (Weinh). 2025 Jun 20:e02987. doi: 10.1002/advs.202502987.

Abstract

Recent advancements in proteomics sequencing have significantly enhanced our ability to explore cell-type-specific signatures within complex tissues, providing critical insights into disease mechanisms. However, current proteomic technologies often suffer from low resolution, so multiple cell types are mixed during profiling. To address this limitation, cell-type deconvolution methods have been developed to infer cellular composition from proteomic data. Most existing deconvolution methods focus on transcriptomics, and their application to proteomics is hindered by the weak correlation and divergent quantification between transcriptome and proteome data. Although a few proteomics-specific deconvolution methods have recently emerged, they still exhibit limited capability and performance, partly because they only extract shared information from individual samples while ignoring higher-order relationships between them. Here, GraphDEC, a novel graph neural network-based method for deciphering cell type proportions in proteomic profiling data, is proposed. GraphDEC begins by simulating bulk samples from single-cell proteomic data to create reference data, which is then used to infer cell type proportions in target datasets. Specifically, GraphDEC employs an autoencoder to extract low-dimensional representations from both reference and target proteomic data, enabling the construction of similarity relationships among samples. These relationships, combined with the proteomic data, are processed by a graph neural network that integrates a multi-channel mechanism and a hybrid neighborhood-aware approach to learn highly effective representations. To optimize the model, GraphDEC utilizes multiple loss functions, including triplet loss, domain adaptation loss, and mean squared error (MSE) loss, ensuring robust performance and mitigating batch effects. Benchmark experiments demonstrate that GraphDEC achieves state-of-the-art performance across diverse synthetic proteomic datasets from different sequencing technologies and on real-world spatial proteomic datasets. Furthermore, GraphDEC exhibits strong generalization capabilities, showing high efficiency when applied to cross-species proteomic data and even transcriptomic data.
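
The first two steps described in the abstract (simulating bulk samples from single-cell data, then linking similar samples) can be illustrated with a minimal NumPy sketch. This is not the GraphDEC implementation: the function names, the flat Dirichlet prior over proportions, the cosine k-nearest-neighbour rule, and the use of log-transformed bulk profiles in place of autoencoder embeddings are all assumptions made for illustration, and the input is random toy data.

```python
import numpy as np

def simulate_pseudo_bulk(cells, labels, n_types, n_samples, cells_per_sample, rng):
    """Mix single cells into pseudo-bulk samples with known cell-type proportions.

    Hypothetical helper: the sampling scheme is an assumption, not the paper's.
    """
    bulk = np.zeros((n_samples, cells.shape[1]))
    props = np.zeros((n_samples, n_types))
    for i in range(n_samples):
        # Draw target proportions from a flat Dirichlet (assumed prior).
        target = rng.dirichlet(np.ones(n_types))
        # Turn proportions into integer cell counts that sum to cells_per_sample.
        counts = rng.multinomial(cells_per_sample, target)
        for t in range(n_types):
            pool = np.flatnonzero(labels == t)
            picked = rng.choice(pool, size=counts[t], replace=True)
            bulk[i] += cells[picked].sum(axis=0)
        # Ground-truth label: the fraction of cells actually drawn per type.
        props[i] = counts / cells_per_sample
    return bulk, props

def knn_graph(embeddings, k):
    """Symmetric 0/1 adjacency linking each sample to its k most similar samples.

    Hypothetical helper: cosine similarity and kNN are assumed choices.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    nearest = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=int)
    rows = np.repeat(np.arange(len(embeddings)), k)
    adj[rows, nearest.ravel()] = 1
    return np.maximum(adj, adj.T)  # make the graph undirected

# Toy data: 300 cells, 20 proteins, 3 cell types (all values are synthetic).
rng = np.random.default_rng(0)
cells = rng.gamma(2.0, 1.0, size=(300, 20))
labels = np.arange(300) % 3
bulk, props = simulate_pseudo_bulk(
    cells, labels, n_types=3, n_samples=50, cells_per_sample=100, rng=rng
)
# Stand-in for learned embeddings: log-transformed bulk profiles.
adj = knn_graph(np.log1p(bulk), k=5)
```

Here `bulk` and `props` play the role of the simulated reference data and its known proportions, and `adj` is the kind of sample-similarity graph a graph neural network could take as input alongside the proteomic profiles.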
