ProtLoc-GRPO: Cell line-specific subcellular localization prediction using a graph-based model and reinforcement learning.

Author information

Zeng Shuai, Zhang Weinan, Li Chaohan, Jiang Yuexu, Wang Duolin, Shao Qing, Xu Dong

Affiliations

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA.

Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.

Publication information

bioRxiv. 2025 Jul 22:2025.07.17.665451. doi: 10.1101/2025.07.17.665451.

DOI:10.1101/2025.07.17.665451

PMID:40777512

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12330487/

Abstract

Subcellular localization prediction is crucial for understanding protein function and cellular processes. Subcellular localization depends on the tissue and on cell lines derived from different cell types. Predicting cell line-specific subcellular localization from protein-protein interaction (PPI) information offers deeper insight into dynamic cellular organization and molecular mechanisms. However, many existing PPI networks contain systematic errors that limit prediction accuracy. In this study, we propose a reinforcement learning approach, ProtLoc-GRPO, that improves subcellular localization prediction by optimizing the structure of the underlying PPI network. ProtLoc-GRPO learns to rank PPI edges and retain the most informative ones so as to maximize the macro-F1 score for cell line-specific subcellular localization. Our approach yields a 7% improvement in macro-F1 score over the baseline. We further evaluate its robustness across various edge pruning rates and benchmark it against conventional pruning strategies. Results show that the proposed method consistently outperforms existing approaches. To our knowledge, this is the first study to predict cell line-specific protein subcellular localization and the first application of the Group Relative Policy Optimization (GRPO) framework to a graph-based model for bioinformatics tasks.
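
The three ingredients named in the abstract (a macro-F1 objective, retaining top-ranked PPI edges at a given pruning rate, and group-relative rewards) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the policy network, the reward definition, or the graph model, so the function names `macro_f1`, `retain_top_edges`, and `group_relative_advantages`, the per-edge scores, and the use of macro-F1 as a per-candidate reward are all assumptions made for illustration. The only GRPO-specific element shown is the commonly described step of normalizing each reward against the mean and standard deviation of its own sampled group.

```python
from statistics import mean, pstdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return mean(f1_scores)


def retain_top_edges(edges, scores, prune_rate):
    """Rank edges by score (hypothetical) and drop a `prune_rate` fraction."""
    n_keep = len(edges) - int(len(edges) * prune_rate)
    ranked = sorted(zip(scores, edges), key=lambda pair: pair[0], reverse=True)
    return [edge for _, edge in ranked[:n_keep]]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In such a setup, each candidate pruned network in a sampled group would be scored by the macro-F1 of a downstream localization classifier, and candidates scoring above the group mean would receive positive advantages. Whether ProtLoc-GRPO computes its reward this way is not stated in the abstract.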
