CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation.

Authors

Gou Wenrui, Ge Wenhui, Tan Yang, Li Mingchen, Fan Guisheng, Yu Huiqun

Affiliations

School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.

Shanghai Engineering Research Center of Smart Energy, Shanghai, 201103, China.

Publication information

Interdiscip Sci. 2025 Jun 8. doi: 10.1007/s12539-025-00732-4.

DOI:10.1007/s12539-025-00732-4

PMID:40483648

Abstract

Protein structures are fundamental to understanding protein functions and interactions. With the continuous advancement of protein structure prediction methods, structure databases are rapidly expanding. Identifying the origin of a protein structure is crucial for assessing the reliability of experimental resolution and computational prediction methods, as well as for guiding downstream biological research. Existing protein representation approaches often fail to capture subtle yet critical structural differences, which makes precise structural traceability difficult. To address this, we propose a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), for the representation and origin evaluation of protein structures. CPE-Pro integrates a pre-trained protein Structural Sequence Language Model (SSLM) and a Geometric Vector Perceptron-Graph Neural Network (GVP-GNN) to learn structure-aware protein representations and capture structural differences, enabling accurate classification across four origins of structural data. Preliminary results indicate that, compared to large-scale protein language models trained on extensive amino acid sequences, structural sequences enriched with local structural features enable the model to capture more informative protein characteristics, thereby enhancing and refining protein representations. Future research directions include extending the architecture to additional protein structure paradigms and developing evaluation methodologies for low-pLDDT predicted structures, providing more effective tools for protein structure analysis. The code, model weights, and all relevant materials are available at https://github.com/wr1102/CPE-Pro.
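
The two-branch design described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the mean pooling, the fusion by concatenation, the single linear layer, and all names and dimensions (`mean_pool`, `classify_origin`, `seq_dim`, `graph_dim`) are assumptions made for illustration, and random vectors stand in for the outputs of the SSLM and GVP-GNN encoders. Only the count of four origin classes comes from the abstract.

```python
import math
import random

NUM_ORIGINS = 4  # the abstract states four origins of structural data


def mean_pool(vectors):
    """Average per-residue vectors into one fixed-size protein vector (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_origin(seq_residue_emb, graph_residue_emb, weights, bias):
    """Fuse both branch outputs by concatenation (assumed) and score each origin."""
    # List addition concatenates the two pooled vectors.
    fused = mean_pool(seq_residue_emb) + mean_pool(graph_residue_emb)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy inputs: random stand-ins for the per-residue outputs of the two encoders.
rng = random.Random(0)
n_residues, seq_dim, graph_dim = 5, 8, 6  # hypothetical sizes
seq_emb = [[rng.gauss(0, 1) for _ in range(seq_dim)] for _ in range(n_residues)]
graph_emb = [[rng.gauss(0, 1) for _ in range(graph_dim)] for _ in range(n_residues)]

# Untrained classifier parameters, one weight row per origin class.
weights = [[rng.gauss(0, 0.1) for _ in range(seq_dim + graph_dim)]
           for _ in range(NUM_ORIGINS)]
bias = [0.0] * NUM_ORIGINS

probs = classify_origin(seq_emb, graph_emb, weights, bias)
predicted_origin = max(range(NUM_ORIGINS), key=probs.__getitem__)
```

Here `probs` holds one probability per origin class and `predicted_origin` is the index of the most likely one. The released code at the repository named in the abstract is the authoritative reference for how the two branches are actually combined.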
