CPI-Pred: A deep learning framework for predicting functional parameters of compound-protein interactions.

Author information

Xu Zhiqing, Barghout Rana Ahmed, Wu Jinghao, Garg Dhruv, Song Yun S, Mahadevan Radhakrishnan

Affiliations

Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada.

Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India.

Publication information

bioRxiv. 2025 Jan 21:2025.01.16.633372. doi: 10.1101/2025.01.16.633372.

DOI:10.1101/2025.01.16.633372

PMID:39896624

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11785036/

Abstract

Recent advancements in deep learning have enabled functional annotation of genome sequences, facilitating the discovery of new enzymes and metabolites. However, accurately predicting compound-protein interactions (CPI) from sequences remains challenging due to the complexity of these interactions and the sparsity and heterogeneity of available data, which constrain the generalization of patterns across their solution space. In this work, we introduce CPI-Pred, a versatile deep learning model designed to predict compound-protein interaction function. CPI-Pred integrates compound representations derived from a novel message-passing neural network and enzyme representations generated by state-of-the-art protein language models, leveraging innovative sequence pooling and cross-attention mechanisms. To train and evaluate CPI-Pred, we compiled the largest dataset of enzyme kinetic parameters to date, encompassing four key metrics: the Michaelis-Menten constant ($K_m$), enzyme turnover number ($k_{cat}$), catalytic efficiency ($k_{cat}/K_m$), and inhibition constant ($K_i$). These kinetic parameters are critical for elucidating enzyme function in metabolic contexts and understanding their regulation by compounds within biological networks. We demonstrate that CPI-Pred can predict diverse types of CPI using only the amino acid sequence of enzymes and structural representations of compounds, outperforming state-of-the-art models on unseen compounds and structurally dissimilar enzymes. Our workflow provides a valuable tool for tackling a range of metabolic engineering challenges, including the design of novel enzyme sequences and compounds, such as enzyme inhibitors. Additionally, the datasets curated in this study offer a valuable resource for the scientific community, serving as a benchmark for machine learning models focused on enzyme activity and promiscuity prediction.
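
For readers less familiar with the four kinetic parameters named in the abstract, the following minimal sketch shows how they enter the standard Michaelis-Menten rate law. It is textbook background, not code or a method taken from CPI-Pred; the function names, the example values, and the choice of competitive inhibition to illustrate the inhibition constant are all illustrative assumptions.

```python
def michaelis_menten_rate(substrate, enzyme_total, k_cat, k_m,
                          inhibitor=0.0, k_i=None):
    """Initial reaction rate v = k_cat * [E]_total * [S] / (K_m + [S]).

    Assumption for illustration: if an inhibition constant k_i is given,
    the inhibitor is treated as competitive, which scales the apparent
    K_m by (1 + [I] / K_i) and leaves the maximal rate unchanged.
    """
    k_m_apparent = k_m
    if k_i is not None:
        k_m_apparent = k_m * (1.0 + inhibitor / k_i)
    return k_cat * enzyme_total * substrate / (k_m_apparent + substrate)


def catalytic_efficiency(k_cat, k_m):
    """Catalytic efficiency, defined as the ratio k_cat / K_m."""
    return k_cat / k_m


# Hypothetical example values (arbitrary but consistent units).
k_cat, k_m, k_i = 10.0, 5.0, 2.0
enzyme_total = 2.0

# At [S] = K_m the uninhibited rate is half of V_max = k_cat * [E]_total.
v_half = michaelis_menten_rate(5.0, enzyme_total, k_cat, k_m)

# With [I] = K_i the apparent K_m doubles, so the rate at the same [S] drops.
v_inhibited = michaelis_menten_rate(5.0, enzyme_total, k_cat, k_m,
                                    inhibitor=2.0, k_i=k_i)

efficiency = catalytic_efficiency(k_cat, k_m)
```

In this picture, a model that predicts $K_m$, $k_{cat}$, and $K_i$ for an enzyme-compound pair supplies the inputs to a rate law of this kind, which is one reason these quantities are useful in metabolic modelling.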

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8175/11785036/0f922f61ef55/nihpp-2025.01.16.633372v1-f0001.jpg
