Qian Wenjia, Wang Xiaorui, Huang Yuansheng, Kang Yu, Pan Peichen, Hsieh Chang-Yu, Hou Tingjun
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.
J Chem Inf Model. 2025 Jan 13;65(1):187-200. doi: 10.1021/acs.jcim.4c01801. Epub 2024 Dec 25.
Enzymes are ubiquitous catalysts with enormous application potential in biomedicine, green chemistry, and biotechnology. However, accurately predicting whether a molecule is a substrate of a specific enzyme, especially for novel molecular entities, remains a significant challenge. Computational approaches are far more resource- and time-efficient than traditional experimental methods, but they often sacrifice accuracy. To address this, we introduce the molecule-enzyme interaction (MEI) model, a machine learning framework designed to predict, with high accuracy, the probability that a given molecule is a substrate of a specified enzyme. Built on a comprehensive data set covering enzymatic reactions and enzyme sequences, the MEI model combines atomic environment features with amino acid sequence features through an attention mechanism within a hierarchical neural network. Empirical evaluations show that the MEI model outperforms the current state-of-the-art model by at least 6.7% in prediction accuracy and 8.5% in the area under the receiver operating characteristic curve (AUROC). The MEI model also generalizes well across data sets of varying quality and size. This adaptability is further evidenced by its application to diverse tasks, such as predicting interactions within the CYP450 enzyme family and predicting the enzymatic degradation of complex plastics in environmental applications, where it achieves an accuracy of 90.5%. These examples illustrate the model's ability to transfer knowledge from coarsely annotated enzyme databases to smaller, high-precision data sets, robustly modeling both sparse and high-quality databases. We believe this versatility establishes the MEI model as a foundational tool in enzyme research, with potential to extend well beyond its original scope.
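The abstract describes fusing atom-level molecular features with amino acid sequence features through attention and outputting a substrate probability. The following is a minimal, hypothetical sketch of that general idea, not the MEI model's published architecture: scaled dot-product cross-attention in which each atom vector attends over residue vectors, followed by mean pooling and a logistic output. All function names, the feature dimension, the omission of learned query/key/value projections, and the random weights are assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def cross_attention(atom_feats, residue_feats):
    """Hypothetical fusion step (assumed, not the MEI design): each atom
    vector (query) attends over the residue vectors (used as both keys and
    values) with scaled dot-product attention. Learned Q/K/V projections are
    omitted for brevity, so atom and residue vectors must share dimension d."""
    d = len(residue_feats[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in atom_feats:
        weights = softmax([dot(q, k) * scale for k in residue_feats])
        # Attention-weighted sum of residue vectors for this atom
        fused.append([sum(w * v[j] for w, v in zip(weights, residue_feats))
                      for j in range(d)])
    return fused


def substrate_probability(atom_feats, residue_feats, w, b):
    """Hypothetical readout: pool the atom features and their enzyme-aware
    context, concatenate them, and apply a logistic (sigmoid) output."""
    fused = cross_attention(atom_feats, residue_feats)
    pooled = mean_pool(atom_feats) + mean_pool(fused)  # list concatenation, length 2d
    logit = dot(w, pooled) + b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy random inputs (assumed, not real descriptors):
    # 5 atoms and 12 residues, each a 4-dimensional feature vector.
    random.seed(0)
    d = 4
    atoms = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
    residues = [[random.gauss(0, 1) for _ in range(d)] for _ in range(12)]
    weights = [random.gauss(0, 0.1) for _ in range(2 * d)]
    print(substrate_probability(atoms, residues, weights, 0.0))
```

In a trained model, the feature vectors would come from learned atom-environment and sequence encoders and the output weights from training; here they are random, so the printed probability only demonstrates the data flow.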