IECata: interpretable bilinear attention network and evidential deep learning improve the catalytic efficiency prediction of enzymes.

Author information

Wang Jingjing, Zhao Yanpeng, Yang Zhijiang, Yao Ge, Han Penggang, Liu Jiajia, Chen Chang, Zan Peng, Wan Xiukun, Bo Xiaochen, Jiang Hui

Affiliations

State Key Laboratory of NBC Protection for Civilian, No. 37, South Central Street, Changping District, Beijing 102205, China.

School of Medicine, Shanghai University, No. 99, Shangda Road, Baoshan District, Shanghai 200444, China.

Publication information

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf283.

Abstract

Enzyme catalytic efficiency (kcat/Km) is a key parameter for identifying high-activity enzymes. Recently, deep learning techniques have demonstrated the potential for fast and accurate kcat/Km prediction. However, three challenges remain: (i) the limited size of the available kcat/Km dataset hinders the development of deep learning models; (ii) the model predictions lack reliable confidence estimates; and (iii) models lack interpretable insights into enzyme-catalyzed reactions. To address these challenges, we proposed IECata, a kcat/Km prediction model that provides uncertainty estimation and interpretability. IECata collected a dataset of 11 815 kcat/Km entries from the BRENDA and SABIO-RK databases, along with an out-of-domain test dataset of 806 entries from the literature. By introducing evidential deep learning, IECata provides uncertainty estimates for kcat/Km predictions. Moreover, it uses a bilinear attention mechanism to focus on learning crucial local interactions to interpret the key residues and substrate atoms in enzyme-catalyzed reactions. Testing results indicate that the prediction performance of IECata exceeds that of state-of-the-art benchmark models. More importantly, it provides a reliable confidence assessment for these predictions. Case studies further highlight that the incorporation of uncertainty in screening for highly active enzymes can effectively increase the hit ratio, thereby improving the efficiency of experimental validation and accelerating directed enzyme evolution. To facilitate researchers' use of IECata, we have developed an online prediction platform: http://mathtc.nscc-tj.cn/cataai/.
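The abstract does not say which evidential formulation IECata uses or how uncertainty enters the screening step. As a rough illustration only, the sketch below assumes the Normal-Inverse-Gamma parameterization common in deep evidential regression, where a regression head outputs four values (gamma, nu, alpha, beta) per sample. The names `NIGOutput`, `uncertainties`, and `screen`, the threshold, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NIGOutput:
    """Hypothetical per-sample output of an evidential regression head."""
    gamma: float  # predicted mean, e.g. log10(kcat/Km)
    nu: float     # virtual evidence for the mean (> 0)
    alpha: float  # shape parameter (> 1 for finite variance)
    beta: float   # scale parameter (> 0)


def uncertainties(o: NIGOutput) -> Tuple[float, float]:
    """Return (aleatoric, epistemic) variances under the NIG assumption."""
    aleatoric = o.beta / (o.alpha - 1.0)           # expected noise variance
    epistemic = o.beta / (o.nu * (o.alpha - 1.0))  # variance of the predicted mean
    return aleatoric, epistemic


def screen(candidates: Dict[str, NIGOutput], max_epistemic: float, top_k: int) -> List[str]:
    """Drop candidates with high epistemic variance, then rank by predicted mean."""
    confident = {
        name: o for name, o in candidates.items()
        if uncertainties(o)[1] <= max_epistemic
    }
    ranked = sorted(confident, key=lambda name: confident[name].gamma, reverse=True)
    return ranked[:top_k]


# Made-up numbers for illustration only; not results from the paper.
candidates = {
    "variant_A": NIGOutput(gamma=5.2, nu=0.1, alpha=1.5, beta=1.0),  # high prediction, little evidence
    "variant_B": NIGOutput(gamma=4.8, nu=5.0, alpha=3.0, beta=0.5),  # slightly lower, well supported
    "variant_C": NIGOutput(gamma=3.1, nu=4.0, alpha=3.0, beta=0.4),
}
selected = screen(candidates, max_epistemic=0.5, top_k=2)
print(selected)
```

In this toy setup, `variant_A` has the highest predicted value but is filtered out because the model has little evidence for it. That is the kind of trade-off the abstract describes when it says that incorporating uncertainty can increase the hit ratio.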

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50a1/12205960/5f0fb614f84e/bbaf283f1.jpg
