Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions.

Affiliations

Department of Chemical Science and Engineering Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan.

Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan.

Publication information

J Chem Inf Model. 2020 Mar 23;60(3):1833-1843. doi: 10.1021/acs.jcim.9b00877. Epub 2020 Feb 27.

Abstract

Unannotated gene sequences in databases are increasing due to sequencing advances. Therefore, computational methods to predict functions of unannotated genes are needed. Moreover, novel enzyme discovery for metabolic engineering applications further encourages annotation of sequences. Here, enzyme functions are predicted using two general approaches, each including several machine learning algorithms. First, Enzyme-models (E-models) predict Enzyme Commission (EC) numbers from amino acid sequence information. Second, Substrate-Enzyme models (SE-models) are built to predict substrates of enzymatic reactions together with EC numbers, and Substrate-Enzyme-Product models (SEP-models) are built to predict substrates, products, and EC numbers. While accuracy of E-models is not optimal, SE-models and SEP-models predict EC numbers and reactions with high accuracy using all tested machine learning-based methods. For example, a single Random Forests-based SEP-model predicts EC first digits with an Average AUC score of over 0.94. Various metrics indicate that the current strategy of combining sequence and chemical structure information is effective at improving enzyme reaction prediction.
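To make the idea of combining sequence and chemical structure information concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how sequences or structures were encoded, nor how the average AUC was computed. The amino acid composition encoding, the binary substrate and product fingerprints, the one-vs-rest macro-averaging, and all function names and example values here are hypothetical. The Random Forests classifier itself is omitted; the scores stand in for whatever per-class probabilities a trained model would output.

```python
# Illustrative sketch only: feature encoding and AUC averaging are assumptions,
# not details taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def combine_features(sequence, substrate_bits, product_bits=()):
    """Concatenate sequence features with structure fingerprints.

    With only substrate_bits this mimics an SE-style input; adding
    product_bits mimics an SEP-style input (hypothetical encoding).
    """
    return amino_acid_composition(sequence) + list(substrate_bits) + list(product_bits)


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(true_classes, class_scores):
    """Macro-average the one-vs-rest AUC over all classes.

    class_scores maps each class (e.g. an EC first digit) to a list holding
    the model's score for that class on every sample.
    """
    aucs = []
    for cls, scores in class_scores.items():
        labels = [1 if t == cls else 0 for t in true_classes]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Made-up example: five samples whose true EC first digits are 1, 1, 2, 2, 3.
example_true = [1, 1, 2, 2, 3]
example_scores = {
    1: [0.90, 0.60, 0.70, 0.10, 0.20],
    2: [0.05, 0.30, 0.20, 0.80, 0.10],
    3: [0.05, 0.10, 0.10, 0.10, 0.70],
}
example_vector = combine_features("ACD", substrate_bits=[1, 0, 1])
example_average_auc = average_auc(example_true, example_scores)
print(len(example_vector), round(example_average_auc, 3))
```

In this sketch an E-model would see only the output of `amino_acid_composition`, while SE-style and SEP-style inputs append substrate and product fingerprints to the same vector. That is one simple way to read "combining sequence and chemical structure information".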
