AHLS-pred: a novel sequence-based predictor of acyl-homoserine-lactone synthases using machine learning algorithms.

Affiliation

State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China.

Publication information

Environ Microbiol Rep. 2022 Aug;14(4):616-631. doi: 10.1111/1758-2229.13068. Epub 2022 Apr 10.

Abstract

Acyl-homoserine lactones (AHLs), the major quorum sensing (QS) signalling molecules in Gram-negative bacteria, have shown great potential for regulating biological nutrient removal processes. Identifying AHL synthases (AHLSs) is essential for in-depth research on QS mechanisms and their application in biological wastewater treatment. This work proposes AHLS-pred, the first machine learning-based prediction model for AHL synthases. A training dataset, AHLS1400, and an independent testing dataset, AHLS132, were first established. Three sequence-based feature extraction methods were used to generate feature descriptors: amino acid composition, dipeptide composition and G-gap dipeptide composition. The optimal features were then selected by ranking the feature descriptors by F-score and applying a sequential forward search strategy. After comparing five machine learning algorithms, the final model was trained with a support vector machine classifier on AHLS1400, which gave the best performance in fivefold cross-validation (ACC = 99.43%, MCC = 0.989, AUC = 0.997). On the independent testing dataset AHLS132, AHLS-pred achieved an ACC of 94.70%, an MCC of 0.894 and an AUC of 0.995. These results indicate that AHLS-pred is a promising and powerful method for accelerating the computational identification of AHLSs.
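The abstract names the feature descriptors, the F-score ranking and the MCC metric without defining them. The sketch below illustrates one common reading of each and is not the authors' implementation. It makes several assumptions: only the 20 standard residues are counted, a gap of g = 0 is treated as ordinary dipeptide composition, the F-score follows the widely used two-class definition (between-class separation of the means divided by the sum of within-class sample variances), and all function names are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return 20 relative frequencies, one per standard amino acid."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def gap_dipeptide_composition(seq, g=0):
    """Return 400 frequencies of residue pairs separated by g positions.

    With g=0 the pairs are adjacent, i.e. plain dipeptide composition.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def f_score(pos, neg):
    """F-score of one feature from its values in positive and negative samples."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    return between / within if within else 0.0


def rank_features(X, y):
    """Return feature indices sorted by descending F-score (labels are 1 or 0)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a sequential forward search, the ranked indices from `rank_features` would be added one at a time (or in small batches) to a classifier such as a support vector machine, keeping the subset with the best cross-validated score. That step is omitted here because the abstract does not specify the kernel, hyperparameters or step size.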
