State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China.
Environ Microbiol Rep. 2022 Aug;14(4):616-631. doi: 10.1111/1758-2229.13068. Epub 2022 Apr 10.
Acyl-homoserine lactones (AHLs), the major quorum sensing (QS) signalling molecules in Gram-negative bacteria, have shown great potential for regulating biological nutrient removal processes. Identifying AHL synthases (AHLSs) is essential for in-depth research on QS mechanisms and their application in biological wastewater treatment. This work proposes AHLS-pred, the first machine-learning-based prediction model for AHL synthases. A training dataset, AHLS1400, and an independent testing dataset, AHLS132, were first established for AHLS prediction. Three sequence-based feature extraction methods were used to generate feature descriptors: amino acid composition, dipeptide composition and G-gap dipeptide composition. The feature descriptors were then ranked by F-score, and the optimal feature subset was selected with a sequential forward search strategy. After comparing five machine learning algorithms, the final prediction model was trained with a support vector machine classifier, which gave the best performance in fivefold cross-validation on AHLS1400 (ACC = 99.43%, MCC = 0.989, AUC = 0.997). On the independent testing dataset AHLS132, AHLS-pred achieved an ACC of 94.70%, an MCC of 0.894 and an AUC of 0.995. These results indicate that AHLS-pred is a promising and powerful method for accelerating the computational identification of AHLSs.
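The three sequence descriptors and the F-score ranking named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (aac, gap_dpc, f_score) are hypothetical, the F-score formula is the commonly used two-class definition (squared between-class mean differences over the sum of within-class sample variances) and is assumed rather than taken from the paper, and residue pairs containing non-standard letters are simply skipped.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def aac(seq):
    """Amino acid composition: 20 relative frequencies."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def gap_dpc(seq, g=0):
    """g-gap dipeptide composition: 400 relative frequencies of residue
    pairs separated by g positions. g=0 is ordinary dipeptide composition."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # assumption: skip pairs with non-standard letters
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def f_score(pos, neg):
    """Per-feature F-score for two classes (assumed definition).
    pos and neg are lists of equal-length feature vectors."""
    def mean(values):
        return sum(values) / len(values)

    def sample_var(values, m):
        if len(values) < 2:
            return 0.0
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        m_all, m_p, m_q = mean(p + q), mean(p), mean(q)
        denom = sample_var(p, m_p) + sample_var(q, m_q)
        numer = (m_p - m_all) ** 2 + (m_q - m_all) ** 2
        scores.append(numer / denom if denom > 0 else 0.0)
    return scores
```

Sorting feature indices by descending F-score gives the order in which a sequential forward search would add features one at a time, keeping the subset with the best cross-validated performance.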