Protein Interaction Laboratory, Department of Biotechnology, Mepco Schlenk Engineering College, Mepco Nagar, Sivakasi, Tamil Nadu, 626005, India.
International Research Centre of Spectroscopy and Quantum Chemistry -IRC SQC, Siberian Federal University, Krasnoyarsk, 660074, Russia.
J Microbiol. 2022 Jul;60(7):756-765. doi: 10.1007/s12275-022-2044-9. Epub 2022 Jun 22.
Bacteria exist in natural environments for most of their life as complex, heterogeneous, multicellular aggregates. Under these circumstances, critical cell functions are controlled by several signaling molecules known as quorum sensing (QS) molecules. In Gram-positive bacteria, peptides serve as QS molecules. The development of antibodies against such QS molecules has been identified as a promising therapeutic intervention for bacterial control. Hence, the identification of QS peptides has received considerable attention. A fast and reliable predictive model that effectively identifies QS peptides can support existing high-throughput experiments. In this study, a stacked generalization ensemble model with Gradient Boosting Machine (GBM)-based feature selection, named EnsembleQS, was developed to predict QS peptides with high accuracy. On the selected GBM features (791D), EnsembleQS outperformed finely tuned baseline classifiers and demonstrated robust performance, indicating the superiority of the model. The accuracy of EnsembleQS is 4% higher than that of the ensemble model on the hybrid dataset. When evaluated on an independent dataset of 40 QS peptides, the EnsembleQS model showed an accuracy of 93.4%, with Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) values of 0.91 and 0.951, respectively. These results suggest that EnsembleQS will be a useful computational framework for predicting QS peptides and will efficiently support proteomics research. The source code and all datasets used in this study are publicly available at https://github.com/proteinexplorers/EnsembleQS.
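The three metrics reported above (accuracy, MCC, and AUC) can be illustrated with a minimal sketch in plain Python. This is not the EnsembleQS code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are hypothetical and chosen only to show how the quantities are defined for a binary QS / non-QS classifier.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = QS peptide, 0 = non-QS peptide (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

toy_accuracy = accuracy(y_true, y_pred)
toy_mcc = mcc(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
print(toy_accuracy, toy_mcc, toy_auc)
```

Accuracy and MCC depend on the thresholded predictions, whereas AUC is computed from the raw scores, which is why the three values can differ noticeably for the same model.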