Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India.
Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, 160062, Punjab, India.
Med Biol Eng Comput. 2021 Nov;59(11-12):2397-2408. doi: 10.1007/s11517-021-02443-6. Epub 2021 Oct 11.
Antimicrobial peptides (AMPs) are ubiquitous, show a broad range of antimicrobial activities, and hold great promise for combating multidrug-resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we explored a variety of machine learning classifiers to build in silico models for AMP prediction: Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the models generated, the RF-based model performed best in both internal validation [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and external validation [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF-based model correctly predicted known AMPs and non-AMPs that had been kept aside as an additional external validation set. The performance assessment revealed three features, viz. ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may be useful in the design and prediction of novel potential AMPs.
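The four validation metrics quoted in the abstract all derive from a binary confusion matrix in which AMPs are the positive class. The following is a minimal sketch of those standard definitions, not the authors' code: the names `BinaryMetrics` and `binary_metrics` are hypothetical, and the counts in the demo are made up for illustration, since the abstract reports percentages only and not the underlying confusion matrices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Standard binary-classification metrics, each as a fraction in [0, 1]."""
    accuracy: float
    precision: float
    sensitivity: float
    specificity: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute metrics from confusion-matrix counts.

    tp: AMPs predicted as AMPs          fn: AMPs predicted as non-AMPs
    tn: non-AMPs predicted as non-AMPs  fp: non-AMPs predicted as AMPs
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    # Each denominator must be non-zero for the metric to be defined.
    if tp + fp == 0 or tp + fn == 0 or tn + fp == 0:
        raise ValueError("a metric is undefined for these counts")
    total = tp + fp + tn + fn
    return BinaryMetrics(
        accuracy=(tp + tn) / total,   # fraction of all peptides classified correctly
        precision=tp / (tp + fp),     # fraction of predicted AMPs that are true AMPs
        sensitivity=tp / (tp + fn),   # fraction of true AMPs that were recovered
        specificity=tn / (tn + fp),   # fraction of true non-AMPs that were recovered
    )


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT taken from the paper.
    m = binary_metrics(tp=90, fp=20, tn=80, fn=10)
    for name, value in vars(m).items():
        print(f"{name}: {value:.2%}")
```

Because the data set has more non-AMPs (3700) than AMPs (2638), accuracy alone can hide weak recall on the positive class, which is presumably why sensitivity and specificity are reported alongside it.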