National Center for Genetic Engineering and Biotechnology, Biochemical Engineering and Systems Biology Research Group, National Science and Technology Development Agency, King Mongkut's University of Technology Thonburi, Bang Khun Thian, Bangkok 10150, Thailand.
Genes (Basel). 2021 Jan 21;12(2):137. doi: 10.3390/genes12020137.
Antimicrobial peptides (AMPs) are naturally occurring peptides with antimicrobial activity. Found in a wide range of organisms, they are important components of the innate immune system. Screening and identifying AMPs experimentally is laborious and time-consuming. As an alternative, machine learning-based computational methods have been developed to screen potential AMP candidates before experimental verification. Although various AMP prediction programs are available, there is still room to reduce false positives (FPs) and to increase predictive accuracy. In this work, several well-known single and ensemble machine learning approaches were explored and evaluated using balanced training datasets and two large testing datasets. We demonstrate that the developed program, with its various predictive models, achieves high performance in differentiating AMPs from non-AMPs. Here, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, an ensemble model. To further improve prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. This integration effectively increases the prediction sensitivity of the resulting program, Ensemble-AMPPred, yielding overall improvements in both sensitivity and specificity compared with currently available programs.
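The abstract names MaxProbVote as the ensemble rule but does not define it. The sketch below assumes one plausible reading: each base classifier outputs a probability that a peptide is an AMP, and the final label is the class chosen by whichever classifier is most confident about either class. It also computes sensitivity and specificity, the two metrics the comparison rests on. The voting rule, the function names `max_prob_vote` and `sensitivity_specificity`, and the toy probabilities are illustrative assumptions, not the Ensemble-AMPPred implementation; the logistic-regression hybrid feature is not modeled.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: "max-probability vote" is read here as "take the single most
# confident class probability across all base models"; the paper's exact
# rule may differ.
def max_prob_vote(model_probs: Sequence[Sequence[float]]) -> List[int]:
    """model_probs[m][i] = hypothetical model m's P(AMP) for peptide i.

    Returns 1 (AMP) or 0 (non-AMP) per peptide. Ties keep the first
    (model, class) pair encountered, so P(AMP) == 0.5 resolves to AMP.
    """
    n_peptides = len(model_probs[0])
    labels = []
    for i in range(n_peptides):
        best_prob, best_label = -1.0, 0
        for probs in model_probs:
            p_amp = probs[i]
            # Consider both class probabilities from this model.
            for label, p in ((1, p_amp), (0, 1.0 - p_amp)):
                if p > best_prob:
                    best_prob, best_label = p, label
        labels.append(best_label)
    return labels


def _safe_div(num: int, den: int) -> float:
    # Undefined ratios (e.g. no positives in y_true) are reported as NaN.
    return num / den if den else float("nan")


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for binary labels 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _safe_div(tp, tp + fn)  # share of true AMPs recovered
    specificity = _safe_div(tn, tn + fp)  # share of non-AMPs rejected (fewer FPs)
    accuracy = _safe_div(tp + tn, len(pairs))
    return sensitivity, specificity, accuracy


# Toy data for illustration only (not from the paper): 3 models x 4 peptides.
TOY_PROBS = [
    [0.90, 0.40, 0.60, 0.30],  # hypothetical model A
    [0.60, 0.20, 0.58, 0.45],  # hypothetical model B
    [0.70, 0.65, 0.05, 0.20],  # hypothetical model C
]
TOY_LABELS = [1, 0, 0, 1]  # hypothetical ground truth

if __name__ == "__main__":
    preds = max_prob_vote(TOY_PROBS)
    print(preds)
    print(sensitivity_specificity(TOY_LABELS, preds))
```

On the toy data this yields predictions `[1, 0, 0, 0]` and (sensitivity, specificity, accuracy) of `(0.5, 1.0, 0.75)`. Peptide 2 shows how the rule differs from a simple majority vote: two of three models put P(AMP) above 0.5, but model C's 0.95 confidence in non-AMP wins.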