Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia.
Department of Microbiology and Immunology, the Peter Doherty Institute for Infection and Immunity, the University of Melbourne, Australia.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab083.
Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. Interest in AMPs has grown in recent years because of antimicrobial resistance, an emerging global concern. Systematic experimental identification of AMPs is difficult because of the limitations of current methods. Given the significance of AMPs, more than 30 computational methods have been developed to predict them accurately. These approaches differ widely in data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey of current approaches for AMP identification and highlight the differences between them. In addition, we evaluate the predictive performance of the surveyed tools on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets from six common AMP databases and compare the computational methods on these data sets. The results indicate that amPEPpy achieves the best predictive performance among the compared methods. Because predictive performance is affected by the different data sets on which each method was developed, we additionally perform 5-fold cross-validation to benchmark traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting perform comparatively better than the other machine learning methods and are often the algorithms of choice in AMP prediction tools.
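The 5-fold cross-validation protocol mentioned above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the amino acid composition feature, the nearest-centroid classifier (a simple stand-in, not random forest, support vector machine or eXtreme Gradient Boosting), the function names, and the synthetic toy sequences, which are not real AMPs.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_features(seq):
    """Amino acid composition: fraction of each standard residue in seq."""
    counts = Counter(seq.upper())
    n = max(len(seq), 1)
    return [counts[a] / n for a in AMINO_ACIDS]


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping classes balanced per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(x, centroids):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def cross_validate(seqs, labels, k=5, seed=0):
    """Per-fold accuracy of the stand-in classifier under k-fold cross-validation.

    Assumes every class keeps at least one sample in each training split.
    """
    X = [aac_features(s) for s in seqs]
    accuracies = []
    for test_idx in stratified_folds(labels, k, seed):
        if not test_idx:
            continue  # fewer samples than folds
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        centroids = {
            cls: centroid([X[i] for i in train_idx if labels[i] == cls])
            for cls in set(labels)
        }
        correct = sum(predict(X[i], centroids) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic toy data (hypothetical): cationic-looking vs acidic-looking strings.
toy_pos = ["KRK" + "L" * i + "KKR" for i in range(1, 11)]
toy_neg = ["DED" + "G" * i + "EED" for i in range(1, 11)]
toy_seqs = toy_pos + toy_neg
toy_labels = [1] * len(toy_pos) + [0] * len(toy_neg)

fold_acc = cross_validate(toy_seqs, toy_labels, k=5)
print("per-fold accuracy:", fold_acc)
print("mean accuracy:", sum(fold_acc) / len(fold_acc))
```

Running every candidate algorithm on the same folds of the same data set is what makes the per-fold scores comparable across methods. In practice the stand-in classifier would be replaced by the learners under comparison, with the fold assignment held fixed.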