AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest.

Affiliation

Department of Computer and Information Science, University of Macau, Taipa, Macau, China.

Publication information

Sci Rep. 2018 Jan 26;8(1):1697. doi: 10.1038/s41598-018-19752-w.

DOI:10.1038/s41598-018-19752-w

PMID:29374199

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5785966/

Abstract

Antimicrobial peptides (AMPs) are promising candidates in the fight against multidrug-resistant pathogens owing to their broad range of activities and low toxicity. Nonetheless, identification of AMPs through wet-lab experiments is still expensive and time-consuming. Here, we propose an accurate computational method for AMP prediction using the random forest algorithm. The prediction model is based on the distribution patterns of amino acid properties along the sequence. Using our collection of large and diverse sets of AMP and non-AMP data (3268 and 166791 sequences, respectively), we evaluated 19 random forest classifiers with different positive:negative data ratios by 10-fold cross-validation. Our optimal model, AmPEP with the 1:3 data ratio, showed high accuracy (96%), a Matthews correlation coefficient (MCC) of 0.9, an area under the receiver operating characteristic curve (AUC-ROC) of 0.99, and a Kappa statistic of 0.9. Descriptor analysis of AMP/non-AMP distributions by means of Pearson correlation coefficients revealed that reduced feature sets (from a full-featured set of 105 to a minimal-feature set of 23) can result in comparable performance in all respects except for some reductions in precision. Furthermore, AmPEP outperformed existing methods in terms of accuracy, MCC, and AUC-ROC when tested on benchmark datasets.
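To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a "distribution pattern" means the relative positions along the sequence at which the first, 25%, 50%, 75% and last residue of a given property class occur; the abstract does not spell this out. The three-class hydrophobicity grouping and all function names are hypothetical and chosen only for illustration. The sketch also shows how MCC and the Kappa statistic can be computed from a binary confusion matrix using their standard definitions.

```python
import math

# Hypothetical three-class grouping of amino acids by hydrophobicity.
# Illustrative assumption only; the abstract does not list the classes used.
HYDROPHOBICITY_CLASSES = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def distribution_descriptor(sequence, residue_class):
    """Positions (as % of sequence length) of the first, 25%, 50%, 75%
    and last occurrence of residues belonging to residue_class."""
    positions = [i + 1 for i, aa in enumerate(sequence) if aa in residue_class]
    if not positions:
        return [0.0] * 5  # class absent from the sequence
    n = len(positions)
    features = []
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        k = max(1, math.floor(n * frac))  # 1-based rank of the occurrence
        features.append(100.0 * positions[k - 1] / len(sequence))
    return features


def featurize(sequence):
    """Concatenate the five distribution values of every class (3 x 5 = 15 values here)."""
    features = []
    for name in ("polar", "neutral", "hydrophobic"):
        features.extend(distribution_descriptor(sequence, HYDROPHOBICITY_CLASSES[name]))
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, tn, fp, fn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_chance == 1.0:
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)
```

Under these assumptions, a feature vector built this way for each peptide could be passed to any random forest implementation and evaluated by 10-fold cross-validation. Repeating the same five values over several properties with three classes each would give a longer vector of the kind the abstract refers to as the full feature set.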
