Suppr超能文献

AmPEP:基于氨基酸属性分布模式和随机森林的抗菌肽序列预测。

AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest.

机构信息

Department of Computer and Information Science, University of Macau, Taipa, Macau, China.

出版信息

Sci Rep. 2018 Jan 26;8(1):1697. doi: 10.1038/s41598-018-19752-w.

Abstract

Antimicrobial peptides (AMPs) are promising candidates in the fight against multidrug-resistant pathogens owing to AMPs' broad range of activities and low toxicity. Nonetheless, identification of AMPs through wet-lab experiments is still expensive and time consuming. Here, we propose an accurate computational method for AMP prediction by the random forest algorithm. The prediction model is based on the distribution patterns of amino acid properties along the sequence. Using our collection of large and diverse sets of AMP and non-AMP data (3268 and 166791 sequences, respectively), we evaluated 19 random forest classifiers with different positive:negative data ratios by 10-fold cross-validation. Our optimal model, AmPEP with the 1:3 data ratio, showed high accuracy (96%), Matthew's correlation coefficient (MCC) of 0.9, area under the receiver operating characteristic curve (AUC-ROC) of 0.99, and the Kappa statistic of 0.9. Descriptor analysis of AMP/non-AMP distributions by means of Pearson correlation coefficients revealed that reduced feature sets (from a full-featured set of 105 to a minimal-feature set of 23) can result in comparable performance in all respects except for some reductions in precision. Furthermore, AmPEP outperformed existing methods in terms of accuracy, MCC, and AUC-ROC when tested on benchmark datasets.

摘要

抗菌肽 (AMPs) 由于其广泛的活性和低毒性,是对抗多药耐药病原体的有前途的候选物。尽管如此,通过湿实验室实验鉴定 AMP 仍然昂贵且耗时。在这里,我们提出了一种基于随机森林算法的 AMP 预测的准确计算方法。预测模型基于氨基酸性质沿序列分布的模式。使用我们收集的大量和多样化的 AMP 和非 AMP 数据集(分别为 3268 和 166791 个序列),我们通过 10 倍交叉验证评估了 19 个具有不同正:负数据比的随机森林分类器。我们的最优模型 AmPEP(数据比为 1:3)具有高准确性(96%)、马修相关系数(MCC)为 0.9、接收者操作特征曲线下的面积(AUC-ROC)为 0.99 和卡帕统计量为 0.9。通过皮尔逊相关系数对 AMP/非 AMP 分布的描述性分析表明,除了某些精度降低外,减少特征集(从全特征集 105 减少到最小特征集 23)可以在所有方面产生可比的性能。此外,当在基准数据集上进行测试时,AmPEP 在准确性、MCC 和 AUC-ROC 方面优于现有方法。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ad/5785966/9bd05dc6f66e/41598_2018_19752_Fig1_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验