PredAPP: Predicting Anti-Parasitic Peptides with Undersampling and Ensemble Approaches.

Author information

Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.

State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China.

Publication information

Interdiscip Sci. 2022 Mar;14(1):258-268. doi: 10.1007/s12539-021-00484-x. Epub 2021 Oct 4.

Abstract

Anti-parasitic peptides (APPs) are regarded as promising therapeutic candidates against parasitic diseases. Because experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need for a computational approach that can predict APPs on a large scale. In this study, we present a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides), that effectively identifies APPs using an ensemble of well-performing machine learning (ML) classifiers. First, to address the class imbalance problem, a balanced training dataset was generated by undersampling; the balanced dataset based on cluster centroids achieved the best performance. Then, nine groups of features and six ML algorithms were combined to generate 54 classifiers, whose outputs formed 54 feature representations; within each feature group, we selected the best-performing feature representation for classification. Finally, the selected feature representations were integrated using logistic regression to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved an accuracy of 0.880 and an AUC of 0.922, compared with 0.739 and 0.873 for AMPfun, a state-of-the-art method for predicting APPs. The web server of PredAPP is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP.
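To make two of the steps above concrete, the following is a minimal, standard-library-only sketch and not the authors' implementation. It assumes that cluster-centroid undersampling means running k-means on the majority class and keeping the centroids, and that "best performance" within a feature group means the highest validation score. The function names, the toy 2-D data, and the labels `AAC`, `DPC`, `RF`, and `SVM` are hypothetical placeholders, not details taken from the abstract.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def cluster_centroid_undersample(majority, n_keep, n_iter=20, seed=0):
    """Shrink `majority` to `n_keep` synthetic points (k-means centroids)."""
    if n_keep >= len(majority):
        return [list(p) for p in majority]
    rng = random.Random(seed)
    # Initialise centroids from randomly chosen majority samples.
    centroids = [list(p) for p in rng.sample(majority, n_keep)]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in majority:
            nearest = min(range(n_keep), key=lambda k: _dist2(p, centroids[k]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            _mean(c) if c else centroids[k] for k, c in enumerate(clusters)
        ]
    return centroids


def best_per_group(scores):
    """Pick the best-scoring algorithm for each feature group.

    `scores` maps (feature_group, algorithm) -> validation score.
    """
    best = {}
    for (group, algo), score in scores.items():
        if group not in best or score > scores[(group, best[group])]:
            best[group] = algo
    return best


# Toy data (hypothetical): 3 positives vs. 12 negatives in a 2-D feature space.
positives = [[0.9, 0.8], [1.0, 1.1], [1.2, 0.9]]
_rng = random.Random(1)
negatives = [[_rng.uniform(-1, 0), _rng.uniform(-1, 0)] for _ in range(12)]

# After undersampling, both classes have the same number of samples.
balanced_negatives = cluster_centroid_undersample(negatives, n_keep=len(positives))

# Hypothetical validation scores for two feature groups and two algorithms.
toy_scores = {
    ("AAC", "RF"): 0.80,
    ("AAC", "SVM"): 0.85,
    ("DPC", "RF"): 0.70,
}
selected = best_per_group(toy_scores)
```

In a pipeline of the kind the abstract describes, the outputs of the selected classifiers (one per feature group) would then serve as input features to a logistic regression meta-model; that final step is omitted here to keep the sketch short.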
