PredAPP: Predicting Anti-Parasitic Peptides with Undersampling and Ensemble Approaches.

Affiliations

Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.

State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China.

Publication information

Interdiscip Sci. 2022 Mar;14(1):258-268. doi: 10.1007/s12539-021-00484-x. Epub 2021 Oct 4.

DOI: 10.1007/s12539-021-00484-x

PMID: 34608613

Abstract

Anti-parasitic peptides (APPs) have been regarded as promising therapeutic candidates against parasitic diseases. Because the experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need for a computational approach that can predict APPs on a large scale. In this study, we present a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides), that effectively identifies APPs using an ensemble of well-performing machine learning (ML) classifiers. First, to address the class imbalance problem, a balanced training dataset was generated by undersampling. We found that the balanced dataset based on cluster centroids achieved the best performance. Then, nine feature groups were combined with six ML algorithms to generate 54 classifiers, whose outputs formed 54 feature representations. Within each feature group, the feature representation with the best performance was selected for classification. Finally, the selected feature representations were integrated using a logistic regression algorithm to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved an accuracy of 0.880 and an AUC of 0.922. By comparison, AMPfun, a state-of-the-art method for predicting APPs, achieved 0.739 and 0.873, respectively. The PredAPP web server is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP.
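The abstract names the pipeline stages but gives no implementation details. The following minimal pure-Python sketch illustrates the general techniques under stated assumptions; it is not PredAPP's code. The sketch covers three stages:

- Cluster-centroid undersampling, assumed here to replace the majority class with k-means centroids, where k equals the minority-class size.
- One base learner per feature group, whose scores become new features.
- A logistic-regression meta-learner trained on the stacked scores.

All function names, the nearest-centroid base learner, and the toy data are hypothetical. The nine real feature groups, the six ML algorithms, and the per-group "best representation" selection are not reproduced.

```python
import math
import random

# Step 1: cluster-centroid undersampling (assumed variant, see lead-in)
def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroid_undersample(X, y, majority_label=0, seed=0):
    """Replace the majority class by k-means centroids, k = minority-class size."""
    majority = [x for x, t in zip(X, y) if t == majority_label]
    minority = [(x, t) for x, t in zip(X, y) if t != majority_label]
    centroids = kmeans(majority, len(minority), seed=seed)
    X_bal = [list(x) for x, _ in minority] + centroids
    y_bal = [t for _, t in minority] + [majority_label] * len(centroids)
    return X_bal, y_bal


# Step 2: a hypothetical toy base learner restricted to one feature group
def nearest_centroid_scorer(X, y, cols):
    """Return a function mapping a sample to a score in [0, 1];
    samples closer to the class-1 centroid get higher scores."""
    def centroid(label):
        rows = [[x[c] for c in cols] for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x):
        v = [x[c] for c in cols]
        d0, d1 = math.dist(v, c0), math.dist(v, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

    return score


# Step 3: logistic-regression meta-learner on the stacked scores
def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(F, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns a predictor."""
    w, b, n = [0.0] * len(F[0]), 0.0, len(F)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for f, t in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - t
            gw = [g + err * fi for g, fi in zip(gw, f)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda f: sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)


# Toy demo: 40 negatives vs 10 positives, 4 features split into 2 "groups"
rng = random.Random(1)
X = [[rng.gauss(0, 0.5) for _ in range(4)] for _ in range(40)]
X += [[rng.gauss(3, 0.5) for _ in range(4)] for _ in range(10)]
y = [0] * 40 + [1] * 10

X_bal, y_bal = cluster_centroid_undersample(X, y, majority_label=0)
scorers = [nearest_centroid_scorer(X_bal, y_bal, cols) for cols in ([0, 1], [2, 3])]
meta = [[s(x) for s in scorers] for x in X_bal]  # one score per feature group
predict = fit_logistic(meta, y_bal)

acc = sum((predict(f) >= 0.5) == (t == 1) for f, t in zip(meta, y_bal)) / len(y_bal)
print(y_bal.count(0), y_bal.count(1), acc)  # class counts, then training accuracy
```

For brevity, the toy computes meta-features in-sample on the same balanced data the base learners were fit on. A practical stacked ensemble usually feeds the meta-learner out-of-fold base-learner predictions to avoid leakage. It should also be evaluated on a held-out set, as the abstract does with its independent dataset.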

Similar articles

ACP-Dnnel: anti-coronavirus peptides' prediction based on deep neural network ensemble learning.

Amino Acids. 2023 Sep;55(9):1121-1136. doi: 10.1007/s00726-023-03300-6. Epub 2023 Jul 4.

usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab123.

iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model.

Comput Biol Med. 2021 Oct;137:104778. doi: 10.1016/j.compbiomed.2021.104778. Epub 2021 Aug 25.

StackDPPIV: A novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides.

Methods. 2022 Aug;204:189-198. doi: 10.1016/j.ymeth.2021.12.001. Epub 2021 Dec 6.

UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning.

Int J Mol Sci. 2021 Dec 4;22(23):13124. doi: 10.3390/ijms222313124.

Glypred: Lysine Glycation Site Prediction via CCU-LightGBM-BiLSTM Framework with Multi-Head Attention Mechanism.

J Chem Inf Model. 2024 Aug 26;64(16):6699-6711. doi: 10.1021/acs.jcim.4c01034. Epub 2024 Aug 9.

Feature selection and the class imbalance problem in predicting protein function from sequence.

Appl Bioinformatics. 2005;4(3):195-203. doi: 10.2165/00822942-200504030-00004.

Deleterious synonymous mutation identification based on selective ensemble strategy.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac598.

MultiFeatVotPIP: a voting-based ensemble learning framework for predicting proinflammatory peptides.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae505.

Cited by

A review of machine learning methods for imbalanced data challenges in chemistry.

Chem Sci. 2025 Apr 22;16(18):7637-7658. doi: 10.1039/d5sc00270b. eCollection 2025 May 7.

Protein language model-based prediction for plant miRNA encoded peptides.

PeerJ Comput Sci. 2025 Mar 18;11:e2733. doi: 10.7717/peerj-cs.2733. eCollection 2025.

iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning.

Genomics Proteomics Bioinformatics. 2025 Jan 15;22(6). doi: 10.1093/gpbjnl/qzae084.

DeepBP: Ensemble deep learning strategy for bioactive peptide prediction.

BMC Bioinformatics. 2024 Nov 11;25(1):352. doi: 10.1186/s12859-024-05974-5.

misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.

Interdiscip Sci. 2025 Mar;17(1):114-133. doi: 10.1007/s12539-024-00661-8. Epub 2024 Oct 14.

AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors.

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae555.

CELA-MFP: a contrast-enhanced and label-adaptive framework for multi-functional therapeutic peptides prediction.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae348.

MFPPDB: a comprehensive multi-functional plant peptide database.

Front Plant Sci. 2023 Oct 16;14:1224394. doi: 10.3389/fpls.2023.1224394. eCollection 2023.

Characterisation of a novel crustin isoform from mud crab, (Forsskål, 1775) and its functional analysis in silico.

In Silico Pharmacol. 2022 Dec 28;11(1):2. doi: 10.1007/s40203-022-00138-w. eCollection 2023.

The dynamic landscape of peptide activity prediction.

Comput Struct Biotechnol J. 2022 Nov 24;20:6526-6533. doi: 10.1016/j.csbj.2022.11.043. eCollection 2022.
