Periwal Neha, Arora Pooja, Thakur Ananya, Agrawal Lakshay, Goyal Yash, Rathore Anand S, Anand Harsimrat Singh, Kaur Baljeet, Sood Vikas
Department of Biochemistry, Jamia Hamdard, India.
Department of Zoology, Hansraj College, University of Delhi, India.
Heliyon. 2024 Aug 13;10(16):e36163. doi: 10.1016/j.heliyon.2024.e36163. eCollection 2024 Aug 30.
Protozoal pathogens pose a considerable threat, causing notable mortality and presenting the ongoing challenge of drug resistance. This situation underscores the urgent need for alternative therapeutic approaches. Antimicrobial peptides stand out as promising candidates for drug development. However, published research on predicting antimicrobial peptides that specifically target protozoal pathogens is lacking. In this study, we introduce a machine learning-based framework designed to predict peptides with potential activity against protozoal pathogens.
The primary objective of this study is to classify and predict antiprotozoal peptides using diverse negative datasets.
A comprehensive literature review was conducted to gather experimentally validated antiprotozoal peptides, which formed the positive dataset for our study. To construct a robust machine learning classifier, multiple negative datasets were incorporated: (i) non-antimicrobial peptides, (ii) antiviral peptides, (iii) antibacterial peptides, (iv) antifungal peptides, and (v) antimicrobial peptides excluding those targeting protozoal pathogens. Various compositional features of the peptides were extracted using the pfeature algorithm. Two feature selection methods, SVC-L1 and mRMR, were employed to identify the features most relevant for distinguishing between the positive and negative datasets. Five popular classifiers (Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, and XGBoost) were then used to build the decision models.
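To make the idea of a compositional feature concrete, the following is a minimal sketch of amino acid composition, one of the simplest descriptors of this kind. It does not call pfeature and is not the authors' code; the function name `amino_acid_composition` and the example peptide are hypothetical and chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue in a peptide.

    Illustrative stand-in for a compositional descriptor; this is an
    assumption-based sketch, not the pfeature implementation.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    counts = Counter(seq)
    # Each feature is (count of residue / peptide length) * 100.
    return {aa: 100.0 * counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


# Hypothetical 10-residue peptide, used only to exercise the function.
features = amino_acid_composition("GIGKFLHSAK")
```

Each peptide becomes a fixed-length vector of 20 percentages, so peptides of different lengths can be passed to the same feature selection step and classifier.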
XGBoost was the most effective classifier for separating antiprotozoal peptides from each negative dataset when trained on the features selected by the mRMR method. The proposed machine learning framework differentiates antiprotozoal peptides from (i) non-antimicrobial, (ii) antiviral, (iii) antibacterial, (iv) antifungal, and (v) other antimicrobial peptides with accuracies of 97.27%, 93.64%, 86.36%, 90.91%, and 89.09%, respectively, on the validation dataset.
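For reference, accuracy is the percentage of validation peptides whose predicted class matches the true class. The sketch below shows that calculation on made-up labels; the lists `y_true` and `y_pred` are hypothetical and do not come from the study's validation data.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("no labels given")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Hypothetical labels: 1 = antiprotozoal, 0 = negative-set peptide.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
acc = accuracy_percent(y_true, y_pred)  # 8 of 10 correct
```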
The models are incorporated in a user-friendly web server (www.soodlab.com/appred) to predict the antiprotozoal activity of given peptides.