Dey Lopamudra, Chakraborty Sanjay
Department of Biomedical and Clinical Sciences, Linköping University, Sweden; Department of Computer Science & Engineering, Meghnad Saha Institute of Technology, Kolkata, India.
Department of Computer and Information Science (IDA), REAL, AIICS, Linköping University, Sweden; Department of Computer Science & Engineering, Techno International New Town, Kolkata, India.
Gene. 2025 Mar 20;942:149228. doi: 10.1016/j.gene.2025.149228. Epub 2025 Jan 17.
The goal of this research is to predict protein-protein interactions (PPIs) between the Ebola virus and the human host at risk of infection. Because very few databases are available for the Ebola virus, we have prepared a comprehensive database of all the PPIs between Ebola virus and human proteins (EbolaInt). Our work focuses on finding new protein-protein interactions between humans and the Ebola virus using state-of-the-art machine learning techniques. The task is essentially a two-class problem, with a positive (interacting) dataset and a negative (non-interacting) dataset. These datasets contain various sequence-based human protein features, such as amino acid structure, conjoint triad, and domain-related features. In this research, we briefly discuss and apply several well-known supervised learning approaches to predict PPIs between human proteins and Ebola virus proteins, including K-nearest neighbours (KNN), random forest (RF), support vector machine (SVM), and deep feed-forward multi-layer perceptron (DMLP). We validated our prediction results using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. We compare the accuracy, precision, recall, and F1-score of all the models for predicting these PPIs. In the results, DMLP gives the highest accuracy and predicts 2655 potential human target proteins.
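As an illustration of the conjoint triad feature mentioned above, the following is a minimal Python sketch of how such a descriptor is commonly computed from a protein sequence. The abstract does not give the authors' exact procedure, so the seven-class amino acid grouping, the (f - min) / max normalisation, and the function name `conjoint_triad` are assumptions taken from the usual formulation of the descriptor, not from this work.

```python
from itertools import product

# Assumed grouping: the seven amino-acid classes commonly used for the
# conjoint triad descriptor (the abstract does not state which grouping was used).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible triads of class labels, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional conjoint triad vector for one protein sequence.

    Hypothetical helper for illustration only. Windows containing a residue
    outside the 20 standard amino acids are skipped.
    """
    counts = [0] * len(TRIADS)
    seq = sequence.upper()
    # Slide a window of three consecutive residues along the sequence.
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if all(aa in AA_TO_CLASS for aa in window):
            triad = tuple(AA_TO_CLASS[aa] for aa in window)
            counts[TRIAD_INDEX[triad]] += 1
    lowest, highest = min(counts), max(counts)
    if highest == 0:
        # Sequence too short (or no valid window): return an all-zero vector.
        return [0.0] * len(counts)
    # Assumed normalisation: (f - min) / max, to reduce the effect of length.
    return [(c - lowest) / highest for c in counts]
```

A vector of this kind, computed per protein, could then be passed to any of the classifiers named in the abstract; how the authors combine it with the other features is not stated here.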