Alakus Talha Burak, Turkoglu Ibrahim
Kirklareli University, Department of Software Engineering, Kirklareli, 39000, Turkey.
Firat University, Department of Software Engineering, Elazig, 23119, Turkey.
Chemometr Intell Lab Syst. 2022 Sep 15;228:104622. doi: 10.1016/j.chemolab.2022.104622. Epub 2022 Jul 21.
Viral-host interactions are currently determined experimentally, but experimental approaches are both time-consuming and costly, which motivates computational alternatives. In this study, viral-host interactions between SARS-CoV-2 and human proteins were predicted computationally. The study consists of four stages. In the first stage, viral and host protein sequences were obtained. In the second stage, the protein sequences were converted into numerical representations by various protein mapping methods: entropy-based, AVL-tree, FIBHASH, binary encoding, CPNR, PAM250, BLOSUM62, Atchley factors, Meiler parameters, EIIP, AESNN1, Miyazawa energies, Micheletti potentials, Z-scale, and hydrophobicity. In the third stage, a deep learning model based on BiLSTM was designed. In the last stage, the protein sequences were classified and the viral-host interactions were predicted. The performance of each protein mapping method was evaluated by accuracy, F1-score, specificity, sensitivity, and AUC score. The best classification result was obtained with the entropy-based method, which achieved 94.74% accuracy and a 0.95 AUC score. The second-best result was obtained with the Z-scale, which achieved 91.23% accuracy and a 0.96 AUC score. Although the other protein mapping methods were less effective than the Z-scale and entropy-based methods, they still achieved successful classification. The AVL-tree, FIBHASH, binary encoding, CPNR, PAM250, BLOSUM62, Atchley factors, Meiler parameters, and AESNN1 methods scored over 80% in accuracy, F1-score, and AUC score. The accuracy of the EIIP, Miyazawa energies, Micheletti potentials, and hydrophobicity methods remained below 80%.
Overall, the results show that computational approaches can successfully predict viral-host interactions between SARS-CoV-2 and human proteins.
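As an illustration of the second and fourth stages, the following is a minimal sketch and not the authors' implementation. It maps a protein sequence to numbers with a hydrophobicity table and computes accuracy, sensitivity, specificity, and F1-score from binary predictions. The Kyte-Doolittle hydropathy scale, the fixed-length zero padding, and the function names `map_sequence` and `binary_metrics` are assumptions made for this example; the abstract does not state which hydrophobicity scale or padding scheme was used.

```python
# Kyte-Doolittle hydropathy index per amino acid.
# ASSUMPTION: the abstract does not say which hydrophobicity scale was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def map_sequence(seq, table, length, pad=0.0):
    """Map each residue to a number, then truncate or right-pad to `length`.

    Unknown residues (e.g. 'X') are mapped to the padding value.
    ASSUMPTION: fixed-length padding is a common choice for recurrent models,
    not a detail taken from the abstract.
    """
    values = [table.get(residue, pad) for residue in seq.upper()]
    values = values[:length]
    return values + [pad] * (length - len(values))


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    print(map_sequence("MKV", KYTE_DOOLITTLE, length=5))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Any of the other mapping methods that assign one number per residue could be swapped in by passing a different lookup table to `map_sequence`; the resulting fixed-length vectors are the kind of input a BiLSTM classifier would consume.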