Faculty of Engineering, Department of Software Engineering, Kirklareli University, 39000, Kirklareli, Turkey.
Faculty of Technology, Department of Software Engineering, Firat University, 23119, Elazig, Turkey.
Interdiscip Sci. 2021 Mar;13(1):44-60. doi: 10.1007/s12539-020-00405-4. Epub 2021 Jan 12.
The novel coronavirus (SARS-CoV-2) that emerged in Wuhan, China, spread rapidly around the world and became a pandemic. Beyond its significant impact on daily life, it has also affected other areas, including public health and the economy. At the time of writing, no vaccine or antiviral drug was available to prevent COVID-19. Therefore, determining the protein interactions of the novel coronavirus is vital for clinical studies, drug therapy, and the identification of preclinical compounds and protein functions. Protein-protein interactions are important for examining the functions and pathways of proteins involved in various biological processes and for determining the cause and progression of diseases. Various high-throughput experimental methods have been used to identify protein-protein interactions in organisms, yet a large gap remains in identifying all possible protein interactions within an organism. Moreover, because these experimental methods involve cloning, labeling, and affinity purification-mass spectrometry, they are time-consuming. Determining these interactions with artificial intelligence-based methods rather than experimental approaches may help identify protein functions faster. Thus, protein-protein interaction prediction using deep-learning algorithms has been employed in conjunction with experimental methods to explore new protein interactions. However, to predict protein interactions with artificial intelligence techniques, protein sequences must first be mapped to numerical representations. Many protein-mapping methods of various types have been proposed in the literature. In this study, we propose a novel protein-mapping method based on the AVL tree. The method was inspired by the fast search performance of the AVL tree as a dictionary structure, and it was used to verify the protein interactions between the SARS-CoV-2 virus and humans.
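The abstract does not describe how the AVL tree turns a sequence into numbers. As a hedged illustration only, the sketch below assumes that each overlapping k-mer of a protein sequence (k = 3 is a hypothetical choice) is stored as a key in an AVL tree and receives an integer ID the first time it is seen, so every later lookup costs O(log n) comparisons. `KmerMapper`, `insert`, and `search` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """One AVL node: a k-mer key, its integer code, and the subtree height."""
    key: str
    value: int
    height: int = 1
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def _height(node: Optional[Node]) -> int:
    return node.height if node is not None else 0

def _update(node: Node) -> None:
    node.height = 1 + max(_height(node.left), _height(node.right))

def _rotate_right(y: Node) -> Node:
    x = y.left
    y.left = x.right
    x.right = y
    _update(y)
    _update(x)
    return x

def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    _update(x)
    _update(y)
    return y

def _rebalance(node: Node) -> Node:
    """Restore the AVL invariant |h(left) - h(right)| <= 1 at this node."""
    _update(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1:  # left-heavy
        if _height(node.left.left) < _height(node.left.right):
            node.left = _rotate_left(node.left)  # left-right case
        return _rotate_right(node)
    if balance < -1:  # right-heavy
        if _height(node.right.right) < _height(node.right.left):
            node.right = _rotate_right(node.right)  # right-left case
        return _rotate_left(node)
    return node

def insert(root: Optional[Node], key: str, value: int) -> Node:
    """Insert key -> value and return the new subtree root (existing keys are kept)."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        return root
    return _rebalance(root)

def search(root: Optional[Node], key: str) -> Optional[int]:
    """Iterative O(log n) lookup; returns None when the key is absent."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None

class KmerMapper:
    """Hypothetical mapper: gives each distinct k-mer an integer ID (1, 2, ...) on first sight."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.root: Optional[Node] = None
        self.size = 0

    def encode(self, sequence: str) -> List[int]:
        codes = []
        for i in range(len(sequence) - self.k + 1):
            kmer = sequence[i : i + self.k]
            code = search(self.root, kmer)
            if code is None:  # unseen k-mer: store it with the next free ID
                self.size += 1
                code = self.size
                self.root = insert(self.root, kmer, code)
            codes.append(code)
        return codes

# Usage: two toy sequences encoded with one shared dictionary.
mapper = KmerMapper(k=3)
first = mapper.encode("MFVFLVLLPLV")  # nine distinct 3-mers -> IDs 1..9
second = mapper.encode("MFVLLP")      # reuses MFV, VLL, LLP; FVL is new
```

Because the tree stays height-balanced, encoding a sequence of length L costs O(L log n) lookups, where n is the number of distinct k-mers seen so far. A plain Python `dict` would produce the same mapping with average O(1) lookups, so this sketch mainly illustrates the balanced ordered-dictionary idea the method draws on.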
First, protein sequences were mapped both with the proposed method and with various existing protein-mapping methods. Then, the mapped protein sequences were normalized and classified with bidirectional recurrent neural networks. The performance of the proposed method was evaluated with accuracy, F1-score, precision, recall, and AUC. Our results indicate that our mapping method predicts the protein interactions between SARS-CoV-2 proteins and human proteins with an accuracy of 97.76%, a precision of 97.60%, a recall of 98.33%, an F1-score of 79.42%, and an average AUC of 89%.
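The abstract states that the mapped sequences were normalized before classification and that performance was reported as accuracy, precision, recall, F1-score, and AUC. The sketch below computes those metrics from their standard definitions (for example, F1 = 2PR / (P + R), the harmonic mean of precision P and recall R). The min-max normalization and fixed-length padding are assumptions, since the abstract does not say which normalization or input length was used; the bidirectional RNN itself is omitted, and the toy labels and scores merely stand in for its outputs.

```python
from typing import Dict, List, Sequence

def min_max_normalize(codes: Sequence[int]) -> List[float]:
    """Assumed normalization: rescale integer codes linearly to [0, 1]."""
    lo, hi = min(codes), max(codes)
    if hi == lo:  # constant sequence: avoid division by zero
        return [0.0] * len(codes)
    return [(c - lo) / (hi - lo) for c in codes]

def pad_or_truncate(values: Sequence[float], length: int, fill: float = 0.0) -> List[float]:
    """Assumed step: fix the input length so sequences can be batched for an RNN."""
    return list(values[:length]) + [fill] * max(0, length - len(values))

def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = interacting pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Usage on toy labels and model outputs (not data from the study).
normalized = pad_or_truncate(min_max_normalize([1, 10, 6, 7]), 6)
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
metrics = classification_metrics(y_true, y_pred)  # all four equal 2/3 here
auc = roc_auc(y_true, scores)                     # 8/9
```

The pairwise AUC loop is O(|pos| x |neg|), which is fine for illustration; a rank-based formula or a library routine scales better to large test sets.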