Guzzi Pietro Hiram, Lomoio Ugo, Puccio Barbara, Veltri Pierangelo
Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy.
Netw Model Anal Health Inform Bioinform. 2023;12(1):3. doi: 10.1007/s13721-022-00397-9. Epub 2022 Dec 2.
Since December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has affected almost all countries. The unprecedented spread of the virus has led to the emergence of many variants that alter protein sequence and structure, so the sequences need continuous monitoring and analysis to understand the genetic evolution of the virus and to prevent possibly dangerous outcomes. Variants that modify the structure of proteins, such as the Spike (S) protein, need to be monitored. Protein contact networks (PCNs) have recently been proposed as a modelling framework for protein structures. In this framework, the protein structure is represented as an unweighted graph whose nodes are the central atoms of the backbone (Cα) and whose edges connect pairs of atoms at a spatial distance between 4 and 7 Å. A PCN can also be a data-rich representation, since biological and topological information can be attached to each node (atom). This formalism makes it possible to analyze the structure with algorithms from graph theory, in particular graph embedding methods, which open such graphs to analysis with deep learning. In this work, we explore embedding PCNs with Graph Neural Networks and then analyzing each residue in the embedded space to distinguish mutated residues from non-mutated ones. Specifically, we analyzed the structure of the Spike protein of the coronavirus. First, we obtained the PCNs of the Spike protein for the wild type and three variants. Then we used the GraphSage embedding algorithm to obtain an unsupervised embedding. Finally, we analyzed the mutation points in the embedded space. The results show the characteristics of the mutation points in the embedding space.
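As an illustration of the PCN construction described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the Cα coordinates are already available as a list of (x, y, z) tuples in ångströms, and it treats the 4 and 7 Å bounds as inclusive, which the abstract does not specify. The function name `build_pcn` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def build_pcn(ca_coords, d_min=4.0, d_max=7.0):
    """Build an unweighted protein contact network.

    ca_coords: list of (x, y, z) C-alpha positions in angstroms,
               one entry per residue.
    Returns an adjacency dict {residue_index: set(neighbour_indices)}.
    Two residues are linked when their C-alpha distance lies in
    [d_min, d_max] (inclusive bounds are an assumption of this sketch).
    """
    adjacency = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        distance = math.dist(ca_coords[i], ca_coords[j])
        if d_min <= distance <= d_max:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Toy coordinates (hypothetical, not taken from any Spike structure).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),   # 3.8 A from residue 0: below the lower bound, no edge
    (5.0, 0.0, 0.0),   # 5.0 A from residue 0: edge
    (11.5, 0.0, 0.0),  # 6.5 A from residue 2: edge
]

pcn = build_pcn(toy_coords)

# Node degree is one simple topological attribute that could be attached
# to each node before embedding.
degrees = {node: len(neighbours) for node, neighbours in pcn.items()}
```

The lower bound excludes residues that are adjacent along the chain (consecutive Cα atoms sit roughly 3.8 Å apart), so the edges reflect spatial contacts rather than trivial sequence neighbours. The resulting adjacency structure is the kind of input a graph embedding method such as GraphSage would consume.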