Kellman Benjamin P, Mariethoz Julien, Zhang Yujie, Shaul Sigal, Alteri Mia, Sandoval Daniel, Jeffris Mia, Armingol Erick, Bao Bokan, Lisacek Frederique, Bojar Daniel, Lewis Nathan E
Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
bioRxiv. 2024 May 23:2024.05.15.594334. doi: 10.1101/2024.05.15.594334.
Glycosylation is described as a non-templated biosynthetic process. Yet the template-free premise is antithetical to the observation that different N-glycans are consistently placed at specific sites. It has been proposed that glycosite-proximal protein structures could constrain glycosylation and explain the observed microheterogeneity. Using site-specific glycosylation data, we trained a hybrid neural network to parse glycosites (recurrent neural network) and match them to feasible N-glycosylation events (graph neural network). From glycosite-flanking sequences, the algorithm predicted most human N-glycosylation events documented in the GlyConnect database and proposed structures corresponding to the observed monosaccharide composition of the glycans at these sites. The algorithm also recapitulated glycosylation in Enhanced Aromatic Sequons, SARS-CoV-2 spike, and IgG3 variants, demonstrating its ability to predict both glycan structure and abundance. Thus, protein structure constrains glycosylation, and the neural network enables prediction of glycosylation for uncharacterized or novel protein sequences and genetic variants.
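The hybrid design described above (a recurrent encoder for the glycosite-flanking sequence, a graph encoder for the candidate glycan, and a score that matches the two) can be illustrated with a minimal sketch. This is not the authors' model: the weights are random and untrained, and the hidden size, monosaccharide vocabulary, single round of message passing, dot-product scoring, example flanking sequence, and all function names are assumptions made for illustration only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MONOSACCHARIDES = ["GlcNAc", "Man", "Gal", "Fuc", "Neu5Ac"]  # assumed vocabulary
HIDDEN = 8  # embedding size, arbitrary for this sketch

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


W_in = rand_matrix(HIDDEN, len(AMINO_ACIDS))        # residue -> hidden
W_rec = rand_matrix(HIDDEN, HIDDEN)                 # hidden -> hidden (recurrence)
W_node = rand_matrix(HIDDEN, len(MONOSACCHARIDES))  # monosaccharide -> hidden
W_msg = rand_matrix(HIDDEN, HIDDEN)                 # transform for neighbor messages


def encode_glycosite(flank):
    """Elman-style RNN: read the flanking sequence one residue at a time."""
    h = [0.0] * HIDDEN
    for residue in flank:
        x = one_hot(AMINO_ACIDS.index(residue), len(AMINO_ACIDS))
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
    return h


def encode_glycan(nodes, edges):
    """One round of mean-neighbor message passing, then mean pooling."""
    feats = [
        matvec(W_node, one_hot(MONOSACCHARIDES.index(n), len(MONOSACCHARIDES)))
        for n in nodes
    ]
    neighbors = {i: [] for i in range(len(nodes))}
    for a, b in edges:  # glycosidic linkages treated as undirected edges
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = []
    for i, f in enumerate(feats):
        nbrs = neighbors[i]
        if nbrs:
            mean = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(HIDDEN)]
        else:
            mean = [0.0] * HIDDEN
        msg = matvec(W_msg, mean)
        updated.append([math.tanh(a + b) for a, b in zip(f, msg)])
    return [sum(u[k] for u in updated) / len(updated) for k in range(HIDDEN)]


def match_score(flank, nodes, edges):
    """Sigmoid of the dot product between site and glycan embeddings."""
    site = encode_glycosite(flank)
    glycan = encode_glycan(nodes, edges)
    logit = sum(s * g for s, g in zip(site, glycan))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical flanking sequence containing an N-X-T sequon, scored against
# the N-glycan core (two GlcNAc and three Man residues).
core_nodes = ["GlcNAc", "GlcNAc", "Man", "Man", "Man"]
core_edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
score = match_score("GSTYNATSVLQ", core_nodes, core_edges)
print(f"match score (untrained weights): {score:.3f}")
```

With trained weights, a score of this kind would be computed for each candidate glycan at a site and the candidates ranked; here the number only shows that the two encoders produce comparable embeddings.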