Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, D.C., USA.
BMC Mol Cell Biol. 2019 Jun 28;20(1):21. doi: 10.1186/s12860-019-0200-9.
To date, no one has claimed to have found a consensus sequon for O-glycosylation. Predicting the likelihood of O-glycosylation from sequence and structural information using classical regression analysis is therefore quite difficult. In particular, if a binary response is used to distinguish between O-glycosylated and non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find.
Sequences from three similar post-translational modifications (PTMs) of proteins that occur at, or very near, the S/T site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. The results include: 1) The consensus composite sequon for O-glycosylation is ~(W-S/T-W), where "~" denotes the "not" operator. 2) The consensus sequon for phosphorylation is ~(W-S/T/Y/H-W), although W-S/T/Y/H-W is not an absolute inhibitor of phosphorylation. 3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations to non-O-glycosylatable sequences, although N - ~P - S/T is not an absolute inhibitor of O-glycosylation. 4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins. 5) Some N-glycosylated sequences are also phosphorylated at the S/T site in the N - ~P - S/T sequon. 6) ASA values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences. 7) Structural attributes (beta turn II, II´, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation. The LPM with sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%. 8) With only sequence data, the KS statistic erodes to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not being glycosylated. The 95% confidence interval around this misprediction rate is 16% to 26%.
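The two sequon notations used above, ~(W-S/T-W) and N - ~P - S/T, can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes that ~(W-S/T-W) means "an S or T residue that is not flanked by W on both sides" and that N - ~P - S/T means "N, then any residue except P, then S or T". The function names and the test sequence are hypothetical.

```python
import re


def o_glyc_candidate_sites(seq):
    """Return 0-based indices of S/T residues consistent with ~(W-S/T-W).

    Assumption: the composite sequon is negated as a whole, so a site is
    excluded only when BOTH neighbours are W. Residues at the sequence
    ends lack one neighbour and are therefore never excluded.
    """
    sites = []
    for i, aa in enumerate(seq):
        if aa not in "ST":
            continue
        left = seq[i - 1] if i > 0 else ""
        right = seq[i + 1] if i + 1 < len(seq) else ""
        if not (left == "W" and right == "W"):
            sites.append(i)
    return sites


# Lookahead so that overlapping N - ~P - S/T matches are all reported.
N_SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequon_sites(seq):
    """Return 0-based indices of the N in each N - ~P - S/T match."""
    return [m.start() for m in N_SEQUON.finditer(seq)]


# Hypothetical peptide, chosen only to exercise both patterns.
example = "AWSWANGTWTA"
print(o_glyc_candidate_sites(example))  # S at index 2 is W-flanked and dropped
print(n_glyc_sequon_sites(example))     # N-G-T starts at index 5
```

Under a stricter reading, in which W on either side alone excludes a site, the `and` in the flanking test would become `or`.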
The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation.