Hansen J E, Lund O, Tolstrup N, Gooley A A, Williams K L, Brunak S
Center for Biological Sequence Analysis, The Technical University of Denmark, Lyngby.
Glycoconj J. 1998 Feb;15(2):115-30. doi: 10.1023/a:1006960004440.
The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which link the carbohydrate GalNAc to the side chain of certain serine and threonine residues in mucin-type glycoproteins, are presently unknown. The specificity seems to be modulated by sequence context, secondary structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of serines, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. Charged residues were disfavoured at positions -1 and +3. A jury of artificial neural networks was trained to recognize the sequence context and surface accessibility of 299 known and verified mucin-type O-glycosylation sites extracted from O-GLYCBASE. The cross-validated NetOglyc network system correctly found 83% of the glycosylated and 90% of the non-glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods. Predictions of O-glycosylation sites in the envelope glycoprotein gp120 from the primate lentiviruses HIV-1, HIV-2 and SIV are presented. The most conserved O-glycosylation signals in these evolutionarily related glycoproteins were found in their first hypervariable loop, V1. However, the strain variation for HIV-1 gp120 was significant. A computer server, available through WWW or E-mail, has been developed for prediction of mucin-type O-glycosylation sites in proteins based on the amino acid sequence. The server addresses are http://www.cbs.dtu.dk/services/NetOGlyc/ and netOglyc@cbs.dtu.dk.
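The abstract outlines the prediction scheme only briefly. The sequence context around each serine or threonine is presented to a jury of neural networks whose outputs are combined. Performance is then reported as the fraction of glycosylated and of non-glycosylated residues classified correctly. The following is a minimal sketch of those steps under stated assumptions; it is not NetOGlyc's implementation. The window half-width of 7, the padding symbol `X`, the choice of D, E, K and R as the "charged" residues, and combining the jury by a plain mean are all illustrative assumptions. The "networks" are hypothetical stand-in callables, not trained models. Surface accessibility, which the paper also uses as input, is omitted here.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Assumption: D, E, K, R count as charged; histidine is deliberately excluded.
CHARGED = frozenset("DEKR")
PAD = "X"  # hypothetical padding symbol for positions beyond either sequence end


def st_windows(seq: str, half: int = 7) -> List[Tuple[int, str, str]]:
    """Return (0-based position, residue, window) for every Ser/Thr in seq.

    Each window spans offsets -half..+half around the residue, padded with PAD
    at the termini. half=7 is an illustrative assumption, not the paper's value.
    """
    padded = PAD * half + seq + PAD * half
    return [
        (i, aa, padded[i:i + 2 * half + 1])  # seq[i] sits at padded[i + half]
        for i, aa in enumerate(seq)
        if aa in "ST"
    ]


def charged_at_minus1_or_plus3(window: str, half: int = 7) -> bool:
    """True if the residue at offset -1 or +3 from the window centre is charged."""
    return window[half - 1] in CHARGED or window[half + 3] in CHARGED


def jury_score(window: str, networks: Sequence[Callable[[str], float]]) -> float:
    """Combine a jury of predictors by averaging their outputs (assumed rule)."""
    return mean(net(window) for net in networks)


def sensitivity_specificity(
    preds: Sequence[bool], labels: Sequence[bool]
) -> Tuple[float, float]:
    """Fraction of glycosylated sites found, and of non-glycosylated sites rejected.

    Assumes both classes occur in labels (otherwise a division by zero occurs).
    """
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p and y)
    fn = sum(1 for p, y in pairs if not p and y)
    tn = sum(1 for p, y in pairs if not p and not y)
    fp = sum(1 for p, y in pairs if p and not y)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    seq = "MAPTTPAPGSTAPPAHGVTSA"  # toy sequence, not taken from the paper
    # Hypothetical stand-ins for trained networks: a proline-content score and a constant.
    toy_nets = [lambda w: w.count("P") / len(w), lambda w: 0.5]
    for pos, aa, win in st_windows(seq):
        print(pos, aa, win, charged_at_minus1_or_plus3(win), round(jury_score(win, toy_nets), 3))
```

The reported figures (83% of glycosylated and 90% of non-glycosylated Ser/Thr residues) correspond to the two values `sensitivity_specificity` returns. In the paper these values were computed on independent cross-validation test sets.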