NetCGlyc 1.0: prediction of mammalian C-mannosylation sites.

Author information

Julenius Karin

Affiliation

Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77 Stockholm, Sweden.

Publication information

Glycobiology. 2007 Aug;17(8):868-76. doi: 10.1093/glycob/cwm050. Epub 2007 May 9.

DOI:10.1093/glycob/cwm050

PMID:17494086

Abstract

C-mannosylation is the attachment of an alpha-mannopyranose to a tryptophan via a C-C linkage. The sequence WXXW, in which the first Trp becomes mannosylated, has been suggested as a consensus motif for the modification, but only two-thirds of known sites follow this rule. We have gathered a data set of 69 experimentally verified C-mannosylation sites from the literature. We analyzed these for sequence context and found that apart from Trp in position +3, Cys is accepted in the same position. We also found a clear preference in position +1, where a small and/or polar residue (Ser, Ala, Gly, or Thr) is preferred and a Phe or a Leu residue is discriminated against. The Protein Data Bank was searched for structural information, and five structures of C-mannosylated proteins were obtained. We showed that modified tryptophan residues are at least partly solvent exposed. A method predicting the location of C-mannosylation sites in proteins was developed using a neural network approach. The best overall network used a 21-residue sequence input window and information on the presence/absence of the WXXW motif. NetCGlyc 1.0 correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif itself, which only identifies 67% of positive sites. NetCGlyc 1.0 is available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome and found 2573 exported or transmembrane transcripts with at least one predicted C-mannosylation site.
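The sequence rules described in the abstract can be illustrated with a small motif scan. The sketch below is not NetCGlyc and does not reproduce its neural network; it only flags tryptophans followed by Trp or Cys at position +3, annotates the +1 residue according to the stated preference, and extracts a 21-residue window centred on the candidate Trp. The function name `find_candidate_sites`, the returned field names, and the use of `X` to pad windows at the sequence ends are assumptions made for illustration.

```python
import re

# Lookahead so that overlapping motifs (e.g. WSXWSXW) are all reported.
# Group 1 captures the residue at position +3 (Trp or Cys).
MOTIF = re.compile(r"(?=W..([WC]))")

PREFERRED_PLUS1 = set("SAGT")   # small and/or polar residues favoured at +1
DISFAVOURED_PLUS1 = set("FL")   # residues discriminated against at +1


def find_candidate_sites(seq, half_window=10, pad="X"):
    """Return candidate C-mannosylation sites found by a simple motif scan.

    Positions are 1-based. The window has 2 * half_window + 1 residues
    (21 by default) and is padded with `pad` near the termini; the padding
    character is an assumption, not something stated in the abstract.
    """
    seq = seq.upper()
    padded = pad * half_window + seq + pad * half_window
    sites = []
    for match in MOTIF.finditer(seq):
        i = match.start()                 # 0-based index of the candidate Trp
        plus1 = seq[i + 1]
        if plus1 in PREFERRED_PLUS1:
            plus1_class = "preferred"
        elif plus1 in DISFAVOURED_PLUS1:
            plus1_class = "disfavoured"
        else:
            plus1_class = "neutral"
        sites.append({
            "position": i + 1,
            "plus3": match.group(1),
            "strict_wxxw": match.group(1) == "W",
            "plus1_class": plus1_class,
            "window": padded[i:i + 2 * half_window + 1],
        })
    return sites


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    for site in find_candidate_sites("ACDWSGWTKCA"):
        print(site)
```

A scan like this can only serve as a crude pre-filter: the abstract reports that the strict WXXW motif alone identifies 67% of positive sites, which is why the published method combines the sequence window with motif information in a trained network.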
