University Lyon, University Claude Bernard Lyon 1, CNRS, INSA Lyon, CPE, Institute of Molecular and Supramolecular Chemistry and Biochemistry, UMR 5246, Villeurbanne Cedex, France.
SIB Swiss Institute of Bioinformatics, Geneva 4, Switzerland.
Glycobiology. 2019 Jan 1;29(1):36-44. doi: 10.1093/glycob/cwy084.
Mammalian glycosaminoglycans (GAGs) are linear complex polysaccharides comprising heparan sulfate, heparin, dermatan sulfate, chondroitin sulfate, keratan sulfate and hyaluronic acid. They bind to numerous proteins, and these interactions mediate their biological activities. GAG-protein interaction data reported in the literature are curated mostly in the MatrixDB database (http://matrixdb.univ-lyon1.fr/). However, a standard nomenclature and a machine-readable format for GAGs, together with bioinformatics tools for mining these interaction data, are lacking. We report here the building of an automated pipeline to (i) standardize the format of protein-binding GAG sequences manually curated from the literature, (ii) translate them into the machine-readable GlycoCT format and into SNFG (Symbol Nomenclature for Glycans) images and (iii) convert their sequences into a format processed by a builder that generates three-dimensional structures of polysaccharides based on a repertoire of conformations experimentally validated by data extracted from crystallized GAG-protein complexes. For this purpose, we have developed a converter (the CT23D converter) that automatically translates the GlycoCT code of a GAG sequence into the input file required to construct a three-dimensional model.
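To illustrate what a machine-readable GAG sequence looks like, the following minimal sketch reads a condensed GlycoCT record into residue and linkage tables, the kind of intermediate representation a converter would plausibly need before writing input for a three-dimensional builder. It is not the CT23D converter: the function and type names (`parse_glycoct`, `Residue`, `Linkage`) are hypothetical, the example record for a hyaluronic acid disaccharide (GlcA β1-3 GlcNAc) was written by hand for illustration, and only the RES and LIN sections are handled (no repeat units or undetermined linkages).

```python
import re
from typing import NamedTuple


class Residue(NamedTuple):
    index: int
    kind: str        # 'b' = basetype (monosaccharide), 's' = substituent
    descriptor: str  # e.g. 'b-dglc-HEX-1:5|6:a' or 'n-acetyl'


class Linkage(NamedTuple):
    index: int
    parent: int      # residue index on the parent side
    parent_pos: str  # linkage position on the parent
    child: int       # residue index on the child side
    child_pos: str   # linkage position on the child


# Assumed line shapes for condensed GlycoCT:
#   RES line: <index><type>:<descriptor>
#   LIN line: <index>:<parent><type>(<parent_pos>+<child_pos>)<child><type>
RES_RE = re.compile(r"^(\d+)([a-z]):(.+)$")
LIN_RE = re.compile(r"^(\d+):(\d+)[a-z]\(([^+]+)\+([^)]+)\)(\d+)[a-z]$")

# Hand-written illustrative record: GlcA(b1-3)GlcNAc
EXAMPLE = """RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5|6:a
LIN
1:1d(2+1)2n
2:1o(3+1)3d
"""


def parse_glycoct(text):
    """Return (residues by index, list of linkages) from a condensed GlycoCT string."""
    residues, linkages, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("RES", "LIN"):
            section = line
            continue
        if section == "RES":
            m = RES_RE.match(line)
            if m is None:
                raise ValueError(f"unrecognized RES line: {line!r}")
            idx = int(m.group(1))
            residues[idx] = Residue(idx, m.group(2), m.group(3))
        elif section == "LIN":
            m = LIN_RE.match(line)
            if m is None:
                raise ValueError(f"unrecognized LIN line: {line!r}")
            linkages.append(
                Linkage(int(m.group(1)), int(m.group(2)), m.group(3),
                        int(m.group(5)), m.group(4))
            )
        else:
            raise ValueError(f"content before a RES/LIN header: {line!r}")
    return residues, linkages


if __name__ == "__main__":
    residues, linkages = parse_glycoct(EXAMPLE)
    for link in linkages:
        print(residues[link.parent].descriptor, link.parent_pos,
              "<-", link.child_pos, residues[link.child].descriptor)
```

Keeping substituents (such as N-acetyl or sulfate groups) as separate residues joined by their own linkages mirrors how GlycoCT encodes them, which matters for GAGs because sulfation patterns distinguish otherwise identical backbones.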