Joeres Roman, Bojar Daniel, Kalinina Olga V
Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbruecken, Germany.
Center for Bioinformatics, Saarland University, Saarbruecken, Germany.
J Cheminform. 2023 Mar 23;15(1):37. doi: 10.1186/s13321-023-00704-0.
Glycans are important polysaccharides on cellular surfaces that are bound to proteins and lipids as parts of glycoproteins and glycolipids. Glycosylation is one of the most common post-translational modifications of proteins in eukaryotic cells. Glycans play important roles in protein folding, cell-cell interactions, and other extracellular processes. Changes in glycan structures may influence the course of various diseases, such as infections or cancer. Glycans are commonly represented using the IUPAC-condensed notation, a textual representation that operates on the same topological level as the Symbol Nomenclature for Glycans (SNFG). SNFG assigns colored geometric shapes to the main monomers and connects these symbols in tree-like structures, visualizing the glycan structure on a topological level. For a representation on the atomic level, however, notations such as SMILES are needed. To our knowledge, there is no easy-to-use, general, open-source, and offline tool to convert the IUPAC-condensed notation to SMILES. Here, we present GlyLES, an open-source Python package that generates SMILES representations from IUPAC-condensed representations in a generalizable way. GlyLES uses a grammar to read the monomer tree from the IUPAC-condensed notation. From this tree, the tool computes the atomic structure of each monomer based on its IUPAC-condensed description. In the last step, it merges all monomers into the atomic structure of the glycan in SMILES notation. GlyLES is the first package that allows conversion from the IUPAC-condensed notation of glycans to SMILES strings. This may have multiple applications, including straightforward visualization, substructure search, molecular modeling and docking, and a new featurization strategy for machine-learning algorithms. GlyLES is available at https://github.com/kalininalab/GlyLES.
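The first step of the pipeline described above is reading a monomer tree out of an IUPAC-condensed string. The minimal sketch below illustrates that step. It is not GlyLES code and does not use its grammar. It is a hand-written recursive-descent parser that reads the string from right to left, and the names `Monomer`, `parse_iupac`, and `show` are hypothetical. It also assumes a simplified subset of the notation:

* residue names;
* linkages in parentheses such as `(b1-4)`;
* branches in square brackets, each attached to the residue immediately to its right.

The rightmost residue is taken as the reducing-end root. Computing atomic structures and merging them into SMILES (the later steps) are not attempted here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Tokens: '[', ']', a linkage in parentheses, or a residue name (simplified subset).
TOKEN = re.compile(r"\[|\]|\([^()]*\)|[^\[\]()]+")


@dataclass
class Monomer:
    """Node of a hypothetical monomer tree: a residue and its (linkage, child) pairs."""
    name: str
    children: list[tuple[str, Monomer]] = field(default_factory=list)


def parse_iupac(glycan: str) -> Monomer:
    """Parse a simplified IUPAC-condensed string, right to left, into a monomer tree."""
    tokens = TOKEN.findall(glycan)
    pos = len(tokens) - 1

    def is_linkage(tok: str) -> bool:
        return tok.startswith("(")

    def residue() -> Monomer:
        nonlocal pos
        if pos < 0 or tokens[pos] in ("[", "]") or is_linkage(tokens[pos]):
            raise ValueError(f"expected a residue at token {pos}")
        node = Monomer(tokens[pos])
        pos -= 1
        # Bracketed branches directly left of this residue are attached to it.
        while pos >= 0 and tokens[pos] == "]":
            pos -= 1
            if pos < 0 or not is_linkage(tokens[pos]):
                raise ValueError("branch must end with a linkage")
            link = tokens[pos][1:-1]
            pos -= 1
            child = residue()
            if pos < 0 or tokens[pos] != "[":
                raise ValueError("unbalanced brackets")
            pos -= 1
            node.children.append((link, child))
        # A plain linkage further left continues the main chain.
        if pos >= 0 and is_linkage(tokens[pos]):
            link = tokens[pos][1:-1]
            pos -= 1
            node.children.append((link, residue()))
        return node

    root = residue()
    if pos != -1:
        raise ValueError("unexpected leading tokens")
    return root


def show(node: Monomer, link: str = "", depth: int = 0) -> list[str]:
    """Render the tree as indented lines, root first, children in parse order."""
    label = f"{node.name} ({link})" if link else node.name
    lines = ["  " * depth + label]
    for child_link, child in node.children:
        lines += show(child, child_link, depth + 1)
    return lines
```

For example, `print("\n".join(show(parse_iupac("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc"))))` prints `GlcNAc` as the root, with `Fuc (a1-3)` and `Gal (b1-4)` as its children and `Neu5Ac (a2-3)` under `Gal`. Reading right to left is a design choice that follows from the notation: the root is written last, and each bracketed branch precedes the residue it modifies.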