Center for Lipid Metabolomics, Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, 02215, Massachusetts, USA.
Department of Biochemistry, National Magnetic Resonance Facility at Madison and BioMagResBank, University of Wisconsin Madison, Madison, 53706, Wisconsin, USA.
Sci Data. 2020 Jul 3;7(1):210. doi: 10.1038/s41597-020-0547-y.
The chemical composition of saccharide complexes underlies their biomedical activities as biomarkers for cardiometabolic disease, various types of cancer, and other conditions. However, because these molecules may undergo major structural modifications, distinguishing between compounds of saccharide and non-saccharide origin is a challenging computational problem that hinders the aggregation of information about their bioactive moieties. We have developed an algorithm and software package called "Cheminformatics Tool for Probabilistic Identification of Carbohydrates" (CTPIC) that analyzes the covalent structure of a compound to yield a probabilistic measure for distinguishing saccharides and saccharide derivatives from non-saccharides. CTPIC analysis of the RCSB Ligand Expo (a database of small molecules found to bind proteins in the Protein Data Bank) led to a substantial increase in the number of ligands characterized as saccharides. CTPIC analysis of the Protein Data Bank identified 7.7% of the proteins as saccharide-binding. CTPIC is freely available as a web service at http://ctpic.nmrfam.wisc.edu.
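To make the idea of scoring a compound from its covalent structure concrete, the sketch below computes a toy "saccharide-likeness" score from a heavy-atom graph. It is not the CTPIC algorithm, whose details are not given in this abstract. The graph representation, the function names (`build_adjacency`, `find_rings`, `saccharide_score`), and the scoring rule (the fraction of ring carbons that carry an exocyclic O or N, taken over 5- or 6-membered rings containing exactly one ring oxygen) are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical input format: heavy atoms only, keyed by an integer index.
Atoms = Dict[int, str]
Bonds = List[Tuple[int, int]]


def build_adjacency(atoms: Atoms, bonds: Bonds) -> Dict[int, Set[int]]:
    """Turn a bond list into an undirected adjacency map."""
    adj: Dict[int, Set[int]] = {i: set() for i in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_rings(adj: Dict[int, Set[int]], sizes=(5, 6)) -> Set[FrozenSet[int]]:
    """Enumerate simple cycles of the requested sizes by depth-first search."""
    rings: Set[FrozenSet[int]] = set()
    max_size = max(sizes)

    def dfs(start: int, current: int, path: List[int]) -> None:
        for nxt in adj[current]:
            if nxt == start and len(path) in sizes:
                rings.add(frozenset(path))
            elif nxt not in path and nxt > start and len(path) < max_size:
                # Only visit atoms with a larger index than the start atom,
                # so each cycle is discovered from its smallest atom.
                dfs(start, nxt, path + [nxt])

    for start in adj:
        dfs(start, start, [start])
    return rings


def saccharide_score(atoms: Atoms, bonds: Bonds) -> float:
    """Toy score in [0, 1]; an illustrative heuristic, NOT the CTPIC measure."""
    adj = build_adjacency(atoms, bonds)
    best = 0.0
    for ring in find_rings(adj):
        elements = [atoms[i] for i in ring]
        # Require a furanose- or pyranose-like ring: one O, the rest C.
        if elements.count("O") != 1 or elements.count("C") != len(ring) - 1:
            continue
        carbons = [i for i in ring if atoms[i] == "C"]
        substituted = sum(
            1
            for c in carbons
            if any(n not in ring and atoms[n] in ("O", "N") for n in adj[c])
        )
        best = max(best, substituted / len(carbons))
    return best


# Heavy-atom graph of glucopyranose: ring C1..C5 plus ring oxygen (atom 7).
glucose_atoms = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C",
                 7: "O", 8: "O", 9: "O", 10: "O", 11: "O", 12: "O"}
glucose_bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 7), (7, 1),
                 (5, 6), (1, 8), (2, 9), (3, 10), (4, 11), (6, 12)]

# Cyclohexane: a six-membered ring with no oxygen at all.
cyclohexane_atoms = {i: "C" for i in range(1, 7)}
cyclohexane_bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]

print(saccharide_score(glucose_atoms, glucose_bonds))
print(saccharide_score(cyclohexane_atoms, cyclohexane_bonds))
```

A single hand-written rule like this would miss heavily modified saccharide derivatives and open-chain forms, which is the kind of difficulty the abstract attributes to structural modification; a probabilistic measure presumably combines more evidence than one ring pattern.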