Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S33. doi: 10.1186/1471-2105-11-S1-S33.
Glycobiology is the study of carbohydrate sugar chains, or glycans, in a particular cell or organism. Many computational approaches have been proposed for analyzing these complex glycan structures, which are chains of monosaccharides. The monosaccharides are linked to one another by glycosidic bonds, which can take on a variety of conformations, thus forming branches and resulting in complex tree structures. The q-gram method is one recent approach that seeks to understand glycan function by classifying glycan tree structures. For a given q, this method assumes that distinct q-grams share no similarity with one another; that is, if two structures have completely different components, they are treated as completely different. From a biological standpoint, however, this is not the case. In this paper, we propose a weighted q-gram method that measures the similarity among glycans by incorporating the similarity of geometric structures, monosaccharides and glycosidic bonds among q-grams. In contrast to the traditional q-gram method, our weighted q-gram method admits similarity among distinct q-grams for a given q. We thus developed new kernels for glycan structures and applied them in support vector machines (SVMs) to classify glycans.
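The abstract does not give the kernel formulas, so the following is only a minimal sketch of the contrast it describes, under loudly stated assumptions. It assumes that a glycan is a rooted tree with monosaccharide labels on nodes and linkage labels on edges; that q-grams are linear parent-to-child paths of q monosaccharides (a simplification, since the paper's q-grams may also be branched, as the mention of their geometric structures suggests); that the original kernel is the inner product of q-gram count vectors; and that a weighted kernel has the form K(x, y) = Σ_g Σ_h φ_g(x) φ_h(y) s(g, h). The similarity `label_similarity` (fraction of matching monosaccharide and linkage labels) and all function names are hypothetical, the toy glycans are not from the paper's datasets, and geometric-structure similarity is not modeled.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy representation (not from the paper's datasets):
# node_id -> (monosaccharide, [(glycosidic_linkage, child_id), ...])


def path_qgrams(glycan, q):
    """Count every downward path of q monosaccharides as a label tuple
    (m1, l1, m2, ..., mq). Simplification: branched q-grams are not counted."""
    grams = Counter()

    def extend(node, prefix, remaining):
        mono, children = glycan[node]
        path = prefix + (mono,)
        if remaining == 1:
            grams[path] += 1
            return
        for linkage, child in children:
            extend(child, path + (linkage,), remaining - 1)

    for start in glycan:
        extend(start, (), q)
    return grams


def qgram_kernel(cx, cy):
    """Unweighted q-gram kernel: only identical q-grams contribute."""
    return sum(n * cy[g] for g, n in cx.items())


def label_similarity(g, h):
    """Hypothetical q-gram similarity: fraction of aligned positions
    (monosaccharides and linkages) whose labels agree."""
    if len(g) != len(h):
        return 0.0
    return sum(a == b for a, b in zip(g, h)) / len(g)


def weighted_qgram_kernel(cx, cy, sim=label_similarity):
    """K(x, y) = sum_g sum_h phi_g(x) * phi_h(y) * s(g, h)."""
    return sum(nx * ny * sim(g, h) for g, nx in cx.items() for h, ny in cy.items())


def normalized(kernel, cx, cy):
    """Cosine normalization K(x, y) / sqrt(K(x, x) * K(y, y))."""
    return kernel(cx, cy) / sqrt(kernel(cx, cx) * kernel(cy, cy))


# Two toy sialylated chains that differ only in the NeuAc linkage.
glycan_a = {0: ("GlcNAc", [("b1-4", 1)]), 1: ("Gal", [("a2-3", 2)]), 2: ("NeuAc", [])}
glycan_b = {0: ("GlcNAc", [("b1-4", 1)]), 1: ("Gal", [("a2-6", 2)]), 2: ("NeuAc", [])}

ca, cb = path_qgrams(glycan_a, 2), path_qgrams(glycan_b, 2)
print(qgram_kernel(ca, cb))                                 # 1
print(round(weighted_qgram_kernel(ca, cb), 4))              # 1.6667
print(normalized(qgram_kernel, ca, cb))                     # 0.5
print(round(normalized(weighted_qgram_kernel, ca, cb), 4))  # 0.8333
```

Here the two chains share one of their two 2-grams exactly, so the unweighted kernel credits only that match, while the weighted kernel also gives partial credit to the sialyl 2-grams that differ only in linkage. Because this particular `label_similarity` is an average of per-position label-equality indicators, it is positive semidefinite, so the weighted kernel remains a valid kernel for an SVM (for example, as a precomputed Gram matrix).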
Two glycan datasets were used to compare the weighted q-gram method with the original q-gram method. The results show that incorporating q-gram similarity improves classification performance for all of the important glycan classes tested.
The results in this paper indicate that similarity among q-grams, derived from geometric structure, monosaccharides and glycosidic linkages, contributes to glycan function classification. This is an important step toward understanding glycan function from the complex structures of glycans.