Yamanishi Yoshihiro, Bach Francis, Vert Jean-Philippe
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
Bioinformatics. 2007 May 15;23(10):1211-6. doi: 10.1093/bioinformatics/btm090. Epub 2007 Mar 7.
Glycans are covalent assemblies of sugars that play crucial roles in many cellular processes. Comprehensive data on the structure and function of glycans have recently accumulated, and the need for methods and algorithms to analyze these data is growing fast.
This article presents novel methods for classifying glycans and detecting discriminative glycan motifs with support vector machines (SVM). We propose a new class of tree kernels to measure the similarity between glycans. These kernels are based on the comparison of tree substructures and take into account several glycan features, such as the sugar type, the sugar bond type, and the layer depth. The proposed methods are tested on their ability to classify human glycans into four blood components: leukemia cells, erythrocytes, plasma and serum. They are shown to outperform a previously published method. We also applied a feature selection approach to extract glycan motifs that are characteristic of each blood component. We confirmed that some leukemia-specific glycan motifs detected by our method are consistent with several results reported in the literature.
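To make the idea of a substructure-based tree kernel concrete, here is a minimal Python sketch. It is not the kernel defined in the article: it assumes a glycan is represented as a nested tuple `(sugar, [(bond, child), ...])`, counts downward paths of `q` sugars (optionally tagged with layer depth), and takes the dot product of the two count vectors. The function names, the representation and the example labels are all hypothetical choices made for illustration.

```python
from collections import Counter

# Assumed representation (hypothetical): a glycan node is
#   (sugar_label, [(bond_label, child_node), ...])


def _paths_from(node, q):
    """Yield downward paths of exactly q sugars that start at `node`."""
    sugar, children = node
    if q == 1:
        yield (sugar,)
        return
    for bond, child in children:
        for rest in _paths_from(child, q - 1):
            # Interleave sugar and bond labels along the path.
            yield (sugar, bond) + rest


def _all_paths(node, q, depth=0):
    """Yield (layer depth, path) for paths of q sugars starting at any node."""
    for path in _paths_from(node, q):
        yield depth, path
    for _, child in node[1]:
        yield from _all_paths(child, q, depth + 1)


def features(tree, q, use_depth=True):
    """Count substructures; include layer depth in the key if requested."""
    return Counter(
        (depth, path) if use_depth else path
        for depth, path in _all_paths(tree, q)
    )


def kernel(t1, t2, q=2, use_depth=True):
    """Dot product of substructure counts: a simple illustrative tree kernel."""
    f1 = features(t1, q, use_depth)
    f2 = features(t2, q, use_depth)
    return sum(count * f2[key] for key, count in f1.items())


# Two toy glycans with made-up linkages, for illustration only.
g1 = ("GlcNAc", [("b1-4", ("Gal", [("a2-3", ("Neu5Ac", []))]))])
g2 = ("GlcNAc", [("b1-4", ("Gal", [])), ("a1-3", ("Fuc", []))])
print(kernel(g1, g2, q=2))
```

Because the value is an inner product of count vectors, the resulting Gram matrix is positive semidefinite, so it could in principle be passed to any SVM implementation that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`). Setting `use_depth=False` makes the comparison insensitive to where in the tree a substructure occurs.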
Software is available upon request.
Datasets are available at the following website: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/glycankernel/