Kuboyama Tetsuji, Hirata Kouichi, Aoki-Kinoshita Kiyoko F, Kashima Hisashi, Yasuda Hiroshi
Center for Collaborative Research, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505, Japan.
Genome Inform. 2006;17(2):25-34.
We propose a novel general-purpose tree kernel and apply it to glycan structure analysis. The kernel measures the similarity between two labeled trees by counting the common q-length substrings (tree q-grams) embedded in the trees, over all possible lengths q. Using a support vector machine (SVM), we apply the kernel to classification of glycan structure data and to extraction of specific features from it. Our results show that the kernel outperforms the layered trimer kernel of Hizukuri et al., which is tailored to glycan data, even though our kernel is not adjusted to any glycan-specific properties. In addition, we extract specific features from various types of glycan data using the trained SVM. The results show that our kernel is more flexible and can find a wider variety of substructures in glycan data.
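To make the counting idea concrete, here is a minimal sketch of a q-gram style tree kernel. It rests on simplifying assumptions that are not taken from the paper: a tree is a hypothetical `(label, children)` tuple, a tree q-gram is taken to be the label sequence along a downward (ancestor-to-descendant) path of q nodes, and all path lengths get equal weight. The names `downward_qgrams` and `qgram_kernel` are illustrative, and the brute-force enumeration stands in for whatever efficient algorithm the authors use.

```python
from collections import Counter

# Hypothetical tree representation: (label, [child, child, ...]).

def downward_qgrams(tree):
    """Count the label sequences of all downward paths, for every length q >= 1."""
    counts = Counter()

    def visit(node):
        label, children = node
        # Paths that start at this node: the node alone, plus the node
        # followed by any path that starts at one of its children.
        paths = [(label,)]
        for child in children:
            for path in visit(child):
                paths.append((label,) + path)
        counts.update(paths)  # each path is counted once, at its start node
        return paths

    visit(tree)
    return counts

def qgram_kernel(t1, t2):
    """Inner product of the two q-gram count vectors (assumed unweighted)."""
    c1, c2 = downward_qgrams(t1), downward_qgrams(t2)
    return sum(n * c2[gram] for gram, n in c1.items())

# Two toy labeled trees (labels are placeholders, not real glycan residues).
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", [("c", [])])])

print(qgram_kernel(t1, t2))  # shared grams: (a), (b), (c), (a, b)
```

Because the value is an inner product of count vectors, the resulting Gram matrix is positive semidefinite, so it could in principle be passed to an SVM implementation that accepts precomputed kernels.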