Hashimoto Kosuke, Takigawa Ichigaku, Shiga Motoki, Kanehisa Minoru, Mamitsuka Hiroshi
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji 611-0011, Japan.
Bioinformatics. 2008 Aug 15;24(16):i167-73. doi: 10.1093/bioinformatics/btn293.
Carbohydrate sugar chains, or glycans, the third major class of macromolecules, have branched tree structures. Glycan motifs are known to be of two types: (1) conserved patterns called 'cores', which contain the root, and (2) ubiquitous motifs, which appear in external parts including the leaves and are distributed over different glycan classes. Finding these glycan tree motifs is an important problem, but no computational method has been available to capture them efficiently.
We have developed an efficient method for mining motifs, or significant subtrees, from glycans. The key contributions of this method are (1) a new concept, 'α-closed frequent subtrees', together with an efficient method for mining all such subtrees from given trees, and (2) the use of statistical hypothesis testing to rerank the frequent subtrees by significance. We experimentally verified the effectiveness of the proposed method using real glycans: (1) we examined the top 10 subtrees obtained by our method at some parameter setting and confirmed that all of them are significant motifs in glycobiology; (2) we applied the results of our method to a classification problem and found that it outperformed competing methods, namely SVMs with three different tree kernels, with all differences being statistically significant.
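The reranking idea in contribution (2) can be illustrated with a minimal sketch. The abstract does not say which hypothesis test is used, so the one-sided Fisher's exact test below is an assumption chosen for illustration, not necessarily the authors' test. The subtree names and counts are hypothetical, and the function names `fisher_right_tail` and `rerank` are introduced here only for the example.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: target-class glycans containing the subtree
    b: target-class glycans without it
    c: other glycans containing the subtree
    d: other glycans without it
    (Assumed test; the abstract does not name one.)
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: probability of seeing at least `a` target-class
    # glycans among the `col1` glycans that contain the subtree.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def rerank(subtree_counts, n_target, n_other):
    """Order subtrees by ascending p-value (most significant first)."""
    scored = []
    for name, (in_target, in_other) in subtree_counts.items():
        p = fisher_right_tail(in_target, n_target - in_target,
                              in_other, n_other - in_other)
        scored.append((p, name))
    return [name for _, name in sorted(scored)]


# Hypothetical counts: (occurrences in target class, occurrences elsewhere),
# with 10 glycans in each group.
counts = {
    "subtree_A": (9, 1),    # enriched in the target class
    "subtree_B": (10, 10),  # most frequent overall, but present everywhere
    "subtree_C": (6, 3),    # mildly enriched
}
ranking = rerank(counts, n_target=10, n_other=10)
print(ranking)  # ['subtree_A', 'subtree_C', 'subtree_B']
```

In this toy input the most frequent subtree drops to last place because it occurs equally often in both groups, which shows why ranking by significance can differ from ranking by frequency alone.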
Supplementary data are available at Bioinformatics online.