用于精确碳水化合物数据库查询的高效树匹配方法。

Efficient tree-matching methods for accurate carbohydrate database queries.

作者信息

Aoki Kiyoko F, Yamaguchi Atsuko, Okuno Yasushi, Akutsu Tatsuya, Ueda Nobuhisa, Kanehisa Minoru, Mamitsuka Hiroshi

机构信息

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.

出版信息

Genome Inform. 2003;14:134-43.

PMID:15706528

Abstract

One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence, but a tree structure. Although there has been much work in the development of sequence databases and matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, and further, does not allow for the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized a polynomial-time dynamic programming algorithm for solving the maximum common subtree of two trees to implement an accurate and efficient tool for finding and aligning maximally matching glycan trees. The KEGG Glycan database for glycan structures released recently incorporates our tree-structure alignment algorithm with various parameters to adapt to the needs of a variety of users. Because we use similarity scores as opposed to a distance metric, our methods are more readily used to display trees of higher similarity. We present the two methods developed for this purpose and illustrate its validity.

摘要

糖组信息学的一个方面是对碳水化合物糖链（即聚糖）的分析，其基本结构不是序列，而是树状结构。尽管在序列数据库的开发以及用于序列的匹配算法（用于执行查询和分析相似性）方面已经做了很多工作，但聚糖更为复杂的树状结构不允许直接实现这样的聚糖数据库，而且，也不允许直接应用序列比对算法来进行搜索或分析相似性。因此，我们利用一种多项式时间动态规划算法来求解两棵树的最大公共子树，以实现一个准确且高效的工具，用于查找和比对最大匹配的聚糖树。最近发布的用于聚糖结构的KEGG聚糖数据库将我们的树状结构比对算法与各种参数相结合，以适应不同用户的需求。由于我们使用的是相似性分数而非距离度量，我们的方法更易于用于展示相似度更高的树。我们展示了为此目的开发的两种方法，并说明了其有效性。