Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.
BMC Bioinformatics. 2010 Dec 14;11 Suppl 11(Suppl 11):S4. doi: 10.1186/1471-2105-11-S11-S4.
A bisection-type algorithm for the grammar-based compression of tree-structured data has been proposed recently. In this framework, an elementary ordered-tree grammar (EOTG) and an elementary unordered-tree grammar (EUTG) were defined, and an approximation algorithm was proposed.
In this paper, we propose an integer programming-based method that finds the minimum context-free grammar (CFG) for a given string under the condition that at most two symbols appear on the right-hand side of each production rule. Next, we extend this method to find the minimum EOTG and EUTG grammars for given ordered and unordered trees, respectively. Then, we conduct computational experiments for the ordered and unordered artificial trees. Finally, we apply our methods to pattern extraction of glycan tree structures.
We propose integer programming-based methods that find the minimum CFG, EOTG, and EUTG for given strings, ordered and unordered trees. Our proposed methods for trees are useful for extracting patterns of glycan tree structures.
最近提出了一种基于语法的树状结构数据压缩的二分算法。在这个框架中,定义了基本有序树语法(EOTG)和基本无序树语法(EUTG),并提出了一种逼近算法。
本文提出了一种基于整数规划的方法,用于在每个生成规则的右侧最多只能出现两个符号的条件下,为给定字符串找到最小的上下文无关语法(CFG)。接下来,我们将此方法扩展到为给定的有序树和无序树分别找到最小的 EOTG 和 EUTG 语法。然后,我们对有序和无序人工树进行了计算实验。最后,我们将我们的方法应用于聚糖树结构的模式提取。
我们提出了基于整数规划的方法,用于为给定的字符串、有序树和无序树找到最小的 CFG、EOTG 和 EUTG。我们提出的树方法可用于提取聚糖树结构的模式。