Zhao Yang, Hayashida Morihiro, Cao Yue, Hwang Jaewook, Akutsu Tatsuya
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan.
BMC Bioinformatics. 2015 Apr 24;16:128. doi: 10.1186/s12859-015-0558-4.
Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees.
Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.
We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.
自然界和生物体中存在许多树形结构。人们认为这些树是基于某些规则构建的。我们之前基于二分类型树文法,为有序和无序单棵树开发了基于文法的压缩方法。在此,这些方法用于寻找单棵树的构建规则。另一方面,特定的构建规则可用于生成多棵相似的树。
因此,在本文中,我们通过使用整数规划改进和扩展先前的方法,开发了用于发现多棵不同树构建的通用规则的新方法。我们将所提出的方法应用于几组聚糖和RNA二级结构,它们在细胞系统中发挥重要作用,并且可被视为树形结构。结果表明,我们的方法能够成功应用于确定聚糖和RNA中的最小文法以及一些通用规则。
我们提出了基于整数规划的方法MinSEOTGMul和MinSEUTGMul,分别用于确定构建多棵有序和无序树的最小文法。所提出的方法除了提取频繁模式外,还可为确定树形结构生物数据中包含的层次结构提供线索。