Smoly Ilan, Carmel Amir, Shemer-Avni Yonat, Yeger-Lotem Esti, Ziv-Ukelson Michal
1 Department of Computer Science, Ben-Gurion University of the Negev , Beer-Sheva, Israel .
2 Department of Virology, Ben-Gurion University of the Negev , Beer-Sheva, Israel .
J Comput Biol. 2016 Mar;23(3):165-79. doi: 10.1089/cmb.2015.0168.
Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.
网络查询是挖掘分子相互作用网络的一种强大方法。大多数最先进的网络查询工具要么将搜索限制在某种模板子网形式的预先指定拓扑中,要么根本不指定任何拓扑约束。另一种方法是基于语法的查询,它更加灵活且富有表现力,因为它们允许根据某种基于语法的逻辑来表达所寻找模式的拓扑结构。以前基于语法的网络查询工具仅限于路径的识别。在本文中,我们将基于语法的查询方法所识别的模式从路径扩展到树。为此,我们采用正则树文法(RTG)形式的高阶查询描述符。我们引入了一个新问题,并提出了一种算法,用于在给定图中搜索与RTG接受的树匹配的k个得分最高的子图。我们的算法基于动态规划与颜色编码的结合,并且包括对先前k最佳解析优化方法的扩展,以避免输出中的同构树。我们实现了新算法,并举例说明了其在挖掘分子相互作用网络中的病毒感染模式方面的应用。我们的代码可在线获取。