Hossain M Shahriar, Akbar Monika, Polys Nicholas F
Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24060, USA.
J Comput Biol. 2012 Sep;19(9):1043-59. doi: 10.1089/cmb.2011.0244. Epub 2012 Aug 16.
In this article, we describe our work on graph mining as applied to the cellular signaling pathways in the Signal Transduction Knowledge Environment (STKE). We present new algorithms and a graphical tool that help biologists discover relationships between pathways by examining structural overlaps within the database. We address the problem of determining pathway relationships with two data mining approaches: clustering and storytelling. In the first approach, our tool groups similar pathways into the same cluster; in the second, it identifies intermediate overlapping pathways that can lead biologists to new hypotheses and experiments regarding relationships between the pathways. We formulate the discovery of pathway relationships as a subgraph discovery problem and propose a new technique called Subgraph-Extension Generation (SEG), which outperforms the traditional Frequent Subgraph Discovery (FSG) approach by orders of magnitude. Our tool provides an interface for comparing these two approaches under a variety of similarity measures and clustering techniques, as well as in terms of computational performance measures such as runtime and memory consumption.
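To make the two approaches concrete, the following is a minimal Python sketch, not the SEG or FSG algorithm and not the tool described in the article. It assumes each pathway is represented as a set of directed edges, uses Jaccard similarity over those edge sets as a stand-in for structural overlap, treats clustering as finding connected components of the overlap graph, and treats storytelling as finding the shortest chain of overlapping pathways. The pathway names, the edges, the threshold values, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy pathways: each is a set of directed edges (source, target).
pathways = {
    "A": {("R1", "K1"), ("K1", "K2"), ("K2", "TF1")},
    "B": {("K1", "K2"), ("K2", "TF1"), ("TF1", "G1")},
    "C": {("TF1", "G1"), ("G1", "P1"), ("P1", "P2")},
    "D": {("X1", "X2"), ("X2", "X3")},
}


def jaccard(a, b):
    """Shared edges divided by all edges; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_graph(paths, threshold):
    """Link two pathways when their edge-set similarity reaches the threshold."""
    neighbors = {name: set() for name in paths}
    for p, q in combinations(paths, 2):
        if jaccard(paths[p], paths[q]) >= threshold:
            neighbors[p].add(q)
            neighbors[q].add(p)
    return neighbors


def clusters(neighbors):
    """Clustering stand-in: connected components of the overlap graph."""
    seen, result = set(), []
    for name in sorted(neighbors):
        if name in seen:
            continue
        component, stack = [], [name]
        seen.add(name)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        result.append(sorted(component))
    return result


def story(neighbors, start, goal):
    """Storytelling stand-in: shortest chain of overlapping pathways (BFS)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no chain of overlapping pathways connects the two


graph = overlap_graph(pathways, threshold=0.2)
print(clusters(graph))         # groups of structurally related pathways
print(story(graph, "A", "C"))  # intermediate pathways linking A to C
```

In this toy data, pathways A and C share no edges, yet both overlap with B, so B is the kind of intermediate pathway that a storytelling search would surface. Raising the threshold removes the weaker B to C link, and the chain disappears.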