Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Apr-Jun;7(2):197-207. doi: 10.1109/TCBB.2009.80.
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called graph pattern diffusion (GPD) kernel. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifier (e.g., support vector machine) in building highly accurate graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call "pattern diffusion" to label nodes in the graphs. Finally, we designed a graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm using a number of chemical structure data. The experimental results demonstrate that our method is significantly better than competing methods such as those kernel functions based on paths, cycles, and subgraphs.
图数据挖掘是一个活跃的研究领域。图是一种通用的建模工具,用于组织来自异构源的信息,并已应用于许多科学、工程和商业领域。随着图数据的快速积累,为图数据构建高度精确的预测模型成为数据挖掘领域尚未充分探索的新挑战。在本文中,我们展示了一种称为图模式扩散(GPD)核的新技术。我们的想法是利用现有的频繁模式发现方法,并探索核分类器(例如支持向量机)在构建高度精确的图分类中的应用。在我们的方法中,我们首先从图数据库中识别所有的频繁模式。然后,我们将子图映射到图数据库中的图上,并使用我们称之为“模式扩散”的过程来标记图中的节点。最后,我们设计了一种图对齐算法来计算两个图的内积。我们使用一些化学结构数据对我们的算法进行了测试。实验结果表明,我们的方法明显优于基于路径、循环和子图的核函数等竞争方法。