GPD: a graph pattern diffusion kernel for accurate graph classification with applications in cheminformatics.

Affiliation

Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2010 Apr-Jun;7(2):197-207. doi: 10.1109/TCBB.2009.80.

DOI:10.1109/TCBB.2009.80

PMID:20431140

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3058227/

Abstract

Graph data mining is an active research area. Graphs are general modeling tools for organizing information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the rapid accumulation of graph data, building highly accurate predictive models for graph data has emerged as a new challenge that has not been fully explored in the data mining community. In this paper, we present a novel technique called the graph pattern diffusion (GPD) kernel. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) in building highly accurate graph classifiers. In our method, we first identify all frequent patterns in a graph database. We then map these subgraph patterns onto the graphs in the database and use a process we call "pattern diffusion" to label the nodes of each graph. Finally, we use a graph alignment algorithm that we designed to compute the inner product of two graphs. We have tested our algorithm on a number of chemical structure data sets. The experimental results show that our method is significantly better than competing methods, such as kernel functions based on paths, cycles, and subgraphs.
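The three steps named in the abstract (pattern matching, pattern diffusion, graph alignment) can be illustrated with a small sketch. The abstract does not give the diffusion rule or the alignment algorithm, so everything below is an assumption made for illustration: the function names `diffuse`, `node_similarity`, and `alignment_kernel`, the `steps` and `decay` parameters, the additive neighbor update, and the brute-force alignment are hypothetical and are not the authors' implementation. Frequent pattern mining is assumed to have been done already, so each graph arrives with a map from nodes to the patterns matched at them.

```python
from itertools import permutations


def diffuse(adj, seed, steps=2, decay=0.5):
    """Spread pattern weights from matched nodes to their neighbours.

    adj:  dict mapping node -> list of neighbouring nodes
    seed: dict mapping node -> {pattern_id: weight} for nodes covered by a
          frequent pattern (assumed to be precomputed by a pattern miner)
    The additive update with a fixed decay is an illustrative assumption.
    """
    current = {v: dict(seed.get(v, {})) for v in adj}
    for _ in range(steps):
        nxt = {v: dict(current[v]) for v in adj}
        for v in adj:
            for u in adj[v]:
                for pattern, weight in current[u].items():
                    nxt[v][pattern] = nxt[v].get(pattern, 0.0) + decay * weight
        current = nxt
    return current


def node_similarity(a, b):
    """Dot product of two sparse pattern-weight vectors."""
    return sum(weight * b.get(pattern, 0.0) for pattern, weight in a.items())


def alignment_kernel(feats_g, feats_h):
    """Score of the best one-to-one node alignment between two graphs.

    Brute force over all alignments, so only usable on tiny graphs;
    a real system would need a polynomial-time assignment solver.
    """
    small, large = sorted(
        (list(feats_g.values()), list(feats_h.values())), key=len
    )
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(
            node_similarity(small[i], large[j]) for i, j in enumerate(perm)
        )
        best = max(best, score)
    return best


# Toy example: a 3-node path and a single edge, each with one node
# matched by a hypothetical pattern "p0".
path_feats = diffuse({0: [1], 1: [0, 2], 2: [1]}, {0: {"p0": 1.0}}, steps=1)
edge_feats = diffuse({0: [1], 1: [0]}, {1: {"p0": 1.0}}, steps=1)
similarity = alignment_kernel(path_feats, edge_feats)
```

The resulting pairwise similarities could be collected into a matrix and passed to a kernel classifier such as a support vector machine. Whether an alignment-based score of this kind yields a valid (positive semi-definite) kernel depends on details the abstract does not state.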
