Ferro Alfredo, Giugno Rosalba, Mongiovì Misael, Pulvirenti Alfredo, Skripin Dmitry, Shasha Dennis
Dipartimento di Matematica e Informatica, Università di Catania, Catania, 95125, Italy.
BMC Bioinformatics. 2008 Apr 25;9 Suppl 4(Suppl 4):S10. doi: 10.1186/1471-2105-9-S4-S10.
Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed.
This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It implements an effective data storage based also on low-support data mining.
GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique which applies to any searching system also allows a significant index space reduction.
生物医学和化学数据库规模庞大且增长迅速。图能够自然地对这类数据进行建模。为了充分利用这些图数据库中的丰富信息,搜索查询图所有精确或近似出现情况的系统发挥着关键作用。为了高效处理图搜索,人们提出了用于图的索引、表示和匹配的先进方法。
本文介绍了GraphFind系统。该系统实现了高效的图搜索算法以及允许近似搜索的先进过滤技术。它允许用户选择候选子图而非整个图。它还基于低支持度数据挖掘实现了有效的数据存储。
将GraphFind与Frowns、GraphGrep和gIndex进行了比较。实验表明,在非常大的小型图集合上,GraphFind的性能优于被比较的系统。所提出的适用于任何搜索系统的低支持度挖掘技术也能显著减少索引空间。