Tian Yuanyuan, McEachin Richard C, Santos Carlos, States David J, Patel Jignesh M
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics. 2007 Jan 15;23(2):232-9. doi: 10.1093/bioinformatics/btl571. Epub 2006 Nov 16.
With the rapid increase in the availability of biological graph datasets, there is a growing need for effective and efficient graph querying methods. Due to the noisy and incomplete characteristics of these datasets, exact graph matching methods have limited use and approximate graph matching methods are required. Unfortunately, existing graph matching methods are too restrictive as they only allow exact or near exact graph matching. This paper presents a novel approximate graph matching technique called SAGA. This technique employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences. SAGA employs an indexing technique that allows it to efficiently evaluate queries even against large graph datasets.
SAGA has been used to query biological pathways and literature datasets, which has revealed interesting similarities between distinct pathways that cannot be found by existing methods. These matches associate seemingly unrelated biological processes, connect studies in different sub-areas of biomedical research and thus pose hypotheses for new discoveries. SAGA is also orders of magnitude faster than existing methods.
SAGA can be accessed freely via the web at http://www.eecs.umich.edu/saga. Binaries are also freely available at this website.
随着生物图谱数据集可用性的迅速增加,对有效且高效的图谱查询方法的需求也日益增长。由于这些数据集具有噪声和不完整的特性,精确图谱匹配方法的用途有限,因此需要近似图谱匹配方法。不幸的是,现有的图谱匹配方法限制过多,因为它们只允许精确或近乎精确的图谱匹配。本文提出了一种名为SAGA的新型近似图谱匹配技术。该技术采用了一种灵活的模型来计算图谱相似度,允许节点间隙、节点不匹配以及图谱结构差异。SAGA采用了一种索引技术,使其即使针对大型图谱数据集也能高效地评估查询。
SAGA已被用于查询生物途径和文献数据集,揭示了现有方法无法发现的不同途径之间有趣的相似性。这些匹配关联了看似不相关的生物过程,连接了生物医学研究不同子领域的研究,从而为新发现提出了假设。SAGA的速度也比现有方法快几个数量级。