
[A retrieval method of drug molecules based on graph collapsing].

Author information

Qu J W, Lv X Q, Liu Z M, Liao Y, Sun P H, Wang B, Tang Z

Affiliations

Institute of Computer Science & Technology, Peking University, Beijing 100080, China.

Institute of Computer Science & Technology, Peking University, Beijing 100080, China; State Key Laboratory of Digital Publishing Technology, Beijing 100080, China.

Publication information

Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):368-374.

Abstract

OBJECTIVE

To establish a compact, efficient hypergraph representation of molecules and a graph-similarity-based molecular retrieval method, enabling effective and efficient medicine information retrieval.

METHODS

In medicine information retrieval, the chemical structural formula (CSF) is a primary search target because it uniquely and precisely identifies each compound at the molecular level. To retrieve medicine information effectively and efficiently, a complete workflow for a graph-based CSF retrieval system was introduced. The system accepted photos taken with smartphones and sketches drawn on tablet computers as CSF inputs, and formalized each CSF as a corresponding graph. After analyzing the factors that directly affect the efficiency of graph matching, a compact and efficient hypergraph representation of molecules was proposed. Based on the characteristics of CSFs, a hierarchical collapsing method combining graph isomorphism and frequent subgraph mining was adopted. A fundamental challenge remained, however: subgraphs can overlap during the collapsing procedure, which prevents the method from building the correct compact hypergraph of the original CSF graph. Therefore, a graph-isomorphism-based algorithm was proposed to select dominant acyclic subgraphs on the basis of an overlap analysis. Finally, the spatial similarity between graphical CSFs was evaluated with multi-dimensional similarity measures.
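The collapsing step and the overlap problem can be illustrated with a minimal sketch. It assumes that candidate substructure occurrences have already been found (for example by subgraph isomorphism or frequent subgraph mining) and are given as sets of atom indices. The greedy "largest occurrence first" rule used here to resolve overlaps is a stand-in chosen for illustration, not the paper's isomorphism-based dominance criterion, and the helper names `select_non_overlapping` and `collapse` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def select_non_overlapping(occurrences: List[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Hypothetical overlap resolution: keep larger occurrences first and
    drop any occurrence that shares an atom with one already kept."""
    chosen: List[FrozenSet[int]] = []
    used: Set[int] = set()
    # Sort by size (descending), then by atom indices for a deterministic tie-break.
    for occ in sorted(occurrences, key=lambda s: (-len(s), sorted(s))):
        if used.isdisjoint(occ):
            chosen.append(occ)
            used |= occ
    return chosen


def collapse(nodes: Iterable[int], edges: Iterable[Edge],
             groups: List[FrozenSet[int]]) -> Tuple[Dict[int, int], Set[Edge]]:
    """Contract each selected group into one hypernode; atoms outside every
    group become singleton hypernodes. Returns (atom -> hypernode, hyperedges)."""
    owner: Dict[int, int] = {}
    for gid, group in enumerate(groups):
        for v in group:
            owner[v] = gid
    next_id = len(groups)
    for v in sorted(nodes):
        if v not in owner:
            owner[v] = next_id
            next_id += 1
    hyper_edges: Set[Edge] = set()
    for u, v in edges:
        a, b = owner[u], owner[v]
        if a != b:  # bonds inside a hypernode disappear after contraction
            hyper_edges.add((min(a, b), max(a, b)))
    return owner, hyper_edges


if __name__ == "__main__":
    # Toy acyclic "molecule": a 7-atom chain 0-1-2-3-4-5-6.
    atoms = range(7)
    bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
    candidates = [frozenset({0, 1, 2}), frozenset({2, 3, 4}),
                  frozenset({4, 5}), frozenset({5, 6})]
    groups = select_non_overlapping(candidates)
    owner, hyper_edges = collapse(atoms, bonds, groups)
    print(sorted(sorted(g) for g in groups))  # [[0, 1, 2], [4, 5]]
    print(sorted(hyper_edges))                # [(0, 2), (1, 2), (1, 3)]
```

On this toy chain, the occurrence {2, 3, 4} is discarded because atom 2 already belongs to {0, 1, 2}, and the collapsed graph has four hypernodes instead of seven atoms, which shows how collapsing shrinks the graph that later matching has to process.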

RESULTS

To evaluate the proposed method, the system was first compared on retrieval accuracy with the Wikipedia Chemical Structure Explorer (WCSE), the state-of-the-art system for CSF similarity search over the Wikipedia molecule dataset. From the top-2 to the top-10 retrieved results, the system achieved higher mean average precision, discounted cumulative gain, rank-biased precision, and expected reciprocal rank than WCSE. Specifically, for the top-10 retrieval results, it exceeded WCSE on these four metrics by 10%, 1.41, 6.42%, and 1.32%, respectively. In addition, several retrieval cases were presented for an intuitive comparison with WCSE. These comparative results demonstrated that the proposed method outperformed the existing method in both accuracy and effectiveness.
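The four metrics above have standard definitions, sketched here for a single ranked list. The abstract does not state which variants the authors used, so the following choices are assumptions: AP@k is normalized by the total number of relevant items, DCG uses linear gain g_i / log2(i + 1), RBP takes a caller-supplied persistence parameter p, and ERR follows the cascade-model definition with stopping probability (2^g - 1) / 2^g_max. MAP is the mean of AP@k over queries. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def average_precision_at_k(rels: Sequence[int], k: int, num_relevant: int) -> float:
    """AP@k for binary relevance (1 = relevant), normalized by the total number of relevant items."""
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, r in enumerate(rels[:k], start=1):
        if r:
            hits += 1
            total += hits / i  # precision at rank i, counted only at relevant ranks
    return total / num_relevant


def mean_average_precision_at_k(runs: Sequence[Tuple[Sequence[int], int]], k: int) -> float:
    """MAP@k: mean of AP@k over queries; each run is (relevance list, number of relevant items)."""
    return sum(average_precision_at_k(rels, k, n) for rels, n in runs) / len(runs)


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Unnormalized DCG@k with linear gain: sum of g_i / log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def rbp_at_k(rels: Sequence[float], k: int, p: float) -> float:
    """Rank-biased precision truncated at k: (1 - p) * sum of r_i * p^(i - 1)."""
    return (1 - p) * sum(r * p ** (i - 1) for i, r in enumerate(rels[:k], start=1))


def err_at_k(grades: Sequence[int], k: int, max_grade: int) -> float:
    """Expected reciprocal rank under the cascade model."""
    err, p_reach = 0.0, 1.0  # p_reach: probability the user reaches rank r
    for r, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade  # probability the user is satisfied at rank r
        err += p_reach * p_stop / r
        p_reach *= 1 - p_stop
    return err


if __name__ == "__main__":
    ranking = [1, 0, 1]  # hypothetical binary judgments for the top-3 results of one query
    print(average_precision_at_k(ranking, 3, num_relevant=2))
    print(dcg_at_k(ranking, 3))
    print(rbp_at_k(ranking, 3, p=0.5))
    print(err_at_k(ranking, 3, max_grade=1))
```

DCG is left unnormalized here because the abstract reports its gap as an absolute value (1.41) while the other gaps are percentages; dividing by the DCG of an ideal ranking would give nDCG instead.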

CONCLUSION

This paper proposes a graph-similarity-based retrieval approach for medicine information. To obtain satisfactory retrieval results, it also proposes an effective and efficient hypergraph representation of molecules and an isomorphism-based algorithm that selects dominant subgraphs on the basis of subgraph-overlap analysis. Experimental results demonstrate the effectiveness of the proposed approach.

