Department of Biomedical Informatics, School of Medicine, University of California, San Diego, USA.
Biomedical Knowledge Engineering Laboratory, School of Dentistry, Seoul National University, South Korea.
Comput Biol Med. 2017 Aug 1;87:217-229. doi: 10.1016/j.compbiomed.2017.05.026. Epub 2017 May 31.
Keyword-based entity search restricts the search space according to the user's search preferences. When the given keywords and preferences do not relate to the same biomedical topic, existing biomedical Linked Data search engines fail to deliver satisfactory results. This research aims to address this issue by supporting inter-topic search, that is, improving search when the inputs (keywords and preferences) belong to different topics.
This study developed an effective algorithm that uses the relations between biomedical entities in tandem with Siren, a keyword-based entity search engine. The algorithm, PERank, an adaptation of Personalized PageRank (PPR), takes a pair of inputs: (1) search preferences and (2) the entities returned by a keyword-based entity search for a keyword query. From these, it formulates the search results on the fly using an index of precomputed Individual Personalized PageRank Vectors (IPPVs).
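The abstract does not give PERank's exact scoring, so the following is only a minimal sketch of the general PPR/IPPV idea it builds on, not the authors' implementation. A Personalized PageRank vector v for a teleport (preference) distribution u satisfies v = (1 − α)·vP + α·u, where P is the row-stochastic transition matrix of the entity graph. Because v is linear in u, the PPV for a set of preference entities can be assembled from precomputed IPPVs without a new power iteration. Assumptions (all hypothetical, not from the paper or Siren's API): the function names, α = 0.15, dense NumPy matrices, a self-loop on dangling entities (so P does not depend on u and linearity holds), uniform weighting of preferences, and re-ranking of keyword-search candidates by the preference PPV score alone.

```python
import numpy as np

def transition_matrix(adj):
    """Row-stochastic transition matrix of the entity graph (hypothetical helper).

    Dangling entities (no outgoing relations) get a self-loop so that P is
    independent of the preference vector, which keeps PPR linear in u.
    """
    P = np.asarray(adj, dtype=float).copy()
    dangling = np.flatnonzero(P.sum(axis=1) == 0)
    P[dangling, dangling] = 1.0
    return P / P.sum(axis=1, keepdims=True)

def personalized_pagerank(P, u, alpha=0.15, tol=1e-12, max_iter=10_000):
    """Power iteration for v = (1 - alpha) * v @ P + alpha * u."""
    v = np.array(u, dtype=float)
    for _ in range(max_iter):
        v_next = (1 - alpha) * (v @ P) + alpha * u
        if np.abs(v_next - v).sum() < tol:
            return v_next
        v = v_next
    return v

def build_ippv_index(P, alpha=0.15):
    """Offline step (assumed): one Individual PPV per entity, keyed by entity id."""
    n = P.shape[0]
    return {i: personalized_pagerank(P, np.eye(n)[i], alpha) for i in range(n)}

def rerank(candidates, preferences, index):
    """Re-rank keyword-search candidates with the preference set's PPV.

    By linearity, the PPV for a uniform distribution over `preferences`
    is the mean of their precomputed IPPVs.
    """
    ppv = np.mean([index[p] for p in preferences], axis=0)
    return sorted(candidates, key=lambda c: ppv[c], reverse=True)
```

For example, with a preference entity 0 that links only to entity 1, and a separate component containing entities 2 and 3, `rerank([2, 1], [0], index)` places entity 1 first, because entity 2 is unreachable from the preference and receives zero PPV mass. Precomputing the IPPVs moves the expensive power iterations offline, leaving only a vector average and a sort at query time, which is what makes on-the-fly re-ranking feasible.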
Our experiments were performed over ten linked life datasets using two query sets: one with keyword-preference topic correspondence (intra-topic search) and one without (inter-topic search). The proposed method achieved better search results than the baseline keyword-based search engine, for example, a 14% increase in precision for inter-topic search.
By exploiting the relations between different entities, the proposed method improved keyword-based biomedical entity search, supporting inter-topic search without affecting intra-topic search.