Pasche Emilie, Gobeill Julien, Vishnyakova Dina, Ruch Patrick, Lovis Christian
Division of Medical Information Sciences, University Hospitals of Geneva and University of Geneva, Geneva, Switzerland.
Stud Health Technol Inform. 2013;192:1068.
The high heterogeneity of biomedical vocabulary is a major obstacle to information retrieval in large biomedical collections. Biomedical controlled vocabularies are therefore crucial for managing this content. We investigate whether query expansion based on controlled vocabularies improves the effectiveness of two search engines. Our strategy enriches users' queries with additional terms derived directly from such vocabularies, applied to infectious diseases and chemical patents. Query expansion based on pathogen names improved the top-precision of our first search engine, whereas the normalization of diseases degraded it. The expansion of chemical entities, performed on the second search engine, improved the mean average precision. These results show that query expansion of some types of biomedical entities has great potential to improve search effectiveness; fine-tuning query expansion strategies could therefore help improve the performance of search engines.
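The query enrichment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary entries, the function names (`expand_query`, `precision_at_k`, `average_precision`), and the OR-based query syntax are all assumptions made for illustration. The abstract does not define top-precision exactly, so precision at a cutoff k is used here only as a stand-in; mean average precision is the mean of `average_precision` over all queries.

```python
import re
from typing import Dict, List, Sequence, Set

# Hypothetical controlled vocabulary: preferred term -> synonyms.
# The entries are illustrative, not taken from the paper.
VOCABULARY: Dict[str, List[str]] = {
    "staphylococcus aureus": ["s. aureus", "staph aureus"],
    "acetylsalicylic acid": ["aspirin", "2-acetoxybenzoic acid"],
}


def _mentions(term: str, text: str) -> bool:
    """Return True if `term` occurs in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def expand_query(query: str, vocabulary: Dict[str, List[str]]) -> str:
    """Append every missing variant of each vocabulary entry found in the query."""
    lowered = query.lower()
    clauses = [query]
    for preferred, synonyms in vocabulary.items():
        variants = [preferred] + synonyms
        if any(_mentions(v, lowered) for v in variants):
            # Add the variants the user did not type, quoted as phrases.
            clauses.extend(f'"{v}"' for v in variants if not _mentions(v, lowered))
    return " OR ".join(clauses)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)
```

Comparing these measures on the same set of queries, with and without expansion, is one way to check whether a given entity type helps or hurts, which is the kind of per-type effect the abstract reports.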