Soualmia Lina F, Dahamna Badisse, Thirion Benoît, Darmoni Stéfan J
CISMeF Team, Rouen University Hospital, Rouen, France.
Stud Health Technol Inform. 2006;124:595-600.
The amount of health data accessible on the Web is increasing, and the Internet has become a major source of health information. Many tools and search engines are available, but medical information retrieval remains difficult for both health professionals and patients.
In this paper we describe heuristics that aim to match queries with the content of the documents as far as possible, in the context of the CISMeF catalogue (Catalogue and Index of Health Resources in French) and its search tool Doc'CISMeF. Queries are represented by terms, and the content of the documents is indexed with a terminology based on the MeSH thesaurus.
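To make this representation concrete, here is a minimal sketch, assuming a toy terminology in which each preferred term carries a few French and English synonyms and each document is indexed by preferred terms. The terms, document identifiers and function names are hypothetical illustrations, not CISMeF data or the Doc'CISMeF implementation; only an exact match of the whole query is attempted here.

```python
from collections import defaultdict

# Hypothetical toy terminology: preferred term -> synonyms (lowercase).
TERMINOLOGY = {
    "asthma": {"asthma", "asthme"},
    "diabetes mellitus": {"diabetes mellitus", "diabetes", "diabete"},
}

# Hypothetical documents, each indexed by preferred terms of the terminology.
DOCUMENT_INDEX = {
    "doc1": {"asthma"},
    "doc2": {"diabetes mellitus"},
    "doc3": {"asthma", "diabetes mellitus"},
}


def build_synonym_lookup(terminology):
    """Map every synonym to its preferred term."""
    lookup = {}
    for preferred, synonyms in terminology.items():
        for synonym in synonyms:
            lookup[synonym] = preferred
    return lookup


def build_inverted_index(document_index):
    """Map each preferred term to the set of documents indexed with it."""
    inverted = defaultdict(set)
    for doc_id, terms in document_index.items():
        for term in terms:
            inverted[term].add(doc_id)
    return inverted


def search(query, lookup, inverted):
    """Match the whole query against the terminology, then retrieve documents."""
    term = lookup.get(query.strip().lower())
    if term is None:
        return set()  # no term matched: nothing is retrieved
    return set(inverted.get(term, set()))


lookup = build_synonym_lookup(TERMINOLOGY)
inverted = build_inverted_index(DOCUMENT_INDEX)
print(sorted(search("Asthme", lookup, inverted)))  # ['doc1', 'doc3']
```

A query that does not exactly match any synonym retrieves nothing, which is the recall problem the heuristics are meant to reduce.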
Several operations are performed to match queries with the terms of the terminology: natural language processing of multi-word queries, phonemisation, spelling correction, plain-text search with adjacency, etc. Each operation is tested to evaluate its contribution to matching the terminology with the indexed documents.
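The following is a minimal sketch of how such operations could be chained as a fallback cascade, each one tried only when the previous ones fail. Everything here is assumed for illustration: the toy terminology and document text, the order of the steps, the crude `phonetic_key` (a few ad hoc rewrites, not a real French phonetic algorithm), and the use of Python's `difflib` with a 0.8 cutoff as a stand-in spelling corrector. It does not reproduce the Doc'CISMeF heuristics.

```python
import difflib
import unicodedata

# Hypothetical toy terminology: normalised synonym -> preferred term.
LOOKUP = {
    "asthme": "asthma",
    "asthma": "asthma",
    "diabete": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
}

# Hypothetical plain text of indexed documents, used as the last resort.
DOC_TEXT = {"doc7": "prise en charge de la douleur thoracique aux urgences"}


def normalise(text):
    """Lowercase, strip accents and surrounding spaces ('Diabète' -> 'diabete')."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn").strip()


def phonetic_key(text):
    """Crude, hypothetical phonetic key: a few rewrites, then collapse doubled letters."""
    key = text.replace("ph", "f").replace("y", "i").replace("h", "")
    out = []
    for c in key:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


PHONETIC = {phonetic_key(s): t for s, t in LOOKUP.items()}


def contains_adjacent(tokens, words):
    """True if `words` occur consecutively and in order in `tokens`."""
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def match(query):
    """Apply the heuristics in order; return (heuristic name, result set)."""
    q = normalise(query)
    words = q.split()
    if not words:
        return "none", set()
    # 1. Exact match of the whole normalised query.
    if q in LOOKUP:
        return "exact", {LOOKUP[q]}
    # 2. Phonemisation: compare phonetic keys instead of spellings.
    if phonetic_key(q) in PHONETIC:
        return "phonetic", {PHONETIC[phonetic_key(q)]}
    # 3. Spelling correction: closest synonym by difflib similarity ratio.
    close = difflib.get_close_matches(q, list(LOOKUP), n=1, cutoff=0.8)
    if close:
        return "spelling", {LOOKUP[close[0]]}
    # 4. Multi-word query: every word must match a term on its own.
    if len(words) > 1 and all(w in LOOKUP for w in words):
        return "multi-word", {LOOKUP[w] for w in words}
    # 5. Plain-text search with adjacency: returns document ids, not terms.
    hits = {d for d, text in DOC_TEXT.items() if contains_adjacent(text.split(), words)}
    return ("plain-text", hits) if hits else ("none", set())


print(match("dyabette"))  # ('phonetic', {'diabetes mellitus'})
```

Ordering the cheap, precise steps first keeps exact matches unaffected, while the later, looser steps only add results for queries that would otherwise retrieve nothing, which is one way to raise recall.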
The implemented heuristics contribute significantly to maximising the recall of the Doc'CISMeF search tool, with good results.