Choi Sungbin, Choi Jinwook, Yoo Sooyoung, Kim Heechun, Lee Youngho
Department of Biomedical Engineering, Seoul National University, Seoul, Republic of Korea.
Center for Medical Informatics, Seoul National University Bundang Hospital, Gyeonggi-do, Republic of Korea.
J Biomed Inform. 2014 Feb;47:18-27. doi: 10.1016/j.jbi.2013.08.013. Epub 2013 Sep 11.
In medical information retrieval research, semantic resources have mostly been used to expand the original query terms or to estimate concept importance weights. However, the implicit term-dependency information contained in semantic concept terms has been overlooked, or at least underused, in most previous studies. In this study, we incorporate a semantic concept-based term-dependence feature into a formal retrieval model to improve its ranking performance.
We assumed that the terms of a standardized medical concept used by medical professionals depend on one another implicitly within that concept. We hypothesized that ranking performance could be improved by revising the ranking algorithm to favor documents that preserve those implicit dependencies. The implicit dependence features were extracted from the original query using MetaMap and incorporated into a semantic concept-enriched dependence model (SCDM). We designed four variants of the model, each with a distinct method of formulating the features.
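As an illustration of the general idea only, the following minimal sketch builds a query in which terms belonging to the same concept are also matched as ordered and unordered proximity features. It is not the SCDM formulation itself, which the abstract does not specify. The function name `concept_dependence_query`, the concept spans (hard-coded here rather than produced by MetaMap), the weights 0.8/0.1/0.1, and the window size 8 are all assumptions; the output uses the Indri query language operators `#weight`, `#combine`, `#1`, and `#uwN`.

```python
from itertools import combinations


def concept_dependence_query(terms, concept_spans, weights=(0.8, 0.1, 0.1), window=8):
    """Build an Indri-style query string with within-concept dependence features.

    terms:         list of query tokens.
    concept_spans: list of (start, end) token index ranges, end exclusive,
                   one per concept (hypothetical stand-in for MetaMap output).
    weights:       (unigram, ordered, unordered) weights; illustrative values.
    window:        width of the unordered window; illustrative value.
    """
    unigrams = " ".join(terms)
    ordered, unordered = [], []
    for start, end in concept_spans:
        span = terms[start:end]
        if len(span) < 2:
            continue  # a single-term concept carries no dependency
        # Adjacent terms inside a concept: exact ordered phrase match.
        for a, b in zip(span, span[1:]):
            ordered.append(f"#1({a} {b})")
        # Any pair of terms inside a concept: unordered proximity match.
        for a, b in combinations(span, 2):
            unordered.append(f"#uw{window}({a} {b})")
    if not ordered:
        # No multi-term concept found: fall back to a bag-of-words query.
        return f"#combine({unigrams})"
    w_t, w_o, w_u = weights
    return (
        f"#weight({w_t} #combine({unigrams}) "
        f"{w_o} #combine({' '.join(ordered)}) "
        f"{w_u} #combine({' '.join(unordered)}))"
    )


query = concept_dependence_query(
    ["patients", "with", "heart", "failure"],
    [(2, 4)],  # assumed concept span covering "heart failure"
)
print(query)
```

Unlike a sequential dependence model, which pairs every adjacent query term, this sketch creates proximity features only for terms that fall inside the same concept span.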
We performed leave-one-out cross-validation on both a clinical document corpus (TREC Medical Records track) and a medical literature corpus (OHSUMED), two representative test collections in medical information retrieval research.
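A minimal sketch of leave-one-out cross-validation over queries follows, assuming that each candidate parameter setting has already been scored per query. For every held-out query, the setting with the best mean score on the remaining queries is selected and then evaluated on the held-out query. The function name `leave_one_out`, the parameter values, and the scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def leave_one_out(scores):
    """Return the mean held-out metric under leave-one-out cross-validation.

    scores: {param: {query_id: metric}} with the same query ids for every
            param and at least two queries.
    """
    query_ids = sorted(next(iter(scores.values())))
    held_out = []
    for q in query_ids:
        train = [t for t in query_ids if t != q]
        # Tune on all queries except q.
        best = max(scores, key=lambda p: mean(scores[p][t] for t in train))
        # Evaluate the tuned setting on the held-out query only.
        held_out.append(scores[best][q])
    return mean(held_out)


# Hypothetical per-query scores for two parameter settings.
example_scores = {
    0.1: {"q1": 0.2, "q2": 0.4, "q3": 0.6},
    0.2: {"q1": 0.5, "q2": 0.3, "q3": 0.1},
}
result = leave_one_out(example_scores)
print(round(result, 4))
```

Because the held-out query never influences the choice of setting, the reported mean is not inflated by tuning on the test queries.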
Our semantic concept-enriched dependence model consistently outperformed other state-of-the-art retrieval methods. Analysis shows that the performance gain occurred independently of the concept's explicit importance in the query.
By capturing implicit knowledge about query term relationships and incorporating it into a ranking model, we could build a more robust and effective retrieval model, independent of concept importance.