Chandar Praveen, Yaman Anil, Hoxha Julia, He Zhe, Weng Chunhua
Department of Biomedical Informatics, Columbia University, New York, NY USA.
AMIA Annu Symp Proc. 2015 Nov 5;2015:386-95. eCollection 2015.
Terminologies can suffer from poor concept coverage because of delays in adding new concepts. This study tests a similarity-based approach to recommending concepts from a text corpus for addition to a terminology. Our approach extracts candidate concepts from a given text corpus and represents each candidate using a set of features. The model learns which features are important for characterizing a concept and recommends new concepts for the terminology. Further, we propose a cost-effective evaluation methodology to estimate the effectiveness of terminology enrichment methods. To test our methodology, we used free-text clinical trial eligibility criteria as an example text corpus to recommend concepts for SNOMED CT. We computed precision at various rank intervals to measure the performance of the methods. The results indicate that our automated algorithm is an effective method for concept recommendation.
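The abstract reports precision at various rank intervals but does not spell out the computation. A minimal sketch of the standard precision-at-k measure follows, assuming a ranked list of recommended concepts and a set of concepts judged acceptable. The function name `precision_at_k`, the placeholder concept labels, and the cutoffs are illustrative assumptions, not values or code from the study.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set.

    Standard precision@k; the study's exact definition may differ.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked[:k]
    hits = sum(1 for concept in top_k if concept in relevant)
    # Divide by k (not len(top_k)) so short lists are penalized.
    return hits / k


def precision_at_cutoffs(
    ranked: Sequence[str], relevant: Set[str], cutoffs: Sequence[int]
) -> Dict[int, float]:
    """Precision at each rank cutoff, e.g. 1, 2, 4."""
    return {k: precision_at_k(ranked, relevant, k) for k in cutoffs}


# Hypothetical ranking and judgments, for illustration only.
ranked_candidates = ["candidate_a", "candidate_b", "candidate_c", "candidate_d"]
judged_relevant = {"candidate_a", "candidate_c"}

scores = precision_at_cutoffs(ranked_candidates, judged_relevant, [1, 2, 4])
print(scores)
```

Dividing by k rather than by the number of items returned is a design choice: it keeps scores comparable across methods that return ranked lists of different lengths.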