Wu Yonghui, Denny Joshua C, Rosenbloom S Trent, Miller Randolph A, Giuse Dario A, Xu Hua
Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.
AMIA Annu Symp Proc. 2012;2012:997-1003. Epub 2012 Nov 3.
Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compared the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES. The evaluation used an expert-annotated gold standard set of clinical documents (derived from 32 de-identified patient discharge summaries) containing 1,112 abbreviations. The existing NLP systems achieved suboptimal performance in abbreviation identification, with F-scores ranging from 0.165 to 0.601. MedLEE achieved the best F-score: 0.601 for all abbreviations and 0.705 for clinically relevant abbreviations. This study suggests that accurate identification of clinical abbreviations is a challenging task and that more advanced abbreviation recognition modules might improve existing clinical NLP systems.
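For readers unfamiliar with the metric, the F-scores reported above combine precision and recall over the abbreviations a system identifies. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes an abbreviation counts as correct only when its character span and surface form exactly match a gold-standard annotation; the abstract does not state the matching criteria actually used. The function name `precision_recall_f` and the toy annotations are hypothetical and are not data from the study.

```python
def precision_recall_f(gold, predicted):
    """Return (precision, recall, F-score) for two collections of annotations.

    Each annotation is a (start, end, text) tuple. Exact-match scoring is an
    assumption made for this sketch.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy annotations, for illustration only.
gold = [(0, 2, "pt"), (10, 13, "CHF"), (20, 22, "bp"), (30, 33, "SOB")]
predicted = [(10, 13, "CHF"), (20, 22, "bp"), (40, 42, "mg")]

precision, recall, f_score = precision_recall_f(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

In this toy case two of three predictions are correct and two of four gold abbreviations are found, so a system that misses half of the abbreviations scores well below 1.0 even when most of what it does report is right.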