Liu Mei, Shah Anushi, Jiang Min, Peterson Neeraja B, Dai Qi, Aldrich Melinda C, Chen Qingxia, Bowton Erica A, Liu Hongfang, Denny Joshua C, Xu Hua
Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.
AMIA Annu Symp Proc. 2012;2012:577-86. Epub 2012 Nov 3.
Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor in many diseases, but it is often recorded only in narrative text. Natural language processing (NLP) systems have been developed for this task, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the cTAKES smoking module to the Vanderbilt University Hospital's EMR data. Our evaluation showed that a modest amount of customization is necessary to achieve desirable performance. We modified the system by filtering notes, annotating new data to train the machine learning classifier, and adding rules to the rule-based classifiers. The customized module achieved significantly higher F-measures at all levels of classification (i.e., sentence, document, patient) than the direct application of the cTAKES module to the Vanderbilt data.
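For readers unfamiliar with the metric, the F-measure reported at each level of classification can be sketched as follows. This is a minimal illustration of the standard balanced F-measure (the harmonic mean of precision and recall) for a single class, not the evaluation code used in the study. The function name `f_measure` and the label strings are placeholders chosen for the example.

```python
def f_measure(gold, predicted, label):
    """Return (precision, recall, F1) for one class label.

    gold and predicted are equal-length sequences of class labels,
    one per classified unit (a sentence, a document, or a patient).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels for four units; the values are illustrative only.
gold = ["CURRENT", "PAST", "NON", "CURRENT"]
predicted = ["CURRENT", "CURRENT", "NON", "PAST"]
print(f_measure(gold, predicted, "CURRENT"))
```

The same function applies unchanged at the sentence, document, or patient level; only the unit that each label describes differs.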