McCormick Patrick J, Elhadad Noémie, Stetson Peter D
College of Physicians & Surgeons, Columbia University, New York, NY, USA.
AMIA Annu Symp Proc. 2008 Nov 6;2008:450-4.
The recent i2b2 NLP Challenge smoking classification task offers a rare chance to compare different natural language processing techniques on actual clinical data. We compare the performance of a classifier that relies on semantic features generated by an unmodified version of MedLEE, a clinical NLP engine, with that of a classifier using lexical features. We also compare supervised classifiers with rule-based symbolic classifiers. Our baseline supervised classifier with lexical features yields a microaveraged F-measure of 0.81. Our rule-based classifier using MedLEE semantic features performs better, with an F-measure of 0.83. Our supervised classifier trained with MedLEE semantic features is competitive with the top-performing smoking classifier in the i2b2 NLP Challenge, with microaveraged precision of 0.90, recall of 0.89, and F-measure of 0.89.
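For readers unfamiliar with the metric, the following is a minimal sketch of how microaveraged precision, recall, and F-measure are conventionally computed for a multi-class task: true positives, false positives, and false negatives are pooled across all classes before the ratios are taken. The function name `micro_prf` and the label strings are illustrative assumptions, and this textbook formulation is not claimed to reproduce the challenge's official scoring.

```python
from collections import Counter


def micro_prf(gold, predicted):
    """Return microaveraged (precision, recall, F1) for parallel label lists.

    Hypothetical helper for illustration; not the i2b2 evaluation script.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p wrongly
            fn[g] += 1   # missed the true class g

    # Pool the counts over all classes before computing the ratios.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = tp_sum / (tp_sum + fp_sum) if (tp_sum + fp_sum) else 0.0
    recall = tp_sum / (tp_sum + fn_sum) if (tp_sum + fn_sum) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data with assumed label names, not taken from the challenge corpus.
gold = ["CURRENT", "PAST", "NON", "UNKNOWN", "UNKNOWN"]
pred = ["CURRENT", "NON", "NON", "UNKNOWN", "UNKNOWN"]
p, r, f = micro_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

When every document receives exactly one label, as in this sketch, pooled precision and recall are equal by construction, so a scoring scheme that reports different values for the two presumably differs in some detail not described here.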