Uzuner Ozlem, Goldstein Ira, Luo Yuan, Kohane Isaac
University at Albany, SUNY, Draper 114A, 135 Western Avenue, Albany, NY 12222, USA.
J Am Med Inform Assoc. 2008 Jan-Feb;15(1):14-24. doi: 10.1197/jamia.M2408. Epub 2007 Oct 18.
The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, providing a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results. There were 12 system runs with microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", Social History, etc.). Many of the effective smoking status identifiers benefit from these features.
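As an illustration of the evaluation metrics named above, the following is a minimal sketch of microaveraged and macroaveraged precision, recall, and F-measure for a single-label classification task. It is not the challenge's official scoring code. The function names and the class labels used in the usage note are hypothetical, and the macroaveraged F-measure is assumed here to be the unweighted mean of the per-class F-measures, which is one of several conventions in use.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and balanced F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


def micro_macro(gold, pred):
    """Return (micro, macro) triples of (precision, recall, F-measure).

    gold and pred are equal-length lists holding one label per document.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[g] += 1  # the true class gains a false negative

    # Macroaverage: score each class separately, then take the unweighted
    # mean, so every class counts equally regardless of its size.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    macro = tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))

    # Microaverage: pool the counts over all classes, then score once,
    # so every document counts equally.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

Calling `micro_macro(gold, pred)` with two equal-length label lists, for example lists drawn from the hypothetical labels `"SMOKER"`, `"NON-SMOKER"`, and `"UNKNOWN"`, returns the microaveraged and macroaveraged triples. Microaverages are dominated by the frequent classes, while macroaverages expose weak performance on rare ones, which is a common reason to report both.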