Department of Information Studies, University at Albany, State University of New York, Albany, NY, USA.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):514-8. doi: 10.1136/jamia.2010.003947.
The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration in discharge summaries. This challenge is referred to as the medication challenge. For the medication challenge, i2b2 released detailed annotation guidelines along with a set of annotated discharge summaries. Twenty teams representing 23 organizations and nine countries participated. The teams produced rule-based, machine learning, and hybrid systems targeted to the task. Although rule-based systems dominated the top 10, the best-performing system was a hybrid. Of all medication-related fields, durations and reasons were the most difficult for all systems to detect. While all of the top 10 systems identified medications themselves with an F-measure better than 0.75, the best F-measures for durations and reasons were 0.525 and 0.459, respectively. State-of-the-art natural language processing systems go a long way toward extracting medication names, dosages, modes, and frequencies. However, they remain limited in recognizing duration and reason fields, which would benefit from further research.
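The scores above are reported as F-measures. As a rough illustration of how such a score is derived, the sketch below computes precision, recall, and the balanced F-measure (F1, the harmonic mean of precision and recall) from match counts. The abstract does not state which F-measure variant or matching level the challenge used, so the balanced form, the function names, and the example counts are assumptions made for illustration only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1.0 gives the balanced F-measure (assumed here)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was matched
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def prf_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts for one field (e.g. durations)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the challenge results.
    p, r, f = prf_from_counts(tp=6, fp=2, fn=4)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is one reason hard-to-detect fields such as durations and reasons can score well below the medication names themselves.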