Department of Computing, Open University, Milton Keynes, UK.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):545-8. doi: 10.1136/jamia.2010.003863.
This article describes a system developed for the 2009 i2b2 Medication Extraction Challenge. The purpose of this challenge is to extract medication information from hospital discharge summaries.
The system explored several linguistic natural language processing techniques (eg, term-based and token-based rule matching) to identify medication-related information in the narrative text. A number of lexical resources were constructed to profile lexical or morphological features for different categories of medication constituents.
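As a rough illustration of token-based rule matching against small lexical resources, the following Python sketch tags medication names, dosages, and frequencies in a sentence. The lexicons, the rule order, and the function name `extract_fields` are assumptions made for this example; the abstract does not specify the actual resources or rules used by the system.

```python
import re

# Hypothetical mini-lexicons for illustration only; the lexical resources
# actually built for the system are not described in this abstract.
DRUG_NAMES = {"aspirin", "lisinopril", "metformin"}
DOSE_UNITS = {"mg", "mcg", "g", "ml"}
FREQUENCY_TERMS = {"daily", "bid", "tid", "prn"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_fields(text):
    """Return (category, value) pairs found by simple token-level rules."""
    # Whitespace tokenization, trimming surrounding punctuation.
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]

    fields = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in DRUG_NAMES:
            # Rule 1: lexicon lookup for medication names.
            fields.append(("medication", tok))
        elif (NUMBER.fullmatch(tok)
              and i + 1 < len(tokens)
              and tokens[i + 1] in DOSE_UNITS):
            # Rule 2: a number followed by a known unit is a dosage.
            fields.append(("dosage", f"{tok} {tokens[i + 1]}"))
            i += 1  # consume the unit token as well
        elif tok in FREQUENCY_TERMS:
            # Rule 3: lexicon lookup for frequency expressions.
            fields.append(("frequency", tok))
        i += 1
    return fields


print(extract_fields("Patient was started on aspirin 81 mg daily."))
```

A rule set of this kind needs no annotated training data, only lexicons and patterns, which is the setting the abstract describes.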
Performance was evaluated in terms of the micro-averaged F-measure at the horizontal system level.
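For readers unfamiliar with micro-averaging, the sketch below pools true positives, false positives, and false negatives across all documents before computing precision, recall, and F-measure. The function name `micro_f1` and the sample counts are hypothetical, and the sketch does not attempt to reproduce the challenge's own definition of the horizontal system level.

```python
def micro_f1(counts):
    """Micro-averaged F-measure from per-document (tp, fp, fn) tuples."""
    # Pool the counts over all documents first, then compute the metrics once.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two documents: (true pos, false pos, false neg).
example_counts = [(8, 2, 2), (2, 0, 3)]
print(round(micro_f1(example_counts), 4))
```

Because counts are pooled before averaging, documents with many medication entries weigh more heavily in the score than documents with few.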
The automated system achieved a micro-averaged F-measure of 80% for the term-level results and 81% for the token-level results, placing it sixth in exact matches and fourth in inexact matches in the i2b2 competition.
The overall results show that this relatively simple rule-based approach can handle multiple-entity identification tasks such as medication extraction when few annotated training documents are available for machine learning approaches and the entity information can be characterized by a set of feature tokens.