Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):559-62. doi: 10.1136/jamia.2010.004028.
OBJECTIVE To describe Textractor, a new medication information extraction system developed for the 'i2b2 medication extraction challenge'. The development, functionalities, and official evaluation of the system are detailed.
Textractor is based on the Apache Unstructured Information Management Architecture (UIMA) framework, and uses hybrid methods that combine machine learning and pattern matching. Two modules in the system are based on machine learning algorithms, while the other modules use regular expressions, rules, and dictionaries, and one module embeds MetaMap Transfer.
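The pattern-matching side of such a hybrid design can be illustrated with a minimal sketch. The patterns, field names, and the `extract_fields` helper below are hypothetical and chosen only for illustration; they are not Textractor's actual rules or dictionaries, and a real system would need far broader coverage.

```python
import re

# Hypothetical, deliberately small patterns; not Textractor's actual rules.
PATTERNS = {
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|units?)\b", re.IGNORECASE),
    "route": re.compile(r"\b(?:po|iv|im|sc|pr|sl|topical)\b", re.IGNORECASE),
    "frequency": re.compile(r"\b(?:qd|bid|tid|qid|q\d+h|daily|prn)\b", re.IGNORECASE),
    "duration": re.compile(r"\bfor\s+\d+\s+(?:days?|weeks?|months?)\b", re.IGNORECASE),
}


def extract_fields(text):
    """Return every match of each pattern, keyed by field name."""
    return {
        field: [m.group(0) for m in pattern.finditer(text)]
        for field, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    print(extract_fields("Lasix 40 mg po bid for 5 days"))
```

Fields such as reasons, which rarely follow a fixed surface pattern, are the kind of information for which a learned model or a dictionary lookup is usually a better fit than a regular expression.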
The official evaluation was based on a reference standard of 251 discharge summaries annotated by all teams participating in the challenge. The metrics used were recall, precision, and the F(1)-measure. They were calculated with exact and inexact matches, and were averaged at the level of systems and documents.
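As a rough illustration of how these metrics differ, the sketch below scores predicted spans against reference spans. It rests on several assumptions that are not stated in the abstract: annotations are reduced to `(start, end)` offsets, an inexact match means any overlap between two spans, system-level averaging pools counts over all documents, and document-level averaging takes the mean of per-document scores. All function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]          # (start, end) offsets, end exclusive
Doc = Tuple[List[Span], List[Span]]  # (reference spans, predicted spans)


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def count_matches(gold: List[Span], pred: List[Span], exact: bool):
    """Return (tp, fp, fn) for one document; each reference span matches once."""
    unmatched = list(gold)
    tp = 0
    for p in pred:
        for g in unmatched:
            if (p == g) if exact else overlaps(p, g):
                unmatched.remove(g)
                tp += 1
                break
    return tp, len(pred) - tp, len(gold) - tp


def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def system_level(docs: List[Doc], exact: bool):
    """Pool counts over all documents, then compute the metrics once."""
    counts = [count_matches(g, p, exact) for g, p in docs]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return prf(tp, fp, fn)


def document_level(docs: List[Doc], exact: bool):
    """Compute the metrics per document, then take the unweighted mean."""
    scores = [prf(*count_matches(g, p, exact)) for g, p in docs]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


if __name__ == "__main__":
    docs = [
        ([(0, 2), (5, 7)], [(0, 2), (5, 6)]),
        ([(3, 4)], [(3, 4), (8, 9)]),
    ]
    print("system, exact:    ", system_level(docs, exact=True))
    print("document, exact:  ", document_level(docs, exact=True))
    print("system, inexact:  ", system_level(docs, exact=False))
```

Under these assumptions the two averaging schemes can give different values for the same predictions, because pooling weights each annotation equally while per-document averaging weights each document equally.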
The reference metric for this challenge, the system-level overall F(1)-measure, reached about 77% for exact matches, with a recall of 72% and a precision of 83%. Performance was best for route information (F(1)-measure of about 86%), and was good for dosage and frequency information, with F(1)-measures of about 82-85%. Results were weaker for durations, with F(1)-measures of 36-39%, and for reasons, with F(1)-measures of 24-27%.
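As a consistency check on the overall figures, the F(1)-measure is the harmonic mean of precision P and recall R. Using the rounded values reported above (P = 0.83, R = 0.72), the arithmetic works out as follows; the small difference from any unrounded value is an effect of rounding the inputs.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.83 \times 0.72}{0.83 + 0.72}
    = \frac{1.1952}{1.55}
    \approx 0.771
```

This agrees with the reported system-level F(1)-measure of about 77%.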
The official evaluation of Textractor for the i2b2 medication extraction challenge demonstrated satisfactory performance. This system was among the 10 best-performing systems in this challenge.