LIMSI-CNRS, France.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):555-8. doi: 10.1136/jamia.2010.003962.
While essential for patient care, information related to medication is often written as free text in clinical records and, therefore, difficult to use in computerized systems. This paper describes an approach to automatically extract medication information from clinical records, which was developed to participate in the i2b2 2009 challenge, as well as different strategies to improve the extraction.
Our approach relies on a semantic lexicon and extraction rules applied in a two-phase strategy: first, drug names are recognized; then, the context of these names is explored to extract drug-related information (mode, dosage, etc) according to rules that capture the document structure and the syntax of each kind of information. Different configurations are tested to improve this baseline system along several dimensions, particularly drug name recognition, a step that is a determining factor in extracting drug-related information. Changes were tested at the level of the lexicons and of the extraction rules.
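The two-phase strategy described above can be illustrated with a minimal sketch. This is not the authors' system: the lexicon entries, the regular expressions, and the function names (`find_drugs`, `extract`) are hypothetical, and the context window (the text between one drug name and the next) is a simplifying assumption standing in for the rules on document structure.

```python
import re

# Hypothetical mini-lexicon; the semantic lexicon in the paper is far larger.
DRUG_LEXICON = {"aspirin", "metformin", "lisinopril"}

# Hypothetical context rules, one regular expression per kind of information.
RULES = {
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|units?)\b", re.I),
    "mode": re.compile(r"\b(?:po|iv|oral(?:ly)?|topical(?:ly)?)\b", re.I),
    "frequency": re.compile(r"\b(?:daily|bid|tid|qid)\b", re.I),
}


def find_drugs(line):
    """Phase 1: return (name, start, end) for each lexicon match in the line."""
    hits = []
    for m in re.finditer(r"[A-Za-z]+", line):
        if m.group().lower() in DRUG_LEXICON:
            hits.append((m.group(), m.start(), m.end()))
    return hits


def extract(line):
    """Phase 2: apply the context rules to the text that follows each drug name."""
    drugs = find_drugs(line)
    entries = []
    for i, (name, _start, end) in enumerate(drugs):
        # Assumed window: from the end of this drug name to the next drug name.
        stop = drugs[i + 1][1] if i + 1 < len(drugs) else len(line)
        window = line[end:stop]
        entry = {"drug": name}
        for field, pattern in RULES.items():
            m = pattern.search(window)
            if m:
                entry[field] = m.group()
        entries.append(entry)
    return entries


if __name__ == "__main__":
    for entry in extract("Aspirin 81 mg po daily, metformin 500 mg bid"):
        print(entry)
```

Because phase 2 only looks around names found in phase 1, a drug missing from the lexicon loses all of its associated fields, which is consistent with the abstract's remark that drug name recognition is a determining factor.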
The initial system participating in i2b2 achieved good results (global F-measure of 77%). Further testing of different configurations substantially improved the system (global F-measure of 81%), performing well for all types of information (eg, 84% for drug names and 88% for modes), except for durations and reasons, which remain problematic.
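For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below uses the standard balanced definition; the counts in the usage example are made up for illustration and are not taken from the paper, and the exact matching level used in the i2b2 evaluation is not specified here.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1) from raw counts; returns 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(f_measure(true_pos=80, false_pos=20, false_neg=20), 2))
```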
This study demonstrates that a simple rule-based system can achieve good performance on the medication extraction task. We also show that controlled modifications (lexicon filtering and rule refinement) were the changes that improved performance the most.