Tao Carson, Filannino Michele, Uzuner Özlem
SUNY at Albany, Albany, NY, USA.
George Mason University, Fairfax, Virginia, USA.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1534-1543. eCollection 2018.
Prescription information is an important component of electronic health records (EHRs). It contains detailed medication instructions that are crucial for patients' well-being and is often recorded in the narrative portions of EHRs. As a result, EHR narratives need to be processed with natural language processing (NLP) methods that can extract medication and prescription information from free text. However, automatic methods for medication and prescription extraction from narratives face two major challenges: (1) dictionaries can fall short even when identifying well-defined and syntactically consistent categories of medication entities; and (2) some categories of medication entities are sparse and, at the same time, lexically (and syntactically) diverse. In this paper, we describe FABLE, a system for automatically extracting prescription information from discharge summaries. FABLE utilizes unannotated data to enhance annotated training data: it performs semi-supervised extraction of medication information using pseudo-labels with Conditional Random Fields (CRFs) to improve its understanding of incomplete, sparse, and diverse medication entities. When evaluated against the official benchmark set from the 2009 i2b2 Shared Task and Workshop on Medication Extraction, FABLE achieves a horizontal phrase-level F1-measure of 0.878, giving state-of-the-art performance and significantly improving on nearly all entity categories.
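The pseudo-labeling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not FABLE's implementation: the abstract does not give the features, the selection rule, or the number of rounds, so the confidence threshold, the round limit, the label names (MED, DOSE, FREQ), and the example sentences below are all assumptions made for illustration. To keep the sketch self-contained, a toy most-frequent-label tagger stands in for the CRF; in practice a CRF with contextual features would take its place.

```python
from collections import Counter, defaultdict


class UnigramTagger:
    """Toy stand-in for a CRF (assumption): most frequent training label per token."""

    def fit(self, sentences):
        counts = defaultdict(Counter)
        for sentence in sentences:
            for token, label in sentence:
                counts[token.lower()][label] += 1
        self.best = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
        return self

    def predict(self, tokens):
        """Return (labels, confidence); confidence is the share of known tokens."""
        labels = [self.best.get(t.lower(), "O") for t in tokens]
        known = sum(t.lower() in self.best for t in tokens)
        confidence = (known / len(tokens)) if tokens else 0.0
        return labels, confidence


def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Generic self-training loop; threshold and rounds are hypothetical settings."""
    train = list(labeled)
    pool = list(unlabeled)
    model = UnigramTagger().fit(train)
    for _ in range(rounds):
        confident, remaining = [], []
        for tokens in pool:
            labels, confidence = model.predict(tokens)
            if confidence >= threshold:
                # Keep the model's own predictions as pseudo-labels.
                confident.append(list(zip(tokens, labels)))
            else:
                remaining.append(tokens)
        if not confident:
            break  # nothing new to learn from
        train.extend(confident)
        pool = remaining
        model = UnigramTagger().fit(train)  # retrain on gold + pseudo-labeled data
    return model, train


# Hypothetical annotated and unannotated sentences (not from the i2b2 corpus).
labeled = [
    [("aspirin", "MED"), ("81", "DOSE"), ("mg", "DOSE"), ("daily", "FREQ")],
    [("lasix", "MED"), ("40", "DOSE"), ("mg", "DOSE"), ("bid", "FREQ")],
]
unlabeled = [
    ["aspirin", "40", "mg", "bid"],
    ["metoprolol", "25", "mg", "bid"],
    ["patient", "was", "stable"],
]

model, augmented = self_train(labeled, unlabeled)
```

In this toy run only the first unannotated sentence clears the threshold, so the training set grows from two to three sentences. A lookup tagger cannot generalize to unseen words such as "metoprolol"; that gap is where a sequence model with contextual features would be expected to help.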