Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, 468-8511, Aichi, Japan.
J Biomed Inform. 2023 Aug;144:104416. doi: 10.1016/j.jbi.2023.104416. Epub 2023 Jun 13.
This paper describes contextualized medication event extraction, which automatically identifies medication change events and their contexts in clinical notes. The striding named entity recognition (NER) model extracts medication name spans from an input text sequence using a sliding-window approach. Specifically, the striding NER model splits the input sequence into overlapping subsequences of 512 tokens with a stride of 128 tokens, processes each subsequence with a large pre-trained language model, and aggregates the outputs from the subsequences. Event and context classification is performed with multi-turn question-answering (QA) and span-based models. The span-based model classifies the span of each medication name using the span representation from the language model. The QA model augments this classification with questions that ask about the change event of each medication name and the context of that event; its architecture remains a classification-style model identical to the span-based model. We evaluated our extraction system on the n2c2 2022 Track 1 dataset, which is annotated for medication extraction (ME), event classification (EC), and context classification (CC) in clinical notes. Our system is a pipeline consisting of the striding NER model for ME and an ensemble of the span-based and QA-based models for EC and CC. It achieved a combined F-score of 66.47% for end-to-end contextualized medication event extraction (Release 1), the highest score among the participants of n2c2 2022 Track 1.
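The sliding-window step of the striding NER model can be illustrated with a minimal sketch. The abstract gives only the window size (512) and the stride (128), so the rest is assumption: the stride is read here as the step between window starts, the last window is shifted left so that it ends at the final token, and overlapping predictions are aggregated by averaging per-token label scores. The function names `make_windows` and `aggregate` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(n_tokens: int, window: int = 512, stride: int = 128) -> List[Tuple[int, int]]:
    """Return [start, end) token ranges that cover a sequence of n_tokens tokens.

    Assumption: `stride` is the distance between consecutive window starts.
    """
    if n_tokens <= 0:
        return []
    if n_tokens <= window:
        return [(0, n_tokens)]
    starts = list(range(0, n_tokens - window + 1, stride))
    # Assumption: add one final window aligned to the end so no token is left uncovered.
    if starts[-1] + window < n_tokens:
        starts.append(n_tokens - window)
    return [(s, s + window) for s in starts]


def aggregate(
    n_tokens: int,
    windows: Sequence[Tuple[int, int]],
    window_scores: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-token label scores over every window that contains the token.

    Assumption: averaging is one plausible aggregation rule; the abstract does
    not state which rule the authors used.
    window_scores[i][j] holds the label scores for the j-th token of window i.
    """
    if not windows:
        return []
    n_labels = len(window_scores[0][0])
    sums = [[0.0] * n_labels for _ in range(n_tokens)]
    counts = [0] * n_tokens
    for (start, _end), scores in zip(windows, window_scores):
        for offset, row in enumerate(scores):
            pos = start + offset
            for k, value in enumerate(row):
                sums[pos][k] += value
            counts[pos] += 1
    return [[v / counts[i] for v in sums[i]] for i in range(n_tokens)]
```

In practice the per-window scores would come from the pre-trained language model's token classification head; here they are plain nested lists so the sketch stays self-contained.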