Department of Computer Science, University of Applied Sciences and Arts Dortmund (FH Dortmund), Emil-Figge-Straße 42, Dortmund, 44227, Germany; Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany.
Department of Computer Science, University of Applied Sciences and Arts Dortmund (FH Dortmund), Emil-Figge-Straße 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany.
J Biomed Inform. 2023 Jul;143:104400. doi: 10.1016/j.jbi.2023.104400. Epub 2023 May 19.
In this work, we describe the findings of the 'WisPerMed' team from its participation in Track 1 (Contextualized Medication Event Extraction) of the n2c2 2022 challenge. We tackle two tasks: (i) medication extraction, which involves extracting all mentions of medications from the clinical notes, and (ii) event classification, which involves classifying each medication mention according to whether a change in the medication is discussed. Clinical texts often exceed the maximum token length that transformer-based models can handle. To address this, we employ several approaches, including ClinicalBERT with a sliding window and Longformer-based models. In addition, domain adaptation through masked language modeling and preprocessing steps such as sentence splitting are used to improve model performance. Since both tasks were treated as named entity recognition (NER) problems, a sanity check was performed in the second release to eliminate possible weaknesses in the medication detection itself. This check used the medication spans to remove false positive predictions and to assign each missed token the disposition type with the highest softmax probability. The effectiveness of these approaches is evaluated through multiple submissions to the tasks and through post-challenge results, with a focus on the DeBERTa v3 model and its disentangled attention mechanism. Results show that the DeBERTa v3 model performs well in both the NER task and the event classification task.
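The sliding window and the span-based sanity check described above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the team's implementation: the function names `sliding_windows` and `sanity_check`, the window and overlap sizes, the `"O"` outside label, and the label names used in the usage note are all hypothetical, and the abstract does not specify how windows were merged or how probabilities were stored.

```python
from typing import Dict, List, Tuple


def sliding_windows(
    tokens: List[str], max_len: int = 512, overlap: int = 128
) -> List[Tuple[int, List[str]]]:
    """Split a long token sequence into overlapping windows.

    Returns (start_offset, window_tokens) pairs. max_len and overlap are
    assumed values, not taken from the paper.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("require max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the text
        start += step
    return windows


def sanity_check(
    in_medication_span: List[bool],
    probs: List[Dict[str, float]],
    outside: str = "O",
) -> List[str]:
    """Align event labels with known medication spans (hypothetical sketch).

    in_medication_span[i] is True when token i lies inside a medication span.
    probs[i] maps each label, including the outside label, to its softmax
    probability for token i.
    """
    labels = []
    for is_med, p in zip(in_medication_span, probs):
        predicted = max(p, key=p.get)
        if not is_med:
            # Any event label outside a medication span is a false positive.
            labels.append(outside)
        elif predicted == outside:
            # Missed token: fall back to the most probable disposition type.
            dispositions = {k: v for k, v in p.items() if k != outside}
            labels.append(max(dispositions, key=dispositions.get))
        else:
            labels.append(predicted)
    return labels
```

For example, with `max_len=4` and `overlap=2`, a ten-token note is covered by four windows starting at offsets 0, 2, 4 and 6. In `sanity_check`, a token inside a medication span whose top prediction is `"O"` receives whichever of the remaining labels (for instance the placeholder names `"Disposition"` or `"NoDisposition"`) has the larger probability.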