SSMT-PANBERT: A single-stage multitask model for phenotype extraction and assertion negation detection in unstructured clinical text.

Author information

Zekaoui Nour Eddine, Rhanoui Maryem, Yousfi Siham, Mikram Mounia

Affiliations

Meridian Team, LYRICA Laboratory, School of Information Sciences, Rabat, Morocco.

Laboratory Health Systemic Process (P2S), UR4129, University Claude Bernard Lyon 1, University of Lyon, Lyon, France.

Publication information

Comput Biol Med. 2025 Sep;195:110651. doi: 10.1016/j.compbiomed.2025.110651. Epub 2025 Jun 22.

Abstract

Automatic phenotype extraction and assertion negation detection from large-scale accessible Electronic Health Records (EHRs), including discharge summaries and radiology reports, is a crucial task for various healthcare applications, such as disease diagnosis and treatment planning. The unstructured nature of these documents poses significant challenges for manual processing. However, prior studies exhibit several limitations, such as being restricted to a single label per sentence or omitting the extraction and negation of medical concepts, which make them prone to fail in complex circumstances. In this paper, we capitalize on the advancement of state-of-the-art pre-trained language models (PLMs) to propose a single-stage multitask solution that jointly learns to extract phenotypes and detect their assertion or negation in an end-to-end fashion. Our proposed approach aims to provide practical assistance to healthcare professionals by handling complex and diverse clinical scenarios. We evaluate our method on a validation set derived from an annotated, balanced, and validated dataset based on MIMIC-III clinical notes. The annotations were rigorously reviewed by domain experts to ensure high reliability. The top-performing model in our experiments, SSMT-PANBERT, achieves an average Macro F1 score of 92.33% and a Micro F1 score of 91.66% on the validation set, outperforming traditional pipeline approaches in terms of Macro F1 (92.33% vs. 91.66%), while reducing training time by 37%, inference time by 18.2%, and GPU memory usage by 57%. These results demonstrate the effectiveness of our unified approach in handling complex clinical scenarios while providing significant computational advantages for real-world applications. Furthermore, we conduct a thorough analysis of the model's performance and identify potential areas for future improvement.
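
The abstract reports both a Macro F1 and a Micro F1 score. As a minimal sketch of how these two averages differ, the Python snippet below computes them from per-label true-positive, false-positive, and false-negative counts. The label names and counts are hypothetical, chosen only for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for a dict of per-label counts."""
    # Macro: average the per-label F1 scores, so every label weighs the same.
    per_label = [f1(tp, fp, fn) for tp, fp, fn in counts.values()]
    macro = sum(per_label) / len(per_label)
    # Micro: pool the counts first, so frequent labels dominate the score.
    tp_sum = sum(c[0] for c in counts.values())
    fp_sum = sum(c[1] for c in counts.values())
    fn_sum = sum(c[2] for c in counts.values())
    micro = f1(tp_sum, fp_sum, fn_sum)
    return macro, micro


# Hypothetical counts: one frequent label and one rare label.
example_counts = {
    "frequent_label": (90, 10, 10),
    "rare_label": (8, 2, 2),
}
macro, micro = macro_micro_f1(example_counts)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

In this toy case the rare label scores lower, so it pulls the macro average down more than the micro average. This is why the two numbers can diverge when labels are imbalanced.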
