SSMT-PANBERT: A single-stage multitask model for phenotype extraction and assertion negation detection in unstructured clinical text.

Author information

Zekaoui Nour Eddine, Rhanoui Maryem, Yousfi Siham, Mikram Mounia

Affiliations

Meridian Team, LYRICA Laboratory, School of Information Sciences, Rabat, Morocco.

Laboratory Health Systemic Process (P2S), UR4129, University Claude Bernard Lyon 1, University of Lyon, Lyon, France.

Publication information

Comput Biol Med. 2025 Sep;195:110651. doi: 10.1016/j.compbiomed.2025.110651. Epub 2025 Jun 22.

DOI:10.1016/j.compbiomed.2025.110651

PMID:40550201

Abstract

Automatic phenotype extraction and assertion negation detection from large-scale accessible Electronic Health Records (EHRs), including discharge summaries and radiology reports, is a crucial task for various healthcare applications, such as disease diagnosis and treatment planning. The unstructured nature of these documents poses significant challenges for manual processing. However, prior studies exhibit several limitations, such as being restricted to a single label per sentence or omitting the extraction and negation of medical concepts, which make them prone to fail in complex circumstances. In this paper, we capitalize on the advancement of state-of-the-art pre-trained language models (PLMs) to propose a single-stage multitask solution that jointly learns to extract phenotypes and detect their assertion or negation in an end-to-end fashion. Our proposed approach aims to provide practical assistance to healthcare professionals by handling complex and diverse clinical scenarios. We evaluate our method on a validation set derived from an annotated, balanced, and validated dataset based on MIMIC-III clinical notes. The annotations were rigorously reviewed by domain experts to ensure high reliability. The top-performing model in our experiments, SSMT-PANBERT, achieves an average Macro F1 score of 92.33% and a Micro F1 score of 91.66% on the validation set, outperforming traditional pipeline approaches in terms of Macro F1 (92.33% vs. 91.66%), while reducing training time by 37%, inference time by 18.2%, and GPU memory usage by 57%. These results demonstrate the effectiveness of our unified approach in handling complex clinical scenarios while providing significant computational advantages for real-world applications. Furthermore, we conduct a thorough analysis of the model's performance and identify potential areas for future improvement.
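The abstract reports both Macro F1 and Micro F1. As a minimal sketch of how these two averages differ on multi-label sentence annotations, the following pure-Python example computes both from per-label counts. The label names and the toy gold/predicted sets are hypothetical illustrations, not data or code from the paper.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for multi-label predictions."""
    # counts[label] = [true positives, false positives, false negatives]
    counts = {label: [0, 0, 0] for label in labels}
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    return macro, micro


# Hypothetical toy annotations: one set of labels per sentence.
labels = ["A", "B", "C"]
gold = [{"A"}, {"A", "B"}, {"B"}, {"C"}]
pred = [{"A"}, {"A"}, {"B", "C"}, set()]

macro, micro = macro_micro_f1(gold, pred, labels)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

In this toy case the rare label "C" is never predicted correctly, which pulls the macro average down more than the micro average. This is why the two scores can diverge on imbalanced label sets.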

Similar articles

SSMT-PANBERT: A single-stage multitask model for phenotype extraction and assertion negation detection in unstructured clinical text.

Comput Biol Med. 2025 Sep;195:110651. doi: 10.1016/j.compbiomed.2025.110651. Epub 2025 Jun 22.

Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.

Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.

Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?

Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

Community and hospital-based healthcare professionals perceptions of digital advance care planning for palliative and end-of-life care: a latent class analysis.

Health Soc Care Deliv Res. 2025 Jun 25:1-22. doi: 10.3310/XCGE3294.

From BERT to generative AI - Comparing encoder-only vs. large language models in a cohort of lung cancer patients for named entity recognition in unstructured medical reports.

Comput Biol Med. 2025 Sep;195:110665. doi: 10.1016/j.compbiomed.2025.110665. Epub 2025 Jun 24.

Assessing large language models for acute heart failure classification and information extraction from French clinical notes.

Comput Biol Med. 2025 Sep;195:110609. doi: 10.1016/j.compbiomed.2025.110609. Epub 2025 Jun 19.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2020 Jan 9;1(1):CD011535. doi: 10.1002/14651858.CD011535.pub3.

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.

Comput Biol Med. 2025 Jul 3;196(Pt A):110625. doi: 10.1016/j.compbiomed.2025.110625.
