Enhancing suicidal behavior detection in EHRs: A multi-label NLP framework with transformer models and semantic retrieval-based annotation.

Authors

Zandbiglari Kimia, Kumar Shobhan, Bilal Muhammad, Goodin Amie, Rouhizadeh Masoud

Affiliations

Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA.

Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Division of Biomedical Informatics & Data Science, Johns Hopkins University School of Medicine, Baltimore, MD, USA.

Publication information

J Biomed Inform. 2025 Jan;161:104755. doi: 10.1016/j.jbi.2024.104755. Epub 2024 Dec 2.

BACKGROUND

Suicide is a leading cause of death worldwide, making early identification of suicidal behaviors crucial for clinicians. Current Natural Language Processing (NLP) approaches for identifying suicidal behaviors in Electronic Health Records (EHRs) rely on keyword searches, rule-based methods, and binary classification, which may not fully capture the complexity and spectrum of suicidal behaviors. This study aims to create a multi-class labeled dataset with annotation guidelines and develop a novel NLP approach for fine-grained, multi-label classification of suicidal behaviors, improving the efficiency of the annotation process and accuracy of the NLP methods.

METHODS

We develop a multi-class labeling system based on guidelines from FDA, CDC, and WHO, distinguishing between six categories of suicidal behaviors and allowing for multiple labels per data sample. To efficiently create an annotated dataset, we use an MPNet-based semantic retrieval framework to extract relevant sentences from a large EHR dataset, reducing annotation space while capturing diverse expressions. Experts annotate the extracted sentences using the multi-class system. We then formulate the task as a multi-label classification problem and fine-tune transformer-based models on the curated dataset to accurately classify suicidal behaviors in EHRs.
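The retrieval step described above can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of the MPNet sentence encoder so that it runs without downloading a model; the seed queries, example sentences, function names, and the 0.3 threshold are all hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. A stand-in (assumption) for an MPNet encoder."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(queries, sentences, threshold=0.3):
    """Keep sentences whose best similarity to any seed query meets the threshold."""
    query_vecs = [embed(q) for q in queries]
    hits = []
    for sentence in sentences:
        vec = embed(sentence)
        score = max(cosine(vec, q) for q in query_vecs)
        if score >= threshold:
            hits.append((score, sentence))
    # Highest-scoring candidates first, ready to hand to annotators.
    return sorted(hits, reverse=True)


# Hypothetical seed queries and EHR-like sentences, for illustration only.
seed_queries = ["patient attempted suicide", "family history of suicide"]
notes = [
    "Patient attempted suicide by overdose last year.",
    "Mother has a history of suicide attempts.",
    "Blood pressure is well controlled on lisinopril.",
]
candidates = retrieve(seed_queries, notes)
for score, sentence in candidates:
    print(f"{score:.2f}  {sentence}")
```

Only the sentences that resemble a seed query reach the annotators, which is how retrieval shrinks the annotation space. A dense sentence encoder would additionally match paraphrases that share no words with the queries, which this word-overlap stand-in cannot do.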

RESULTS

Lexical analysis revealed key themes in assessing suicide risk: an individual's history, mental health, substance use, and family background. Fine-tuned transformer-based models effectively identified suicidal behaviors in EHRs, with Bio_ClinicalBERT, BioBERT, and XLNet each achieving an F1 score of 0.81 and outperforming BERT and RoBERTa. The proposed approach, which combines task-specific NLP models with a multi-label classification system, captures the complexity of suicidal behaviors, particularly in "Suicide Attempt" and "Family History" instances, and does so more effectively than traditional binary classification. However, direct comparisons with existing studies are difficult because metrics and label definitions vary.
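To make the multi-label setup and the F1 metric concrete, here is a small sketch under stated assumptions. The abstract does not say which F1 averaging scheme or decision threshold was used, so micro-averaging and a 0.5 threshold are assumptions. Only two label names appear in the abstract; the third is a placeholder, and the probabilities are invented for illustration rather than taken from the study.

```python
# Two label names come from the abstract; the third is a placeholder (assumption).
LABELS = ["Suicide Attempt", "Family History", "Other (placeholder)"]


def to_multi_hot(active, labels=LABELS):
    """Encode a set of active labels as a 0/1 vector; several may be 1 at once."""
    return [1 if name in active else 0 for name in labels]


def decode(probs, threshold=0.5):
    """Threshold each label's probability independently (assumed threshold)."""
    return [1 if p >= threshold else 0 for p in probs]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented gold labels and model probabilities for three sentences.
y_true = [
    to_multi_hot({"Suicide Attempt", "Family History"}),
    to_multi_hot({"Family History"}),
    to_multi_hot(set()),
]
probs = [[0.9, 0.4, 0.1], [0.2, 0.8, 0.3], [0.6, 0.1, 0.2]]
y_pred = [decode(row) for row in probs]
score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.3f}")
```

Because each label is thresholded on its own, one sentence can carry several labels or none, which is what separates this formulation from binary or single-label multi-class classification.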

CONCLUSION

This study presents a robust NLP framework for detecting suicidal behaviors in EHRs, leveraging task-specific fine-tuning of transformer-based models and a semi-automated pipeline. Despite limitations, the approach demonstrates the potential of advanced NLP techniques in enhancing the identification of suicidal behaviors. Future work should focus on model expansion and integration to further improve patient care and clinical decision-making.
