Wiest Isabella Catharina, Verhees Falk Gerrik, Ferber Dyke, Zhu Jiefu, Bauer Michael, Lewitzka Ute, Pfennig Andrea, Mikolas Pavol, Kather Jakob Nikolas
Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; and Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
Department of Psychiatry and Psychotherapy, Carl Gustav Carus University Hospital, Technical University Dresden, Dresden, Germany.
Br J Psychiatry. 2024 Dec;225(6):532-537. doi: 10.1192/bjp.2024.134.
Background: Attempts to use artificial intelligence (AI) in psychiatric disorders have shown moderate success, highlighting the potential of incorporating information from clinical assessments to improve the models. This study focuses on using large language models (LLMs) to detect suicide risk from medical text in psychiatric care.
Aims: To extract information on suicidality status from admission notes in electronic health records (EHRs) using privacy-preserving, locally hosted LLMs, specifically evaluating the efficacy of Llama-2 models.
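As an illustration only (this is not the authors' code), a locally hosted Llama-2 chat model could be queried for a binary suicidality label roughly as in the sketch below; the model path, prompt wording and yes/no answer parsing are assumptions made for the example, and no patient data leaves the local machine.

```python
# Minimal sketch, assuming Hugging Face transformers and a local
# Llama-2 chat checkpoint; path and prompt are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/models/llama-2-13b-chat"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)

def classify_note(note: str) -> str:
    """Ask the model for a one-word yes/no suicidality judgement."""
    # Llama-2 chat instruction format; the exact prompt wording is an
    # assumption, not the prompt design evaluated in the study.
    prompt = (
        "[INST] You are reviewing a psychiatric admission note. "
        "Does the note indicate current suicidality (suicidal ideation, "
        "plans or attempts)? Answer with exactly one word: yes or no.\n\n"
        f"Note:\n{note} [/INST]"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens, then map to a label.
    answer = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return "positive" if "yes" in answer.lower() else "negative"
```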
Method: We compared the performance of several variants of the open-source LLM Llama-2 in extracting suicidality status from 100 psychiatric reports against a ground truth defined by human experts, assessing accuracy, sensitivity, specificity and F1 score across different prompting strategies.
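The reported metrics follow directly from the confusion matrix of model labels against the expert ground truth. A minimal sketch (the "positive"/"negative" label names are assumptions carried over from the example above, and both classes are assumed to occur in the 100 reports):

```python
# Illustrative metric computation from predicted vs. expert labels.
def evaluate(preds: list[str], truth: list[str]) -> dict[str, float]:
    tp = sum(p == t == "positive" for p, t in zip(preds, truth))
    tn = sum(p == t == "negative" for p, t in zip(preds, truth))
    fp = sum(p == "positive" and t == "negative" for p, t in zip(preds, truth))
    fn = sum(p == "negative" and t == "positive" for p, t in zip(preds, truth))
    sensitivity = tp / (tp + fn)  # recall on suicidality-positive notes
    specificity = tn / (tn + fp)  # recall on suicidality-negative notes
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }
```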
Results: A Llama-2 model fine-tuned for German showed the highest accuracy (87.5%), sensitivity (83.0%) and specificity (91.8%) in identifying suicidality, with significant improvements in sensitivity and specificity across the different prompt designs.
Conclusions: The study demonstrates the capability of LLMs, particularly Llama-2, to accurately extract information on suicidality from psychiatric records while preserving data privacy. This suggests their application in surveillance systems for psychiatric emergencies and in the clinical management of suicidality through systematic quality control and research.