Large Language Models for Psychiatric Phenotype Extraction from Electronic Health Records.

Authors

Frydman-Gani Clara, Arias Alejandro, Vallejo Maria Perez, Londoño Martínez John Daniel, Valencia-Echeverry Johanna, Castaño Mauricio, Bui Alex A T, Freimer Nelson B, Lopez-Jaramillo Carlos, Olde Loohuis Loes M

Affiliations

Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, USA.

Department of Mental Health and Human Behavior, University of Caldas, Manizales, Colombia.

Publication information

medRxiv. 2025 Aug 12:2025.08.07.25333172. doi: 10.1101/2025.08.07.25333172.

DOI:10.1101/2025.08.07.25333172
PMID:40832382
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12363723/
Abstract

The accurate detection of clinical phenotypes from electronic health records (EHRs) is pivotal for advancing large-scale genetic and longitudinal studies in psychiatry. Free-text clinical notes are an essential source of symptom-level information, particularly in psychiatry. However, the automated extraction of symptoms from clinical text remains challenging. Here, we tested 11 open-source generative large language models (LLMs) for their ability to detect 109 psychiatric phenotypes from clinical text, using annotated EHR notes from a psychiatric clinic in Colombia. The LLMs were evaluated both "out-of-the-box" and after fine-tuning, and compared against a traditional natural language processing (tNLP) method developed from the same data. We show that while base LLM performance was poor to moderate (0.2-0.6 macro-F1 for zero-shot; 0.2-0.74 macro-F1 for few shot), it improved significantly after fine-tuning (0.75-0.86 macro-F1), with several fine-tuned LLMs outperforming the tNLP method. In total, 100 phenotypes could be reliably detected (F1>0.8) using either a fine-tuned LLM or tNLP. To generate a fine-tuned LLM that can be shared with the scientific and medical community, we created a fully synthetic dataset free of patient information but based on original annotations. We fine-tuned a top-performing LLM on this data, creating "Mistral-small-psych", an LLM that can detect psychiatric phenotypes from Spanish text with performance comparable to that of LLMs trained on real EHR data (macro-F1=0.79). Finally, the fine-tuned LLMs underwent an external validation using data from a large psychiatric hospital in Colombia, the Hospital Mental de Antioquia, highlighting that most LLMs generalized well (0.02-0.16 point loss in macro-F1). Our study underscores the value of domain-specific adaptation of LLMs and introduces a new model for accurate psychiatric phenotyping in Spanish text, paving the way for global precision psychiatry.
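The abstract reports results as macro-F1 over phenotypes and counts a phenotype as reliably detected when its F1 exceeds 0.8. The following is a minimal sketch of how such a score could be computed for multi-label note annotations. It is not the authors' evaluation code: the phenotype names, the toy gold and predicted labels, and the convention of scoring a phenotype with no true positives as 0 are all assumptions made for illustration.

```python
from typing import List, Set


def f1_for_label(gold: List[Set[str]], pred: List[Set[str]], label: str) -> float:
    """F1 for a single phenotype, counted across all notes."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    if tp == 0:
        # Assumed convention: no true positives means F1 = 0.
        return 0.0
    # Equivalent to the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn)


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-phenotype F1, so rare phenotypes count equally."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


def reliable_labels(gold: List[Set[str]], pred: List[Set[str]],
                    labels: List[str], threshold: float = 0.8) -> List[str]:
    """Phenotypes whose F1 is strictly above the threshold."""
    return [label for label in labels if f1_for_label(gold, pred, label) > threshold]


# Hypothetical phenotypes and annotations for four notes (illustration only).
labels = ["insomnia", "anhedonia", "hallucinations"]
gold = [{"insomnia", "anhedonia"}, {"hallucinations"}, {"insomnia"}, set()]
pred = [{"insomnia"}, {"hallucinations", "insomnia"}, {"insomnia"}, {"anhedonia"}]

score = macro_f1(gold, pred, labels)
reliable = reliable_labels(gold, pred, labels)
```

In this toy example the per-phenotype F1 values are 0.8, 0.0, and 1.0, giving a macro-F1 of 0.6. Because each phenotype contributes equally to the average, a single poorly detected phenotype lowers the macro score regardless of how often it appears in the notes.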


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa19/12363723/b0842a5bf01c/nihpp-2025.08.07.25333172v1-f0001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa19/12363723/c42c01c8a5db/nihpp-2025.08.07.25333172v1-f0002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa19/12363723/92292ec894a9/nihpp-2025.08.07.25333172v1-f0003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa19/12363723/d1d2aa8981ca/nihpp-2025.08.07.25333172v1-f0004.jpg
