Leveraging Natural Language Processing for Psychiatric Phenotyping from Spanish Electronic Health Records: Enabling the Investigation of Transdiagnostic Symptom Profiles at Scale.

Authors

De La Hoz Juan F, Frydman-Gani Clara, Arias Alejandro, Perez Vallejo Maria, Londoño Martínez John Daniel, Mena Laura, Seroussi Ariel, Service Susan K, Diaz-Zuluaga Ana M, Ramirez-Diaz Ana M, Valencia-Echeverry Johanna, Castaño Mauricio, Reus Victor I, Bui Alex A T, Freimer Nelson B, Lopez-Jaramillo Carlos, Olde Loohuis Loes M

Affiliations

Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.

Department of Mental Health and Human Behavior, University of Caldas, Manizales, Colombia.

Publication information

Complex Psychiatry. 2025 Jun 7;11(1):99-112. doi: 10.1159/000546480. eCollection 2025 Jan-Dec.

DOI:10.1159/000546480
PMID:40677833
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12266705/
Abstract

INTRODUCTION

Clinical notes in electronic health records offer valuable insight into the symptom profiles and trajectories of patients with severe mental illness (SMI). However, systematically extracting symptoms at scale remains a challenge, especially in languages other than English. We developed a light, accurate, and interpretable natural language processing (NLP) algorithm to extract psychiatric phenotypes from Spanish clinical notes.

METHODS

We selected a set of 136 core psychiatric phenotypes and annotated 4,000 clinical note sections (e.g., Chief Complaint, Plan; called "documents") and 240 complete visit notes (called "entries") from two psychiatric hospitals in Colombia: Hospital Mental de Antioquia (HOMO) and Clínica San Juan de Dios Manizales (CSJDM). For phenotypes meeting frequency and inter-annotator reliability thresholds, we developed three NLP algorithms (HOMO, CSJDM, and COMBINED) for phenotype extraction and context labeling (e.g., negation, family history, uncertainty). We evaluated performance at the document and entry levels, as well as across hospitals.
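
To make "phenotype extraction and context labeling" concrete, the following is a minimal sketch of one way such a step could work, assuming a simple lexicon-and-cue approach. The abstract does not describe the authors' implementation, so the lexicon, the context cues, the window size, and the function name `extract` are all hypothetical illustrations rather than the published algorithm.

```python
import re

# Hypothetical mini-lexicon: phenotype name -> Spanish trigger patterns.
LEXICON = {
    "insomnia": [r"insomnio"],
    "hallucinations": [r"alucinaci(?:ón|ones)"],
    "suicidal_ideation": [r"ideaci(?:ón|ones) suicidas?", r"ideas? suicidas?"],
}

# Hypothetical context cues, searched in a short window before each match.
CONTEXT_CUES = {
    "negated": [r"\bniega\b", r"\bsin\b", r"\bno\b"],
    "family_history": [r"\bmadre\b", r"\bpadre\b", r"\bhermano\b"],
    "uncertain": [r"\bposible\b", r"\bprobable\b"],
}

WINDOW = 40  # number of characters inspected before a match (assumed value)


def extract(text):
    """Return (phenotype, context_labels) pairs found in a clinical note."""
    text = text.lower()
    results = []
    for phenotype, patterns in LEXICON.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                # Look left of the match, but stay inside the same sentence.
                left = text[max(0, match.start() - WINDOW):match.start()]
                left = left.split(".")[-1]
                labels = [
                    label
                    for label, cues in CONTEXT_CUES.items()
                    if any(re.search(cue, left) for cue in cues)
                ]
                results.append((phenotype, labels or ["affirmed"]))
    return results


note = "Paciente niega ideación suicida. Refiere insomnio."
print(extract(note))
```

A rule-based design like this is easy to inspect, which is one plausible reading of "light" and "interpretable", but a real system would need a far larger lexicon and more careful scope handling for negation.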

RESULTS

Document-level performance at both hospitals was high (average F1 scores of 0.84 and 0.85). Moreover, on phenotypes meeting our document-level performance threshold of F1 ≥0.7, entry-level performance was high as well (average F1 of 0.75 and 0.78), as was the cross-hospital transportability of the algorithms (F1 of 0.75 HOMO-to-CSJDM and 0.77 CSJDM-to-HOMO). The COMBINED algorithm improved overall recall, without significantly decreasing precision (F1 of 0.78 and 0.77 on HOMO and CSJDM, respectively). The application of our algorithm for 50 high-performing phenotypes to the notes of 9,737 SMI patients highlighted the transdiagnostic nature of many core SMI phenotypes; 44/50 phenotypes were recorded in over 10% of patients across diagnoses. Multiple correspondence analysis further revealed variation in symptom space across diagnoses; while major depressive disorder and schizophrenia form distinct clusters, patients with bipolar disorder span the entire phenotypic spectrum.
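
For readers less familiar with the metrics reported above, here is a minimal sketch of how precision, recall, and F1 can be computed from annotated and predicted phenotype mentions. The data are invented for illustration, and representing each mention as a `(document_id, phenotype)` pair is an assumption, not the paper's evaluation code.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 from two sets of labeled items."""
    true_pos = len(gold & predicted)
    false_pos = len(predicted - gold)
    false_neg = len(gold - predicted)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: (document_id, phenotype) pairs.
gold = {(1, "insomnia"), (1, "anxiety"), (2, "insomnia"), (3, "mania")}
predicted = {(1, "insomnia"), (2, "insomnia"), (3, "mania"), (3, "anxiety")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a gain in recall raises F1 only if precision does not fall by a comparable amount, which is the trade-off the COMBINED result describes.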

CONCLUSION

Our tool enables the systematic investigation of psychiatric symptoms from psychiatric notes, facilitating large-scale investigations in Spanish-speaking populations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d79/12266705/b71621e1146d/cxp-2025-0011-0001-546480_F01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d79/12266705/8528f0ee8d2a/cxp-2025-0011-0001-546480_F02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d79/12266705/e1d1f8e77d84/cxp-2025-0011-0001-546480_F03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d79/12266705/c554c1e9709c/cxp-2025-0011-0001-546480_F04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d79/12266705/52234216f6c1/cxp-2025-0011-0001-546480_F05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d79/12266705/506c3c84bb83/cxp-2025-0011-0001-546480_F06.jpg
