De La Hoz Juan F, Frydman-Gani Clara, Arias Alejandro, Perez Vallejo Maria, Londoño Martínez John Daniel, Mena Laura, Seroussi Ariel, Service Susan K, Diaz-Zuluaga Ana M, Ramirez-Diaz Ana M, Valencia-Echeverry Johanna, Castaño Mauricio, Reus Victor I, Bui Alex A T, Freimer Nelson B, Lopez-Jaramillo Carlos, Olde Loohuis Loes M
Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.
Department of Mental Health and Human Behavior, University of Caldas, Manizales, Colombia.
Complex Psychiatry. 2025 Jun 7;11(1):99-112. doi: 10.1159/000546480. eCollection 2025 Jan-Dec.
Clinical notes in electronic health records offer valuable insight into the symptom profiles and trajectories of patients with severe mental illness (SMI). However, systematically extracting symptoms at scale remains a challenge, especially in languages other than English. We developed a lightweight, accurate, and interpretable natural language processing (NLP) algorithm to extract psychiatric phenotypes from Spanish clinical notes.
We selected a set of 136 core psychiatric phenotypes and annotated 4,000 clinical note sections (e.g., Chief Complaint, Plan; called "documents") and 240 complete visit notes (called "entries") from two psychiatric hospitals in Colombia: Hospital Mental de Antioquia (HOMO) and Clínica San Juan de Dios Manizales (CSJDM). For phenotypes meeting frequency and inter-annotator reliability thresholds, we developed three NLP algorithms (HOMO, CSJDM, and COMBINED) for phenotype extraction and context labeling (e.g., negation, family history, uncertainty). We evaluated performance at the document and entry levels, as well as across hospitals.
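To make the idea of phenotype extraction with context labeling concrete, here is a minimal rule-based sketch in Python. It is not the authors' algorithm: the lexicon `PHENOTYPE_TERMS`, the cue lists in `CONTEXT_CUES`, the `WINDOW` size, and the `extract` function are all hypothetical, chosen only to illustrate how a matched term might be labeled as negated, family history, or uncertain from the words that precede it.

```python
import re

# Hypothetical mini-lexicon: phenotype name -> Spanish trigger terms (illustrative only).
PHENOTYPE_TERMS = {
    "insomnia": ["insomnio"],
    "hallucinations": ["alucinaciones", "alucinación"],
    "suicidal_ideation": ["ideación suicida", "ideas suicidas"],
}

# Hypothetical context cues, searched in a short window before the matched term.
CONTEXT_CUES = {
    "negated": ["no", "niega", "sin"],
    "family_history": ["madre", "padre", "hermano", "hermana"],
    "uncertain": ["posible", "probable", "al parecer"],
}

WINDOW = 5  # number of preceding tokens inspected for context cues (assumption)


def extract(text):
    """Return (phenotype, context_labels) pairs found in a Spanish note."""
    results = []
    # Split into rough sentences so cues do not leak across sentence boundaries.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        for phenotype, terms in PHENOTYPE_TERMS.items():
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if not match:
                    continue
                # Only the tokens just before the match are checked for cues.
                window_text = " ".join(sentence[: match.start()].split()[-WINDOW:])
                labels = sorted(
                    label
                    for label, cues in CONTEXT_CUES.items()
                    if any(
                        re.search(r"\b" + re.escape(cue) + r"\b", window_text)
                        for cue in cues
                    )
                )
                results.append((phenotype, labels or ["affirmed"]))
                break  # one hit per phenotype per sentence is enough here
    return results


note = "Paciente niega ideas suicidas. Refiere insomnio. Madre con alucinaciones."
for phenotype, labels in extract(note):
    print(phenotype, labels)
```

A real system would need a far larger lexicon, handling of cues that follow the term, and validation against annotated notes; the sketch only shows the general shape of the task.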
Document-level performance at both hospitals was high (average F1 scores of 0.84 and 0.85). Moreover, for phenotypes meeting our document-level performance threshold of F1 ≥0.7, entry-level performance was also high (average F1 of 0.75 and 0.78), as was the cross-hospital transportability of the algorithms (F1 of 0.75 HOMO-to-CSJDM and 0.77 CSJDM-to-HOMO). The COMBINED algorithm improved overall recall without significantly decreasing precision (F1 of 0.78 and 0.77 on HOMO and CSJDM, respectively). Applying our algorithm for 50 high-performing phenotypes to the notes of 9,737 SMI patients highlighted the transdiagnostic nature of many core SMI phenotypes: 44/50 phenotypes were recorded in over 10% of patients across diagnoses. Multiple correspondence analysis further revealed variation in symptom space across diagnoses: while major depressive disorder and schizophrenia form distinct clusters, patients with bipolar disorder span the entire phenotypic spectrum.
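For readers less familiar with the metrics, the following Python sketch shows how per-phenotype precision, recall, and F1 can be computed from true-positive, false-positive, and false-negative counts, and how an F1 ≥0.7 threshold could be used to select high-performing phenotypes. The counts and phenotype names are hypothetical toy values, not results from the study, and the helper `prf` is an illustrative name.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical per-phenotype counts: (true positives, false positives, false negatives).
counts = {
    "insomnia": (45, 5, 10),
    "mutism": (3, 4, 6),
}

F1_THRESHOLD = 0.7  # document-level threshold stated in the abstract

scores = {name: prf(*c) for name, c in counts.items()}
kept = [name for name, (_, _, f1) in scores.items() if f1 >= F1_THRESHOLD]

# Average F1 over the retained phenotypes only.
mean_f1 = sum(scores[name][2] for name in kept) / len(kept) if kept else 0.0

for name, (p, r, f1) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print("kept:", kept, "mean F1:", round(mean_f1, 2))
```

Averaging F1 per phenotype, as done here, weights rare and common phenotypes equally; whether the reported averages were computed this way is not stated in the abstract.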
Our tool enables systematic investigation of psychiatric symptoms recorded in clinical notes, facilitating large-scale studies in Spanish-speaking populations.