Leveraging computational linguistics and machine learning for detection of ultra-high risk of mental health disorders in youths.

Author information

Kho Jordon Junyang, Song Shangzheng, Tan Samuel Ming Xuan, Fitriyah Nur Hikmah, Lokadjaja Matheus Calvin, Yee Jie Yin, Yang Zixu, Chen Eric Yu Hai, Lee Jimmy, Goh Wilson Wen Bin

Affiliations

Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.

Publication information

Schizophrenia (Heidelb). 2025 Jul 15;11(1):98. doi: 10.1038/s41537-025-00649-3.

DOI:10.1038/s41537-025-00649-3

PMID:40664678

Abstract

Mental illnesses often manifest through behavioral changes, with speech serving as a key medium for expressing thoughts and emotions. The use of computational linguistics on speech data in mental illnesses is a promising approach to uncover objective biomarkers for the early detection of mental illnesses. This study analyzed speech transcripts from 80 youths at ultra-high risk of psychosis (UHR) and 329 healthy controls, examining text features such as sentiment variability, cohesion, lexical sophistication, morphology, syntactic sophistication, and lexical diversity. Factor analysis revealed five key linguistic themes: Sentiment Intensity and Variability, Linguistic Register Alignment, Phonographic Uniqueness and Recognizability, Morphological Complexity and Imageability, and Lexical Richness and Typicalness. Regression analysis indicated UHR speech is characterized by diminished sentiment variability (β = -0.07), deviation from linguistic registers (β = -0.16), fewer phonographic neighbors (β = -0.11), lower morphological complexity (β = -0.36), and more predictable lexical structures (β = 0.05). Optimized machine learning (ML) models trained on Boruta-selected features achieved a mean AUC of 0.70. Our findings highlight the potential of sentiment and linguistic analyses in speech for training ML models to aid in early detection and monitoring of mental health conditions.
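To make three of the quantities named in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the study's pipeline: the abstract does not say which tools computed the features, so the function names `type_token_ratio`, `sentiment_variability`, and `auc` are hypothetical, the type-token ratio is only one simple proxy for lexical diversity, and the per-sentence sentiment scores are assumed to come from some external scorer.

```python
from statistics import pstdev
from typing import Sequence


def type_token_ratio(text: str) -> float:
    """Simple lexical-diversity proxy: unique tokens / total tokens.

    Assumption: whitespace tokenization on lowercased text; the study's
    actual lexical-diversity measures are not given in the abstract.
    """
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sentiment_variability(scores: Sequence[float]) -> float:
    """Population standard deviation of per-sentence sentiment scores.

    Assumption: `scores` were produced by some external sentiment scorer.
    """
    return pstdev(scores) if len(scores) > 1 else 0.0


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    print(type_token_ratio("the cat saw the dog"))
    print(sentiment_variability([0.1, 0.4, -0.2]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Read this way, an AUC of 0.70 means that a model ranks a randomly chosen UHR participant above a randomly chosen control about 70% of the time, while 0.50 would correspond to chance-level ranking.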
