Nyongesa Cynthia A, Hogarth Mike, Pa Judy
Alzheimer's Disease Cooperative Study (ADCS), Department of Neurosciences, University of California, San Diego, CA, USA.
Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, CA, USA.
J Alzheimers Dis. 2025 Jul;106(1):120-138. doi: 10.1177/13872877251339756. Epub 2025 May 7.
Background
Language deficits often occur early in the neurodegenerative process, yet traditional methods frequently fail to detect subtle changes. Natural language processing (NLP) offers a novel approach to identifying linguistic patterns associated with cognitive impairment.
Objective
We aimed to analyze linguistic features that differentiate cognitively unimpaired (CU), mild cognitive impairment (MCI), and Alzheimer's disease (AD) groups.
Methods
Data were extracted from picture description tasks performed by 336 participants in the DementiaBank datasets. Fifty-three linguistic features, aggregated into four categories (lexical, structural, syntactic, and discourse), were identified using NLP toolkits. Cognitive function was evaluated with the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA), using normal diagnostic cutoffs.
Results
With age and education as covariates, ANOVA and post-hoc Tukey's HSD tests revealed that linguistic features such as pronoun usage, syntactic complexity, and lexical sophistication differed significantly between the CU, MCI, and AD groups (p < 0.05). Notably, past tense and personal references were higher in AD than in both CU and MCI (p < 0.001), while pronoun usage differed between AD and CU (p < 0.0001). Correlations indicated that higher pronoun rates and lower syntactic complexity were associated with lower MMSE scores. Although some features, such as conjunctions and determiners, approached significance, they lacked consistent differentiation.
Conclusions
With the growing adoption of artificial intelligence (AI)-based scribing, these results emphasize the potential of targeted linguistic analysis as a digital biomarker to enable continuous screening for cognitive impairment.
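As a rough illustration of the kind of lexical features the abstract mentions (for example, pronoun rate), the following minimal Python sketch computes two simple measures from a transcript. It is not the study's pipeline: the abstract says only that "NLP toolkits" were used, so the regex tokenizer, the small `PRONOUNS` word list, and the function names here are all assumptions made for illustration. A real analysis would likely rely on a part-of-speech tagger rather than a fixed word list.

```python
import re

# Hypothetical, deliberately small list of English personal pronouns.
# Assumption for illustration only; words such as "her" or "his" can also
# act as determiners, which a part-of-speech tagger would disambiguate.
PRONOUNS = {
    "i", "me", "you", "he", "him", "she", "her",
    "it", "we", "us", "they", "them",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_rate(tokens):
    """Fraction of tokens that appear in the PRONOUNS list."""
    if not tokens:
        return 0.0
    return sum(token in PRONOUNS for token in tokens) / len(tokens)


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens (a crude lexical diversity measure)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def linguistic_features(text):
    """Return a small dictionary of per-transcript features."""
    tokens = tokenize(text)
    return {
        "n_tokens": len(tokens),
        "pronoun_rate": pronoun_rate(tokens),
        "type_token_ratio": type_token_ratio(tokens),
    }


if __name__ == "__main__":
    # Invented sample sentence, not taken from DementiaBank.
    sample = "She is washing the dishes and he is taking it"
    print(linguistic_features(sample))
```

Per-participant values like these could then be compared across the CU, MCI, and AD groups with the covariate-adjusted tests described in the Results. That step is omitted here because it depends on the participant data and on the statistical package chosen.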