Fraser Kathleen C, Meltzer Jed A, Rudzicz Frank
Department of Computer Science, University of Toronto, Toronto, Canada.
Rotman Research Institute, Toronto, Canada.
J Alzheimers Dis. 2016;49(2):407-22. doi: 10.3233/JAD-150520.
Although memory impairment is the main symptom of Alzheimer's disease (AD), language impairment can be an important marker. Relatively few studies of language in AD quantify the impairments in connected speech using computational techniques.
We aim to demonstrate state-of-the-art accuracy in automatically identifying Alzheimer's disease from short narrative samples elicited with a picture description task, and to uncover the salient linguistic factors with a statistical factor analysis.
Data are derived from the DementiaBank corpus, from which 167 patients diagnosed with "possible" or "probable" AD provide 240 narrative samples, and 97 controls provide an additional 233. We compute a number of linguistic variables from the transcripts and acoustic variables from the associated audio files, and use these variables to train a machine learning classifier to distinguish between participants with AD and healthy controls. To examine the degree of heterogeneity of linguistic impairments in AD, we perform an exploratory factor analysis on these measures of speech and language with an oblique promax rotation, and interpret the resulting factors.
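Because several participants contribute more than one narrative sample, an evaluation of such a classifier would plausibly keep all samples from one participant in the same cross-validation fold. The sketch below illustrates that idea only. It is not the authors' pipeline: the abstract does not name the classifier or the evaluation protocol, so the nearest-centroid classifier, the synthetic features, and every function name here (`make_toy_data`, `group_folds`, `cross_validate`) are assumptions made for illustration.

```python
import random


def make_toy_data(n_participants=40, n_features=3, seed=0):
    """Synthetic stand-in for per-sample feature vectors (NOT DementiaBank data)."""
    rng = random.Random(seed)
    X, y, groups = [], [], []
    for pid in range(n_participants):
        label = pid % 2  # toy labels: 1 = AD, 0 = control
        for _ in range(rng.randint(1, 3)):  # some participants give several samples
            X.append([rng.gauss(2.0 * label, 1.0) for _ in range(n_features)])
            y.append(label)
            groups.append(pid)
    return X, y, groups


def group_folds(groups, k, seed=0):
    """Map each sample to a fold so that one participant never spans two folds."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    return [fold_of[g] for g in groups]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cross_validate(X, y, groups, k=5, seed=0):
    """Participant-grouped k-fold accuracy of a nearest-centroid classifier."""
    folds = group_folds(groups, k, seed)
    correct = 0
    for f in range(k):
        train = [i for i, fi in enumerate(folds) if fi != f]
        test = [i for i, fi in enumerate(folds) if fi == f]
        # Fit: one centroid per class, using training samples only.
        cents = {
            label: centroid([X[i] for i in train if y[i] == label])
            for label in {y[i] for i in train}
        }
        # Predict: assign each held-out sample to the closest class centroid.
        for i in test:
            pred = min(cents, key=lambda c: sq_dist(X[i], cents[c]))
            correct += int(pred == y[i])
    return correct / len(y)


X, y, groups = make_toy_data()
print(f"grouped CV accuracy on toy data: {cross_validate(X, y, groups):.2f}")
```

Splitting by participant rather than by sample avoids scoring the classifier on a speaker it has already seen, which would otherwise inflate the reported accuracy.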
We obtain state-of-the-art classification accuracies of over 81% in distinguishing individuals with AD from those without based on short samples of their language on a picture description task. Four clear factors emerge: semantic impairment, acoustic abnormality, syntactic impairment, and information impairment.
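For readers unfamiliar with oblique rotations, the standard common-factor model can be written as follows. The abstract does not state the model explicitly, so the notation here ($x$, $\Lambda$, $f$, $\Phi$, $\Psi$) is an assumption introduced for illustration rather than the authors' own.

```latex
\[
  x = \mu + \Lambda f + \varepsilon, \qquad
  \operatorname{Cov}(f) = \Phi, \qquad
  \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)},
\]
\[
  \Sigma = \operatorname{Cov}(x) = \Lambda \Phi \Lambda^{\top} + \Psi .
\]
```

Here $x$ is the vector of speech and language measures for one sample, $f$ holds the factor scores (four in this study), and $\Lambda$ is the loading matrix used to interpret each factor. An orthogonal rotation fixes $\Phi = I$, whereas an oblique rotation such as promax allows nonzero off-diagonal entries in $\Phi$, so the resulting factors are permitted to correlate with one another.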
Modern machine learning and linguistic analysis will be increasingly useful in assessment and clustering of suspected AD.