Detecting Alzheimer's Disease from Continuous Speech Using Language Models.

Author information

National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, China.

Department of Neurology, Shanghai Tongji Hospital, Tongji University School of Medicine, Shanghai, China.

Publication information

J Alzheimers Dis. 2019;70(4):1163-1174. doi: 10.3233/JAD-190452.

Abstract

BACKGROUND

Recently, many studies have been carried out to detect Alzheimer's disease (AD) from continuous speech by linguistic analysis and modeling. However, few of them utilize language models (LMs) to extract linguistic features and to investigate the lexical-level differences between AD and healthy speech.

OBJECTIVE

Our goals are to obtain state-of-the-art performance in automatic AD detection, to emphasize N-gram LMs as powerful tools for distinguishing AD patients' narratives from those of healthy controls, and to identify differences in lexical usage between AD patients and healthy people.

METHOD

We use a subset of the DementiaBank corpus, comprising 242 control samples from 99 control participants and 256 AD samples from 169 "PossibleAD" or "ProbableAD" participants. Baseline models are built using area under the curve (AUC)-based feature selection, with five machine learning algorithms compared. Perplexity features are extracted using LMs to build enhanced detection models. Finally, differences in lexical usage between AD patients and healthy people are investigated with a proportion test based on unigram probabilities.
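As an illustration of how perplexity features of this kind can be computed, the following is a minimal sketch, not the authors' implementation. It assumes a bigram LM with add-one smoothing and two invented toy corpora; the abstract does not state the N-gram order, the smoothing method, or how the LMs were trained, and the names `train_bigram`, `perplexity`, `control_lm`, and `ad_lm` are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        contexts.update(padded[:-1])                 # every token that precedes another
        bigrams.update(zip(padded[:-1], padded[1:]))
    return contexts, bigrams, len(vocab) + 1         # +1 reserves mass for unseen words

def perplexity(model, tokens):
    """Perplexity of one transcript under an add-one-smoothed bigram LM."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(padded) - 1))

# Toy transcripts invented for illustration only (not DementiaBank data).
control_lm = train_bigram([["the", "boy", "is", "stealing", "cookies"],
                           ["the", "mother", "is", "washing", "dishes"]])
ad_lm = train_bigram([["he", "is", "doing", "that"],
                      ["she", "is", "doing", "something"]])

sample = ["the", "boy", "is", "washing", "dishes"]
features = [perplexity(control_lm, sample), perplexity(ad_lm, sample)]
```

In this sketch, a transcript that resembles the control corpus receives a lower perplexity under `control_lm` than under `ad_lm`, and the two values (or their difference) could be appended to a baseline feature vector.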

RESULTS

Our baseline model obtains a detection accuracy of 80.7%. This accuracy increases to 85.4% after integrating the perplexity features derived from LMs. Further investigations show that AD patients tend to use more general, less informative, and less accurate words to describe characters and actions than healthy controls.
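The lexical comparison can be sketched as a pooled two-proportion z-test on how often a word occurs in each group's transcripts. This is an assumption about the general form of such a test; the abstract says only that a proportion test based on unigram probabilities was used, and the function name and the counts in the usage note are hypothetical.

```python
import math

def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Pooled two-sided z-test for a difference between two proportions."""
    p_a = count_a / total_a          # unigram probability of the word in group A
    p_b = count_b / total_b          # unigram probability of the word in group B
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p_value
```

For example, `two_proportion_z_test(30, 1000, 10, 1000)` compares a word seen 30 times in 1,000 tokens of one group against 10 times in 1,000 tokens of the other; a positive `z` with a small `p_value` would indicate that the first group uses the word significantly more often.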

CONCLUSION

The perplexity features extracted by LMs can benefit automatic AD detection from continuous speech. There are lexical-level differences between AD and healthy speech that can be captured by statistical N-gram LMs.
