Chlasta Karol, Wołk Krzysztof
Department of Computer Science, Polish-Japanese Academy of Information Technology, Warsaw, Poland.
Institute of Psychology, SWPS University of Social Sciences and Humanities, Warsaw, Poland.
Front Psychol. 2021 Feb 12;11:623237. doi: 10.3389/fpsyg.2020.623237. eCollection 2020.
Dementia, a prevalent disorder of the brain, has negative effects on individuals and society. This paper addresses the classification of Alzheimer's dementia using data from the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) Challenge of Interspeech 2020. We used (1) VGGish, a deep, pretrained TensorFlow model, as an audio feature extractor, together with Scikit-learn classifiers to detect signs of dementia in speech. Three classifiers (LinearSVM, Perceptron, 1NN) were 59.1% accurate, which was 3% above the best-performing baseline models trained on the acoustic features used in the challenge. We also proposed (2) DemCNN, a new PyTorch convolutional neural network model that operates on raw waveforms; it was 63.6% accurate, 7% more accurate than the best-performing baseline, a linear discriminant analysis model. We found that audio transfer learning with a pretrained VGGish feature extractor performs better than the baseline approach using automatically extracted acoustic features. Our DemCNN exhibits good generalization capabilities. Both methods presented in this paper offer progress toward new, innovative, and more effective computer-based screening of dementia through spontaneous speech.
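The first pipeline (pretrained embeddings feeding simple classifiers) can be sketched roughly as follows. This is a minimal, hypothetical illustration in Python with scikit-learn, not the authors' code. It assumes each recording has already been passed through VGGish, which emits one 128-dimensional embedding per roughly 0.96 s of audio. It also assumes the per-frame embeddings are mean-pooled into one vector per recording, that features are standardized, and that "LinearSVM" corresponds to scikit-learn's LinearSVC. It uses randomly generated stand-in embeddings instead of ADReSS data, so its accuracies say nothing about the paper's results. The helper names (pool_embeddings, make_synthetic_recordings, evaluate_classifiers) are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

EMBED_DIM = 128  # VGGish emits one 128-dimensional embedding per ~0.96 s audio frame


def pool_embeddings(frame_embeddings):
    """Mean-pool per-frame embeddings into one fixed-length vector per recording.

    Mean pooling is an assumption made for this sketch, not the paper's stated method.
    """
    return np.asarray(frame_embeddings).mean(axis=0)


def make_synthetic_recordings(n, rng):
    """Hypothetical stand-in data: random frame embeddings with a small class-dependent shift."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # 0 = control, 1 = dementia (illustrative labels only)
        n_frames = rng.integers(20, 60)  # recordings differ in length
        frames = rng.normal(loc=0.3 * label, scale=1.0, size=(n_frames, EMBED_DIM))
        X.append(pool_embeddings(frames))
        y.append(label)
    return np.vstack(X), np.array(y)


def evaluate_classifiers(X_train, y_train, X_test, y_test):
    """Fit the three classifier types named in the abstract and return test accuracy for each."""
    classifiers = {
        "LinearSVM": LinearSVC(max_iter=10000),
        "Perceptron": Perceptron(random_state=0),
        "1NN": KNeighborsClassifier(n_neighbors=1),
    }
    scores = {}
    for name, clf in classifiers.items():
        # Standardize features before each classifier (an assumed preprocessing step)
        model = make_pipeline(StandardScaler(), clf)
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # accuracy on held-out recordings
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = make_synthetic_recordings(108, rng)
    X_test, y_test = make_synthetic_recordings(48, rng)
    for name, acc in evaluate_classifiers(X_train, y_train, X_test, y_test).items():
        print(f"{name}: {acc:.3f}")
```

The pooling step is needed because scikit-learn classifiers expect one fixed-length feature vector per sample, whereas VGGish produces a variable number of frame embeddings per recording. Running the script prints one held-out accuracy per classifier on the synthetic data.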