Sofia de la Fuente Garcia, Fasih Haider, Saturnino Luz
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5851-5855. doi: 10.1109/EMBC44109.2020.9176305.
Speech analysis could help develop clinical tools for automatic detection of Alzheimer's disease (AD) and monitoring of its progression. However, datasets containing both clinical information and spontaneous speech suitable for statistical learning are relatively scarce. In addition, speech data are often collected under different conditions, such as monologue and dialogue recording protocols. Therefore, methods are needed to allow the combination of these scarce resources. In this paper, we propose two feature extraction and representation models, based on neural networks and trained on monologue and dialogue data recorded in clinical settings. These models are evaluated not only for AD recognition, but also with respect to their potential to generalise across both datasets. They provide good results when trained and tested on the same dataset (72.56% unweighted average recall (UAR) for monologue data and 85.21% for dialogue). A decrease in UAR is observed in transfer training, where feature extraction models trained on dialogues provide better average UAR on monologues (63.72%) than the other way around (58.94%). When the choice of classifier is independent of feature extraction, transfer from monologue models to dialogues results in a maximum UAR of 81.04%, and transfer from dialogue features to monologues achieves a maximum UAR of 70.73%, evidencing the generalisability of the feature model.
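Since all results in the abstract are reported as unweighted average recall (UAR), a minimal sketch of how that metric is computed may be useful. This uses scikit-learn's recall_score with macro averaging; the labels and predictions are made-up placeholders, not the paper's data, models, or results.

```python
# Sketch of the unweighted average recall (UAR) metric: recall averaged
# over classes without weighting by class size. Labels below are
# illustrative placeholders only.
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 1, 1, 1, 1, 1]   # 0 = control, 1 = Alzheimer's disease
y_pred = [0, 1, 0, 1, 1, 0, 1, 1]

# average="macro" gives the unweighted mean of per-class recall, i.e. UAR
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.2%}")   # 0.5 * (2/3 + 4/5) ~ 73.33%
```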