Department of Biomedical Engineering, College of Engineering, Shantou University, Shantou 515041, China.
Sensors (Basel). 2021 Apr 7;21(8):2582. doi: 10.3390/s21082582.
Speech assessment is an essential part of the rehabilitation procedure for patients with aphasia (PWA). It is a comprehensive and time-consuming process that aims to discriminate between healthy individuals and aphasic patients, determine the type of aphasia syndrome, and determine the patients' impairment severity levels (referred to here as aphasia assessment tasks). Hence, automating these tasks is essential. In this study, the performance of three speech-based automatic speech assessment models was investigated. Three types of datasets were used: a healthy subjects' dataset, an aphasic patients' dataset, and a combined healthy and aphasic dataset. Two machine learning (ML)-based frameworks, classical machine learning (CML) and deep neural network (DNN), were considered in the design of the proposed models; in this paper, the DNN-based framework was a convolutional neural network (CNN). The direct or indirect transformation of these models to perform the aphasia assessment tasks was also investigated. Comparative results for each speech assessment model showed that quadrature-based high-resolution time-frequency images with a CNN framework outperformed all the CML frameworks across the three dataset types. The CNN-based framework achieved an accuracy of 99.23 ± 0.003% on the healthy individuals' dataset and 67.78 ± 0.047% on the aphasic patients' dataset. Moreover, direct or transformed relationships between the proposed speech assessment models and the aphasia assessment tasks are attainable, given a suitable dataset type, a reasonably sized dataset, and appropriate decision logic in the ML framework.
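To illustrate what a "time-frequency image" fed to a CNN looks like, the sketch below converts a 1-D speech signal into a 2-D image. It is a minimal sketch under stated assumptions: it uses a plain short-time Fourier transform (STFT) log-magnitude spectrogram as a simpler, well-known stand-in, and does NOT reproduce the quadrature-based high-resolution time-frequency representation used in the study. The function name `tf_image` and the parameters `frame_len=256` and `hop=128` are hypothetical choices for illustration only.

```python
import numpy as np


def tf_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude STFT spectrogram scaled to [0, 1].

    Hypothetical stand-in for the study's quadrature-based time-frequency
    images; frame_len and hop are illustrative assumptions, not the paper's.
    Rows are frequency bins (frame_len // 2 + 1), columns are time frames.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        # Zero-pad short recordings so at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Stack Hann-windowed frames: shape (n_frames, frame_len).
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT per frame, then transpose so rows are frequency bins.
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    log_mag = np.log1p(mag)
    # Min-max scale to [0, 1] so the image suits a CNN input layer.
    span = log_mag.max() - log_mag.min()
    if span == 0:
        return np.zeros_like(log_mag)
    return (log_mag - log_mag.min()) / span


# Usage: a 1 kHz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
img = tf_image(np.sin(2 * np.pi * 1000 * t))
print(img.shape)  # (129, 61): 129 frequency bins x 61 time frames
```

With a 256-sample window at 8 kHz each frequency bin spans 31.25 Hz, so the 1 kHz tone concentrates its energy in row 32 of the image. A real pipeline would likely resize such images to a fixed CNN input size and may substitute a higher-resolution time-frequency distribution, as the study did.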