Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:60-64. doi: 10.1109/EMBC48229.2022.9871531.
Patients with dysarthria generally produce distorted speech whose intelligibility is reduced for both human listeners and machines. To improve the intelligibility of dysarthric speech, we apply a deep learning-based speech enhancement (SE) system to this task. Conventional SE approaches suppress the noise components of a noise-corrupted input and thereby improve sound quality and intelligibility simultaneously. In this study, we instead focus on reconstructing the severely distorted signal in dysarthric speech to improve its intelligibility. The proposed SE system trains a convolutional neural network (CNN) model in the training phase, which is then used to process dysarthric speech in the testing phase. Training requires paired dysarthric-normal speech utterances, so we adopt a dynamic time warping technique to align the dysarthric-normal utterances. The resulting aligned data are used to train the CNN-based SE model. The proposed SE system is evaluated with the Google automatic speech recognition (ASR) system and a subjective listening test. The results show that, relative to unprocessed dysarthric speech, the proposed method notably improves recognition performance by more than 10% for both ASR and human listeners. Clinical Relevance: This study improves the intelligibility and ASR accuracy of dysarthric speech by more than 10%.
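The alignment step can be illustrated with a minimal dynamic time warping sketch. The abstract does not state which acoustic features or distance measure were used, so this example assumes frame-level feature vectors (for instance log-spectral frames) compared by Euclidean distance. The function names `dtw_path` and `align_pairs` are hypothetical and are not taken from the paper.

```python
import numpy as np


def dtw_path(x, y):
    """Classic DTW between feature sequences x (n, d) and y (m, d).

    Returns the total alignment cost and the warping path as a list of
    (index_in_x, index_in_y) pairs. Euclidean frame distance is an
    assumption; the paper's abstract does not specify the metric.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # Accumulated cost with an extra border row/column of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match both frames
                acc[i - 1, j],      # repeat the frame of y
                acc[i, j - 1],      # repeat the frame of x
            )

    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def align_pairs(dysarthric, normal):
    """Return frame-aligned (input, target) arrays of equal length."""
    _, path = dtw_path(dysarthric, normal)
    idx_d, idx_n = zip(*path)
    return dysarthric[list(idx_d)], normal[list(idx_n)]


# Toy one-dimensional "features": the second sequence holds its first frame longer.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [2.0]])
cost, path = dtw_path(x, y)
inputs, targets = align_pairs(x, y)
```

After alignment, `inputs` and `targets` have the same number of frames, which is what a frame-wise regression model such as a CNN-based SE network would need as paired training data.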