Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System.

Publication information

IEEE Trans Neural Syst Rehabil Eng. 2021;29:852-861. doi: 10.1109/TNSRE.2021.3076778. Epub 2021 May 7.

Abstract

Dysarthria is a disorder that affects an individual's speech intelligibility due to the paralysis of muscles and organs involved in the articulation process. As the condition is often associated with physically debilitating disabilities, such individuals not only face communication problems, but interactions with digital devices can also become a burden. For these individuals, automatic speech recognition (ASR) technologies can make a significant difference, as computing and portable digital devices can become an interaction medium that enables them to communicate with others and with computers. However, ASR technologies have performed poorly in recognizing dysarthric speech, especially severe dysarthria, because dysarthric ASR systems face multiple challenges. We identified that these challenges stem from the alteration and inaccuracy of dysarthric phonemes, the scarcity of dysarthric speech data, and imprecise phoneme labeling. This paper reports on our second dysarthric-specific ASR system, called Speech Vision (SV), which tackles these challenges by adopting a novel approach to dysarthric ASR in which speech features are extracted visually and SV then learns to see the shape of the words pronounced by dysarthric individuals. This visual acoustic modeling feature of SV eliminates phoneme-related challenges. To address the data scarcity problem, SV adopts visual data augmentation techniques, generates synthetic dysarthric acoustic visuals, and leverages transfer learning. When benchmarked against the other state-of-the-art dysarthric ASR systems considered in this study, SV outperformed them by improving recognition accuracy for 67% of UA-Speech speakers, with the biggest improvements achieved for severe dysarthria.
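The abstract does not say how SV's "acoustic visuals" are computed or which visual augmentations it uses. As a minimal sketch, assuming a log-magnitude spectrogram resized to a fixed-size image (a common way to feed audio to image models, not necessarily SV's representation), the pure-Python code below turns a single-word recording into a fixed-size 2-D image and applies two generic visual augmentations: a circular time shift and one zeroed frequency band, in the spirit of SpecAugment-style masking. All function names, frame sizes, and image dimensions are hypothetical illustration choices.

```python
import cmath
import math
import random


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames, zero-padding the last ones."""
    frames = []
    for start in range(0, len(samples), hop):
        frame = list(samples[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitudes for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def resize_nearest(img, height, width):
    """Nearest-neighbour resize of a 2-D list to a fixed height x width."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def acoustic_image(samples, frame_len=64, hop=32, height=32, width=32):
    """Turn a word recording into a fixed-size image with values in [0, 1].

    Hypothetical representation: rows are frequency bins (row 0 = lowest),
    columns are time frames, values are min-max normalised log magnitudes.
    """
    if not samples:
        raise ValueError("empty recording")
    columns = [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]
    # Transpose to frequency x time and log-compress the magnitudes.
    spec = [[math.log(1e-6 + col[k]) for col in columns] for k in range(len(columns[0]))]
    img = resize_nearest(spec, height, width)
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def augment(img, rng, max_shift=4, max_mask=4):
    """Return a new image: circular time shift, then one zeroed frequency band.

    Assumed, generic augmentations; the input image is not modified.
    """
    height, width = len(img), len(img[0])
    k = rng.randint(-max_shift, max_shift) % width
    # Roll every row right by k columns (time axis).
    out = [row[width - k:] + row[:width - k] for row in img]
    # Zero out `band` consecutive frequency rows.
    band = rng.randint(0, max_mask)
    start = rng.randint(0, height - band)
    for r in range(start, start + band):
        out[r] = [0.0] * width
    return out
```

For example, `acoustic_image([math.sin(2 * math.pi * 500 * t / 8000) for t in range(2000)])` yields a 32 x 32 image whose brightest row is the one holding the 500 Hz tone, and calling `augment` repeatedly with a seeded `random.Random` produces varied copies of that image for training. How such images would then be classified (for instance, by an image model pretrained elsewhere and fine-tuned on dysarthric data, one possible reading of "transfer learning") is not shown here.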

Similar articles

Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System.

IEEE Trans Neural Syst Rehabil Eng. 2023;31:3407-3416. doi: 10.1109/TNSRE.2023.3307020. Epub 2023 Aug 29.

The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance.

J Acoust Soc Am. 2016 Nov;140(5):EL416. doi: 10.1121/1.4967208.

Vocal tract representation in the recognition of cerebral palsied speech.

J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.

Speech technology-based assessment of phoneme intelligibility in dysarthria.

Int J Lang Commun Disord. 2009 Sep-Oct;44(5):716-30. doi: 10.1080/13682820802342062.

Evaluation of an Automatic Speech Recognition Platform for Dysarthric Speech.

Folia Phoniatr Logop. 2021;73(5):432-441. doi: 10.1159/000511042. Epub 2020 Nov 13.

Improving Dysarthric Speech Segmentation With Emulated and Synthetic Augmentation.

IEEE J Transl Eng Health Med. 2024 Mar 11;12:382-389. doi: 10.1109/JTEHM.2024.3375323. eCollection 2024.

Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients.

Comput Methods Programs Biomed. 2022 Mar;215:106602. doi: 10.1016/j.cmpb.2021.106602. Epub 2021 Dec 26.

Estimation of phoneme-specific HMM topologies for the automatic recognition of dysarthric speech.

Comput Math Methods Med. 2013;2013:297860. doi: 10.1155/2013/297860. Epub 2013 Oct 8.

A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.

IEEE Trans Neural Syst Rehabil Eng. 2014 Sep;22(5):1053-63. doi: 10.1109/TNSRE.2014.2309336. Epub 2014 Mar 11.

Cited by

A novel Swin transformer based framework for speech recognition for dysarthria.

Sci Rep. 2025 Jun 16;15(1):20070. doi: 10.1038/s41598-025-02042-7.

Protein structure prediction via deep learning: an in-depth review.

Front Pharmacol. 2025 Apr 3;16:1498662. doi: 10.3389/fphar.2025.1498662. eCollection 2025.

Flash Memory for Synaptic Plasticity in Neuromorphic Computing: A Review.

Biomimetics (Basel). 2025 Feb 18;10(2):121. doi: 10.3390/biomimetics10020121.

Co-designing the integration of voice-based conversational AI and web augmentation to amplify web inclusivity.

Sci Rep. 2024 Jul 13;14(1):16162. doi: 10.1038/s41598-024-66725-3.

FLMatchQA: a recursive neural network-based question answering with customized federated learning model.

PeerJ Comput Sci. 2024 Jun 28;10:e2092. doi: 10.7717/peerj-cs.2092. eCollection 2024.

Exploring the Role of Machine Learning in Diagnosing and Treating Speech Disorders: A Systematic Literature Review.

Psychol Res Behav Manag. 2024 May 31;17:2205-2232. doi: 10.2147/PRBM.S460283. eCollection 2024.

[A multiscale feature extraction algorithm for dysarthric speech recognition].

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2023 Feb 25;40(1):44-50. doi: 10.7507/1001-5515.202205049.

The Role of Deep Learning in Advancing Breast Cancer Detection Using Different Imaging Modalities: A Systematic Review.

Cancers (Basel). 2022 Oct 29;14(21):5334. doi: 10.3390/cancers14215334.

Deep Mobile Linguistic Therapy for Patients with ASD.

Int J Environ Res Public Health. 2022 Oct 7;19(19):12857. doi: 10.3390/ijerph191912857.
