Two-stage data augmentation for improved ASR performance for dysarthric speech.

Authors

Bhat Chitralekha, Strik Helmer

Affiliations

Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, The Netherlands.

Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, The Netherlands; Centre for Language Studies (CLS), Radboud University Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, The Netherlands.

Publication information

Comput Biol Med. 2025 May;189:109954. doi: 10.1016/j.compbiomed.2025.109954. Epub 2025 Mar 13.

DOI:10.1016/j.compbiomed.2025.109954
PMID:40086291
Abstract

Machine learning (ML) and Deep Neural Networks (DNN) have greatly aided the problem of Automatic Speech Recognition (ASR). However, accurate ASR for dysarthric speech remains a serious challenge. The dearth of usable data remains a problem in applying ML and DNN techniques for dysarthric speech recognition. In the current research, we address this challenge using a novel two-stage data augmentation scheme, a combination of static and dynamic data augmentation techniques, designed by leveraging an understanding of the characteristics of dysarthric speech. We explore speaker-independent ASR using modifications to healthy speech using various perturbations, devoicing of consonants, and voice conversion, comprising stage one or static augmentations. Subsequent to the first stage, a modified SpecAugment algorithm tailored for dysarthric speech is employed. This variant, termed Dysarthric SpecAugment, leverages the characteristics of dysarthric speech and forms the second stage of the two-stage augmentation approach. This acoustic model is used to pre-train a speaker-dependent ASR using dysarthric speech. The objective of this work is to improve the ASR performance for dysarthric speech using the two-stage data augmentation scheme. An end-to-end ASR using a Transformer acoustic model is used to evaluate the data augmentation scheme on speech from the UA dysarthric speech corpus. We achieve an absolute improvement of 10.7% and a relative improvement of 29.2% in word error rate (WER) over a baseline with no augmentation, with a final WER of 25.9% for the speaker-dependent system.
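The reported figures can be checked with a short sketch. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and that the 10.7% absolute improvement is measured against the same baseline as the 25.9% final WER; the baseline value itself is inferred here, not stated in the abstract. The function name `word_error_rate` and the variable names are illustrative only and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Figures reported in the abstract, in percent.
final_wer = 25.9
absolute_improvement = 10.7

# Assumption: baseline WER = final WER + absolute improvement (not stated in the abstract).
baseline_wer = final_wer + absolute_improvement
relative_improvement = 100 * absolute_improvement / baseline_wer

print(f"implied baseline WER: {baseline_wer:.1f}%")
print(f"relative improvement: {relative_improvement:.1f}%")
```

Under that assumption the implied baseline is about 36.6%, and 10.7 / 36.6 is about 29.2%, which agrees with the relative improvement quoted in the abstract.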

Similar articles

1
Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System.
IEEE Trans Neural Syst Rehabil Eng. 2023;31:3407-3416. doi: 10.1109/TNSRE.2023.3307020. Epub 2023 Aug 29.
2
Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System.
IEEE Trans Neural Syst Rehabil Eng. 2021;29:852-861. doi: 10.1109/TNSRE.2021.3076778. Epub 2021 May 7.
3
Improving Dysarthric Speech Segmentation With Emulated and Synthetic Augmentation.
IEEE J Transl Eng Health Med. 2024 Mar 11;12:382-389. doi: 10.1109/JTEHM.2024.3375323. eCollection 2024.
4
Multi-Stage Audio-Visual Fusion for Dysarthric Speech Recognition With Pre-Trained Models.
IEEE Trans Neural Syst Rehabil Eng. 2023;31:1912-1921. doi: 10.1109/TNSRE.2023.3262001.
5
Enhancing dysarthric speech recognition through SepFormer and hierarchical attention network models with multistage transfer learning.
Sci Rep. 2024 Nov 27;14(1):29455. doi: 10.1038/s41598-024-80764-w.
6
A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.
IEEE Trans Neural Syst Rehabil Eng. 2014 Sep;22(5):1053-63. doi: 10.1109/TNSRE.2014.2309336. Epub 2014 Mar 11.
7
The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance.
J Acoust Soc Am. 2016 Nov;140(5):EL416. doi: 10.1121/1.4967208.
8
Vocal tract representation in the recognition of cerebral palsied speech.
J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.
9
Severity-based adaptation with limited data for ASR to aid dysarthric speakers.
PLoS One. 2014 Jan 23;9(1):e86285. doi: 10.1371/journal.pone.0086285. eCollection 2014.

Cited by

1
Spectro-Image Analysis with Vision Graph Neural Networks and Contrastive Learning for Parkinson's Disease Detection.
J Imaging. 2025 Jul 2;11(7):220. doi: 10.3390/jimaging11070220.