

Enhancing dysarthric speech recognition through SepFormer and hierarchical attention network models with multistage transfer learning.

Affiliations

Division of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India.

Division of Biomedical Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India.

Publication

Sci Rep. 2024 Nov 27;14(1):29455. doi: 10.1038/s41598-024-80764-w.

DOI: 10.1038/s41598-024-80764-w
PMID: 39604526
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11603152/
Abstract

Dysarthria, a motor speech disorder that impacts articulation and speech clarity, presents significant challenges for Automatic Speech Recognition (ASR) systems. This study proposes a groundbreaking approach to enhance the accuracy of Dysarthric Speech Recognition (DSR). A primary innovation lies in the integration of the SepFormer-Speech Enhancement Generative Adversarial Network (S-SEGAN), an advanced generative adversarial network tailored for Dysarthric Speech Enhancement (DSE), as a front-end processing stage for DSR systems. The S-SEGAN integrates SEGAN's adversarial learning with SepFormer speech separation capabilities, demonstrating significant improvements in performance. Furthermore, a multistage transfer learning approach is employed to assess the DSR models for both word-level and sentence-level DSR. These DSR models are first trained on a large speech dataset (LibriSpeech) and then fine-tuned on dysarthric speech data (both isolated and augmented). Evaluations demonstrate significant DSR accuracy improvements with DSE integration. The Dysarthric Speech (DS)-baseline models (without DSE), Transformer and Conformer, achieved Word Recognition Accuracy (WRA) percentages of 68.60% and 69.87%, respectively. The introduction of a Hierarchical Attention Network (HAN) with the Transformer and Conformer architectures resulted in improved performance, with T-HAN achieving a WRA of 71.07% and C-HAN reaching 73%. The Transformer model with DSE + DSR for isolated words achieves a WRA of 73.40%, while that of the Conformer model reaches 74.33%. Notably, the T-HAN and C-HAN models with DSE + DSR demonstrate even more substantial enhancements, with WRAs of 75.73% and 76.87%, respectively. Augmenting words further boosts model performance, with the Transformer and Conformer models achieving WRAs of 76.47% and 79.20%, respectively. Remarkably, the T-HAN and C-HAN models with DSE + DSR and augmented words exhibit WRAs of 82.13% and 84.07%, respectively, with C-HAN displaying the highest performance among all proposed models.
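The multistage transfer-learning recipe described above (pretrain on a large general corpus such as LibriSpeech, then fine-tune on the much smaller dysarthric set) can be sketched in miniature. Everything below is illustrative and not the paper's code: a toy 1-D linear model stands in for the Transformer/Conformer acoustic models, and the data are synthetic.

```python
# Illustrative sketch (not the paper's code) of multistage transfer learning:
# stage 1 "pretrains" on abundant general data, stage 2 fine-tunes on a small
# target set with a reduced learning rate. A 1-D linear model y = w * x
# stands in for the Transformer/Conformer acoustic models.

def train(w, data, lr, epochs):
    """Full-batch gradient descent on mean squared error for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Stage 1: large "general speech" corpus, generated from y = 2x.
general = [(float(x), 2.0 * x) for x in range(1, 6)]
w_pretrained = train(0.0, general, lr=0.01, epochs=200)  # converges near 2.0

# Stage 2: tiny "dysarthric" set drawn from a shifted target, y = 2.5x.
# The smaller learning rate moves the pretrained weight only part of the
# way toward 2.5, rather than overfitting two samples from scratch.
dysarthric = [(1.0, 2.5), (2.0, 5.0)]
w_adapted = train(w_pretrained, dysarthric, lr=0.001, epochs=50)
```

The reduced learning rate in the second stage is the essence of the recipe: the fine-tuned weight stays anchored near the pretrained solution while shifting toward the target domain, which mirrors how the paper's models are first trained on LibriSpeech and then adapted on isolated and augmented dysarthric speech.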


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/e5f6662c1f68/41598_2024_80764_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/23ea1622410b/41598_2024_80764_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/96ba5a6b9017/41598_2024_80764_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/4ca344c72de6/41598_2024_80764_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/7f9c63c0b01b/41598_2024_80764_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/84de2a0f5f92/41598_2024_80764_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/a1c40c7e81c2/41598_2024_80764_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/d5d423a013f7/41598_2024_80764_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/2ad0f1a5b08c/41598_2024_80764_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/e4bf1c5e5679/41598_2024_80764_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/13fc8866d6a7/41598_2024_80764_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/700569a8648a/41598_2024_80764_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2168/11603152/dca250412682/41598_2024_80764_Fig13_HTML.jpg

Similar Articles

1. Enhancing dysarthric speech recognition through SepFormer and hierarchical attention network models with multistage transfer learning.
Sci Rep. 2024 Nov 27;14(1):29455. doi: 10.1038/s41598-024-80764-w.
2. Dysarthric Speech Transformer: A Sequence-to-Sequence Dysarthric Speech Recognition System.
IEEE Trans Neural Syst Rehabil Eng. 2023;31:3407-3416. doi: 10.1109/TNSRE.2023.3307020. Epub 2023 Aug 29.
3. Two-stage data augmentation for improved ASR performance for dysarthric speech.
Comput Biol Med. 2025 May;189:109954. doi: 10.1016/j.compbiomed.2025.109954. Epub 2025 Mar 13.
4. Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System.
IEEE Trans Neural Syst Rehabil Eng. 2021;29:852-861. doi: 10.1109/TNSRE.2021.3076778. Epub 2021 May 7.
5. Multi-Stage Audio-Visual Fusion for Dysarthric Speech Recognition With Pre-Trained Models.
IEEE Trans Neural Syst Rehabil Eng. 2023;31:1912-1921. doi: 10.1109/TNSRE.2023.3262001.
6. Improving Dysarthric Speech Segmentation With Emulated and Synthetic Augmentation.
IEEE J Transl Eng Health Med. 2024 Mar 11;12:382-389. doi: 10.1109/JTEHM.2024.3375323. eCollection 2024.
7. Dysarthric Speech Enhancement Based on Convolution Neural Network.
Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:60-64. doi: 10.1109/EMBC48229.2022.9871531.
8. Estimation of phoneme-specific HMM topologies for the automatic recognition of dysarthric speech.
Comput Math Methods Med. 2013;2013:297860. doi: 10.1155/2013/297860. Epub 2013 Oct 8.
9. Vocal tract representation in the recognition of cerebral palsied speech.
J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.
10. A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.
IEEE Trans Neural Syst Rehabil Eng. 2014 Sep;22(5):1053-63. doi: 10.1109/TNSRE.2014.2309336. Epub 2014 Mar 11.