[A multiscale feature extraction algorithm for dysarthric speech recognition].

Authors

Zhao Jianxing, Xue Peiyun, Bai Jing, Shi Chenkang, Yuan Bo, Shi Tongtong

Affiliation

School of Information and Computer Science, Taiyuan University of Technology, Taiyuan 030024, P. R. China.

Publication information

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2023 Feb 25;40(1):44-50. doi: 10.7507/1001-5515.202205049.

DOI: 10.7507/1001-5515.202205049
PMID: 36854547
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9989754/
Abstract

In this paper, we propose a multi-scale Mel-domain feature map extraction algorithm to address the difficulty of improving the recognition rate for dysarthric speech. We use empirical mode decomposition to decompose the speech signal, then extract Fbank features and their first-order differences from each of the three effective components to construct a new feature map that captures detail in the frequency domain. In addition, because training a single-channel neural network suffers from loss of effective features and high computational complexity, we propose a speech recognition network model. Finally, training and decoding were performed on the public UA-Speech dataset. The experimental results show that the speech recognition model reached an accuracy of 92.77% with this method. The proposed algorithm can therefore effectively improve the recognition rate for dysarthric speech.
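
The feature-map construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give frame sizes, filter counts, or the layout of the map, so the 25 ms / 10 ms framing, the 40 Mel filters, the channel-stacked layout, and all function names below are assumptions made for illustration. The empirical mode decomposition step is assumed to have been done beforehand (for example with an external EMD library); three synthetic sinusoids stand in for the three effective components.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def fbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log Mel filterbank (Fbank) features, shape (n_frames, n_filters)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    return np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)

def delta(features):
    """First-order difference along time, using a central difference with edge padding."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

def multiscale_feature_map(components, **fbank_kwargs):
    """Stack Fbank and delta features of each component as separate channels."""
    channels = []
    for comp in components:
        static = fbank(comp, **fbank_kwargs)
        channels.extend([static, delta(static)])
    return np.stack(channels, axis=0)

# Stand-in for three EMD components of a 1-second, 16 kHz signal (assumption).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
components = [np.sin(2 * np.pi * f * t) for f in (2000.0, 800.0, 200.0)]

fmap = multiscale_feature_map(components, sample_rate=sample_rate)
print(fmap.shape)  # (6, 98, 40): 3 components x (static + delta), 98 frames, 40 filters
```

Stacking the components as channels is one plausible way to present such a map to a multi-channel network; the paper itself should be consulted for the actual arrangement.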


Similar articles

1
[A multiscale feature extraction algorithm for dysarthric speech recognition].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2023 Feb 25;40(1):44-50. doi: 10.7507/1001-5515.202205049.
2
Multi-Stage Audio-Visual Fusion for Dysarthric Speech Recognition With Pre-Trained Models.
IEEE Trans Neural Syst Rehabil Eng. 2023;31:1912-1921. doi: 10.1109/TNSRE.2023.3262001.
3
Dysarthric Speech Enhancement Based on Convolution Neural Network.
Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:60-64. doi: 10.1109/EMBC48229.2022.9871531.
4
Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.
Med Eng Phys. 2006 Oct;28(8):741-8. doi: 10.1016/j.medengphy.2005.11.002. Epub 2005 Dec 15.
5
A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.
IEEE Trans Neural Syst Rehabil Eng. 2014 Sep;22(5):1053-63. doi: 10.1109/TNSRE.2014.2309336. Epub 2014 Mar 11.
6
Estimation of phoneme-specific HMM topologies for the automatic recognition of dysarthric speech.
Comput Math Methods Med. 2013;2013:297860. doi: 10.1155/2013/297860. Epub 2013 Oct 8.
7
[Psychosis speech recognition algorithm based on deep embedded sparse stacked autoencoder and manifold ensemble].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2021 Aug 25;38(4):655-662. doi: 10.7507/1001-5515.202010050.
8
Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System.
IEEE Trans Neural Syst Rehabil Eng. 2021;29:852-861. doi: 10.1109/TNSRE.2021.3076778. Epub 2021 May 7.
9
A Classroom Emotion Recognition Model Based on a Convolutional Neural Network Speech Emotion Algorithm.
Occup Ther Int. 2022 Jul 7;2022:9563877. doi: 10.1155/2022/9563877. eCollection 2022.
10
Evaluation of an Automatic Speech Recognition Platform for Dysarthric Speech.
Folia Phoniatr Logop. 2021;73(5):432-441. doi: 10.1159/000511042. Epub 2020 Nov 13.
