Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria.

Affiliation

Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy.

Publication details

Sensors (Basel). 2021 Sep 27;21(19):6460. doi: 10.3390/s21196460.

DOI:10.3390/s21196460
PMID:34640780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8512569/
Abstract

Impaired speech is a major challenge for Automatic Speech Recognition (ASR) systems, because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This approach fine-tunes the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, in order to improve the performance of a speaker-dependent ASR system. The second aim is to determine whether there is a correlation between a speaker's voice features and the optimal window and shift parameters that minimise the ASR error for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech: 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database. Both databases are freely available. The results confirm that, when a standard ASR system performs poorly for a speaker with dysarthria, it can be improved by using the new speech analysis. Conversely, the new approach is ineffective for unimpaired and mildly impaired speech. Furthermore, some of a speaker's voice features correlate with that speaker's optimal parameters.
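
The per-speaker tuning described in the abstract can be pictured as a search over candidate window sizes and shifts. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `frame_signal`, `search_window_and_shift`, and `error_fn` are hypothetical names, the candidate grids are arbitrary, and `error_fn` stands in for training and scoring a speaker-dependent ASR system, which the abstract does not specify. The 25 ms window and 10 ms shift used in the demo are common front-end defaults, not values taken from the paper.

```python
import math
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple


def frame_signal(samples: Sequence[float], sample_rate: int,
                 window_ms: float, shift_ms: float) -> List[List[float]]:
    """Cut a signal into Hamming-windowed frames (the framing step of an STFT)."""
    size = int(round(sample_rate * window_ms / 1000.0))   # window length in samples
    step = int(round(sample_rate * shift_ms / 1000.0))    # hop between frames in samples
    if size <= 0 or step <= 0:
        raise ValueError("window and shift must be at least one sample long")
    if size > 1:
        taper = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1))
                 for n in range(size)]
    else:
        taper = [1.0]
    frames = []
    for start in range(0, len(samples) - size + 1, step):
        frames.append([samples[start + n] * taper[n] for n in range(size)])
    return frames


def search_window_and_shift(error_fn: Callable[[float, float], float],
                            window_grid_ms: Iterable[float],
                            shift_grid_ms: Iterable[float]
                            ) -> Tuple[float, float, float]:
    """Return (window_ms, shift_ms, error) with the lowest error on the grid.

    error_fn is a hypothetical callback: in a real experiment it would
    re-extract features with the given settings, train or adapt the
    speaker-dependent recogniser, and return its error on held-out speech.
    """
    best = None
    for window_ms, shift_ms in product(list(window_grid_ms), list(shift_grid_ms)):
        if shift_ms > window_ms:
            continue  # skip settings that would leave gaps between frames
        err = error_fn(window_ms, shift_ms)
        if best is None or err < best[2]:
            best = (window_ms, shift_ms, err)
    if best is None:
        raise ValueError("no valid (window, shift) combination in the grid")
    return best


if __name__ == "__main__":
    # One second of silence at 16 kHz, framed with the common 25 ms / 10 ms setting.
    frames = frame_signal([0.0] * 16000, 16000, window_ms=25, shift_ms=10)
    print(len(frames), len(frames[0]))

    # Placeholder error surface with an arbitrary minimum; NOT data from the paper.
    def toy_error(window_ms: float, shift_ms: float) -> float:
        return abs(window_ms - 35) + abs(shift_ms - 15)

    print(search_window_and_shift(toy_error, [15, 25, 35, 45], [5, 10, 15, 20]))
```

An exhaustive grid is used here only because it is the simplest search to read; any other optimiser could be substituted behind the same `error_fn` interface.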


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e78/8512569/524a36a92298/sensors-21-06460-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e78/8512569/7c83eea6b273/sensors-21-06460-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e78/8512569/da7583d54644/sensors-21-06460-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e78/8512569/ccaced4d47b5/sensors-21-06460-g004.jpg
