Quantifying and Improving the Performance of Speech Recognition Systems on Dysphonic Speech.

Author information

Hidalgo Lopez Julio C, Sandeep Shelly, Wright MaKayla, Wandell Grace M, Law Anthony B

Affiliations

Emory University School of Medicine, Atlanta, Georgia, USA.

Georgia State University, Atlanta, Georgia, USA.

Publication information

Otolaryngol Head Neck Surg. 2023 May;168(5):1130-1138. doi: 10.1002/ohn.170. Epub 2023 Jan 24.

DOI: 10.1002/ohn.170
PMID: 36939576

Abstract

OBJECTIVE

This study seeks to quantify how current speech recognition systems perform on dysphonic input and to determine whether their performance can be improved.

STUDY DESIGN

Experimental machine learning methods based on a retrospective database.

SETTING

Single academic voice center.

METHODS

A database of dysphonic speech recordings was created and tested against 3 speech recognition platforms. Platform performance on dysphonic voice input was compared to platform performance on normal voice input. A custom speech recognition model was trained on voice from patients with spasmodic dysphonia or vocal cord paralysis. Custom model performance was compared to base model performance.

RESULTS

All platforms performed well on normal voice, and 2 platforms performed significantly worse on dysphonic speech. Accuracy on dysphonic speech was 84.55%, 88.57%, and 93.56% for International Business Machines (IBM) Watson, Amazon Transcribe, and Microsoft Azure, respectively. A secondary analysis showed that the lower performance of IBM Watson and Amazon Transcribe was driven by their performance on spasmodic dysphonia and vocal fold paralysis. A custom model was therefore built on the Microsoft platform to increase accuracy on these pathologies. Overall, the custom model achieved 96.43% on dysphonic voices and 97.62% on normal voices.
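
The abstract does not say how the accuracy figures were computed. A common choice for speech recognition is word accuracy, defined as 1 minus the word error rate (WER), and the sketch below assumes that definition purely for illustration. The function names `word_error_rate` and `word_accuracy` and the example sentence are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercase, whitespace-tokenized words; the study's actual
    text normalization and metric are not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic program).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER, floored at 0 because WER can exceed 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    # Illustrative transcripts only; one reference word ("a") is dropped.
    reference = "the rainbow is a division of white light"
    hypothesis = "the rainbow is division of white light"
    print(f"word accuracy: {word_accuracy(reference, hypothesis):.2%}")
```

Under this assumed definition, a platform-level figure would be obtained by scoring every recording against its reference transcript and aggregating, either by averaging per-recording accuracy or by pooling errors over all reference words; the two aggregation choices can give different numbers.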

CONCLUSION

Current speech recognition systems generally perform worse on dysphonic speech than on normal speech. We theorize that poor performance is a consequence of a lack of dysphonic voices in each platform's original training dataset. We address this limitation with transfer learning used to increase the performance of these systems on all dysphonic speech.

Similar articles

1. Quantifying and Improving the Performance of Speech Recognition Systems on Dysphonic Speech.
   Otolaryngol Head Neck Surg. 2023 May;168(5):1130-1138. doi: 10.1002/ohn.170. Epub 2023 Jan 24.
2. Acoustic and Perceptual Classification of Within-sample Normal, Intermittently Dysphonic, and Consistently Dysphonic Voice Types.
   J Voice. 2017 Mar;31(2):218-228. doi: 10.1016/j.jvoice.2016.04.016. Epub 2016 May 27.
3. Use of cepstral analysis for differentiating dysphonic from normal voices in children.
   Int J Pediatr Otorhinolaryngol. 2019 Jan;116:107-113. doi: 10.1016/j.ijporl.2018.10.029. Epub 2018 Oct 23.
4. Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework.
   Int J Lang Commun Disord. 2023 Mar;58(2):279-294. doi: 10.1111/1460-6984.12783. Epub 2022 Sep 18.
5. Dysphonia Interference in Schoolteachers' Speech Intelligibility in the Classroom.
   J Voice. 2024 Mar;38(2):316-324. doi: 10.1016/j.jvoice.2021.09.004. Epub 2021 Nov 9.
6. Comparison of Two Multiparameter Acoustic Indices of Dysphonia Severity: The Acoustic Voice Quality Index and Cepstral Spectral Index of Dysphonia.
   J Voice. 2018 Jul;32(4):515.e1-515.e13. doi: 10.1016/j.jvoice.2017.06.012. Epub 2017 Jul 21.
7. Hey Siri: How Effective are Common Voice Recognition Systems at Recognizing Dysphonic Voices?
   Laryngoscope. 2021 Jul;131(7):1599-1607. doi: 10.1002/lary.29082. Epub 2020 Sep 19.
8. Predictive value and discriminant capacity of cepstral- and spectral-based measures during continuous speech.
   J Voice. 2013 Jul;27(4):393-400. doi: 10.1016/j.jvoice.2013.02.005. Epub 2013 May 16.
9. Vocal stability in functional dysphonic versus healthy voices at different times of voice loading.
   J Voice. 2004 Dec;18(4):443-53. doi: 10.1016/j.jvoice.2004.01.002.
10. Children's Subjective Ratings and Opinions of Typical and Dysphonic Voice After Performing a Language Comprehension Task in Background Noise.
    J Voice. 2015 Sep;29(5):624-30. doi: 10.1016/j.jvoice.2014.11.003. Epub 2015 Apr 11.

Cited by

1. Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech.
   Laryngoscope. 2025 Jan;135(1):191-197. doi: 10.1002/lary.31713. Epub 2024 Aug 19.