Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech.

Authors

Zhao Robin, Choi Anna S G, Koenecke Allison, Rameau Anaïs

Affiliations

Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, U.S.A.

Department of Information Science, Cornell University, Ithaca, New York, U.S.A.

Publication information

Laryngoscope. 2025 Jan;135(1):191-197. doi: 10.1002/lary.31713. Epub 2024 Aug 19.

DOI:10.1002/lary.31713

PMID:39157956

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11637924/

Abstract

OBJECTIVE

To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.

METHODS

A corpus of 850 audio files of d/Dhh and normal hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application programming interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analyses by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.
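For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the transcription into the reference, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name `word_error_rate` and the text normalization (lowercasing and whitespace splitting) are assumptions made for illustration; the abstract does not describe the authors' actual scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Normalization here (lowercase, split on whitespace) is an assumption;
    the study's own preprocessing is not specified in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not taken from the study corpus.
perfect = word_error_rate("the cat sat on the mat", "the cat sat on the mat")
two_errors = word_error_rate("the cat sat on the mat", "the cat sit on mat")
over_one = word_error_rate("hello", "a b c")
```

Because insertions count as errors while the denominator is the reference length, WER is not bounded above by 100%: a transcription much longer than the reference can score higher.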

RESULTS

Mean WER averaged across APIs was roughly 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers primarily communicating with sign language (70.2% WER) than for speakers using both oral and sign language communication (51.5%) or oral communication only (19.7%).
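The fold differences implied by these figures can be checked with simple arithmetic. The sketch below divides each reported mean WER by the reported NH mean. It uses only the summary percentages stated in the results; the dictionary labels and the helper `fold_over_nh` are hypothetical names chosen for illustration, and this is not a reproduction of the authors' statistical analysis.

```python
# Mean WER (%) as reported in the abstract, averaged across the four APIs.
reported_wer = {
    "d/Dhh overall": 52.6,
    "SIC low": 85.9,
    "SIC medium": 46.6,
    "SIC high": 9.5,
    "prelingual HL": 80.5,
    "postlingual HL": 37.1,
    "sign language": 70.2,
    "oral and sign": 51.5,
    "oral only": 19.7,
}
NH_WER = 5.0  # reported mean WER (%) for the normal hearing group


def fold_over_nh(wer_percent: float, baseline: float = NH_WER) -> float:
    """Ratio of a group's mean WER to the NH baseline."""
    return wer_percent / baseline


# Fold difference per group, rounded to one decimal place.
fold = {group: round(fold_over_nh(wer), 1) for group, wer in reported_wer.items()}

for group, ratio in fold.items():
    print(f"{group}: {ratio}x the NH mean WER")
```

The overall d/Dhh ratio works out to 52.6 / 5.0, or about 10.5, which matches the stated tenfold difference.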

CONCLUSION

Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.

LEVEL OF EVIDENCE

3

Laryngoscope, 135:191-197, 2025.
