Robin Zhao, Anna S. G. Choi, Allison Koenecke, Anaïs Rameau
Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, U.S.A.
Department of Information Science, Cornell University, Ithaca, New York, U.S.A.
Laryngoscope. 2025 Jan;135(1):191-197. doi: 10.1002/lary.31713. Epub 2024 Aug 19.
Objective: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.
Methods: A corpus of 850 audio files of d/Dhh and normal-hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application programming interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analyses by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.
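WER, the metric quantified above, is the minimum number of word-level substitutions, deletions, and insertions needed to turn the API transcript into the reference transcript, divided by the number of reference words. A minimal sketch via word-level Levenshtein distance (an illustrative implementation with made-up example sentences, not the study's code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("the quick brown fox", "the quick brawn"))      # 0.5 (one substitution, one deletion)
```

Note that WER can exceed 100% when the transcript contains many insertions relative to a short reference, which is plausible for highly atypical speech.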
Results: Mean WER, averaged across APIs, was 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers primarily communicating in sign language (70.2% WER) than for speakers using both oral and sign language communication (51.5%) or oral communication only (19.7%).
Conclusions: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as a primary communication mode. This contrasts with Big Tech companies' promises of accessibility and indicates the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.
Level of Evidence: 3. Laryngoscope, 135:191-197, 2025.