
Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech.

Author Information

Zhao Robin, Choi Anna S G, Koenecke Allison, Rameau Anaïs

Affiliations

Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, U.S.A.

Department of Information Science, Cornell University, Ithaca, New York, U.S.A.

Publication Information

Laryngoscope. 2025 Jan;135(1):191-197. doi: 10.1002/lary.31713. Epub 2024 Aug 19.

Abstract

OBJECTIVE

To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.

METHODS

A corpus containing 850 audio files of d/Dhh and normal hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application program interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analysis by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.
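The Word Error Rate used to score each API transcription is the word-level edit distance between the reference transcript and the ASR hypothesis, normalized by the number of reference words. A minimal sketch (the function name and normalization details are illustrative, not taken from the study's code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("the quick brown fox", "the quack brown"))      # 0.5 (1 sub + 1 del over 4 words)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why severely misrecognized speech can yield rates approaching or surpassing 100%.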

RESULTS

Mean WER averaged across APIs was 10 times higher for the d/Dhh group (52.6%) than the NH group (5.0%). APIs performed significantly worse for "low" and "medium" SIC (85.9% and 46.6% WER, respectively) as compared to "high" SIC group (9.5% WER, comparable to NH group). APIs performed significantly worse for speakers with prelingual HL relative to postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers primarily communicating with sign language (70.2% WER) relative to speakers with both oral and sign language communication (51.5%) or oral communication only (19.7%).

CONCLUSION

Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.

LEVEL OF EVIDENCE

3 Laryngoscope, 135:191-197, 2025.


