Racial disparities in automated speech recognition.

Author information

Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA 94305.

Department of Psychology, Stanford University, Stanford, CA 94305.

Publication information

Proc Natl Acad Sci U S A. 2020 Apr 7;117(14):7684-7689. doi: 10.1073/pnas.1915768117. Epub 2020 Mar 23.

Abstract

Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems (developed by Amazon, Apple, Google, IBM, and Microsoft) to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems, as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies, such as using more diverse training datasets that include African American Vernacular English, to reduce these performance differences and ensure speech recognition technology is inclusive.
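The abstract reports word error rate (WER) without defining it. The sketch below illustrates the conventional definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. It assumes the study uses this standard definition. The function name `word_error_rate`, the lowercase-and-split-on-whitespace normalization, and the sample sentences are illustrative assumptions, not the authors' pipeline or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: normalization here is just lowercasing and
    whitespace splitting, which is an assumption, not the study's procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {example_wer:.2f}")
```

Under this definition, the reported averages of 0.35 and 0.19 correspond to roughly 35 and 19 word errors per 100 reference words. Because insertions count as errors, WER can exceed 1 when the output is much longer than the reference.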

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02d5/7149386/da16edda99a5/pnas.1915768117fig01.jpg
