Hey Siri: How Effective are Common Voice Recognition Systems at Recognizing Dysphonic Voices?

Author information

Rohlfing Matthew L, Buckley Daniel P, Piraquive Jacquelyn, Stepp Cara E, Tracy Lauren F

Affiliations

Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston University School of Medicine, Boston, Massachusetts, U.S.A.

Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts, U.S.A.

Publication information

Laryngoscope. 2021 Jul;131(7):1599-1607. doi: 10.1002/lary.29082. Epub 2020 Sep 19.

DOI: 10.1002/lary.29082

PMID: 32949415

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9009156/

Abstract

OBJECTIVES/HYPOTHESIS

Interaction with voice recognition systems, such as Siri™ and Alexa™, is an increasingly important part of everyday life. Patients with voice disorders may have difficulty with this technology, leading to frustration and reduction in quality of life. This study evaluates the ability of common voice recognition systems to transcribe dysphonic voices.

STUDY DESIGN

Retrospective evaluation of "Rainbow Passage" voice samples from patients with and without voice disorders.

METHODS

Participants with (n = 30) and without (n = 23) voice disorders were recorded reading the "Rainbow Passage". Recordings were played at a standardized intensity and distance to dictation programs on Apple iPhone 6S™, Apple iPhone 11 Pro™, and Google Voice™. Word recognition scores were calculated as the proportion of correctly transcribed words. Word recognition scores were compared to auditory-perceptual and acoustic measures.
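The word recognition score described above, the proportion of correctly transcribed words, can be illustrated with a minimal Python sketch. The abstract does not say how transcribed words were matched to the passage, so the alignment used here (Python's `difflib.SequenceMatcher` over lowercased word tokens) is an assumption, as are the function names `tokenize` and `word_recognition_score` and the made-up example transcript.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_recognition_score(reference, transcript):
    """Proportion of reference words that also appear, in order, in the transcript.

    Assumption: words are aligned with difflib's matching blocks. The study's
    actual scoring procedure is not described in the abstract.
    """
    ref = tokenize(reference)
    hyp = tokenize(transcript)
    if not ref:
        raise ValueError("reference must contain at least one word")
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


# Opening sentence of the Rainbow Passage and a hypothetical dictation output.
reference = (
    "When the sunlight strikes raindrops in the air, "
    "they act as a prism and form a rainbow."
)
transcript = (
    "When the sunlight strikes rain drops in the air "
    "they act as a prison and form a rainbow"
)

score = word_recognition_score(reference, transcript)
print(f"Word recognition score: {score:.1%}")
```

In this example, 15 of the 17 reference words are matched ("raindrops" and "prism" are missed), giving a score of about 88%. A stricter or more lenient matching rule would give different scores for the same transcript.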

RESULTS

Mean word recognition scores for participants with and without voice disorders were, respectively, 68.6% and 91.9% for Apple iPhone 6S™ (P < .001), 71.2% and 93.7% for Apple iPhone 11 Pro™ (P < .001), and 68.7% and 93.8% for Google Voice™ (P < .001). There were strong, approximately linear associations between CAPE-V ratings of overall severity of dysphonia and word recognition score, with correlation coefficients (R) of 0.609 (iPhone 6S™), 0.670 (iPhone 11 Pro™), and 0.619 (Google Voice™). These relationships persisted when controlling for diagnosis, age, gender, fundamental frequency, and speech rate (P < .001 for all systems).

CONCLUSION

Common voice recognition systems function well with nondysphonic voices but are poor at accurately transcribing dysphonic voices. There was a strong negative correlation between word recognition scores and perceptual voice evaluation. As our society increasingly interfaces with automated voice recognition technology, the needs of patients with voice disorders should be considered.

LEVEL OF EVIDENCE

4. Laryngoscope, 131:1599-1607, 2021.
