Xu Zidu, Vergez Sasha, Esmaeili Elyas, Zolnour Ali, Briggs Krystal Anne, Scroggins Jihye Kim, Hosseini Ebrahimabad Seyed Farid, Noble James M, Topaz Maxim, Bakken Suzanne, Bowles Kathryn H, Spens Ian, Onorato Nicole, Sridharan Sridevi, McDonald Margaret V, Zolnoori Maryam
School of Nursing, Columbia University, New York, NY 10032, United States.
Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States.
Stud Health Technol Inform. 2025 Aug 7;329:1904-1906. doi: 10.3233/SHTI251273.
Integrating automatic speech recognition (ASR) systems into home healthcare workflows can enhance risk prediction models. However, ASR systems exhibit disparities in transcription accuracy across racial and linguistic groups, highlighting an equity gap that could bias healthcare delivery. We evaluated four ASR systems (AWS General, AWS Medical, Whisper, and Wave2Vec) in transcribing 860 patient-nurse utterances (475 from Black patients, 385 from White patients). Word error rate (WER) was the primary measure. AWS General achieved the highest accuracy (median WER 39%), but all systems were less accurate for Black patients, particularly in the linguistic domains "Affect," "Social," and "Drives." AWS Medical outperformed the other systems on medical terms, although filler words, repetition, and nonmedical terms challenged every system. These findings underscore the need for diverse training datasets and improved dialect sensitivity to ensure equitable ASR performance and robust risk identification in home healthcare.
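The abstract names WER as the primary measure and reports it as a median per system and patient group, but it does not describe the scoring pipeline. The following is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words) with a per-group median. The function names, the lowercase and whitespace tokenization, and the toy samples are assumptions for illustration, not the authors' method or data.

```python
from statistics import median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Assumption: text is lowercased and split on whitespace; the study's
    actual normalization is not stated in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, computed one row at a time.
    # prev[j] is the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def median_wer_by_group(samples):
    """Median WER per group from (group, reference, hypothesis) triples."""
    by_group = {}
    for group, reference, hypothesis in samples:
        by_group.setdefault(group, []).append(
            word_error_rate(reference, hypothesis))
    return {group: median(values) for group, values in by_group.items()}


if __name__ == "__main__":
    # Toy utterances (hypothetical, not from the study).
    toy = [
        ("group_a", "the pain is worse today", "the pain was worse"),
        ("group_a", "i took my pills", "i took my pills"),
        ("group_b", "my leg hurts", "my leg hurts"),
    ]
    print(median_wer_by_group(toy))
```

Because WER is normalized by reference length, it can exceed 100% when the hypothesis contains many insertions. Comparing medians across groups, as the abstract does, limits the influence of a few very short or badly transcribed utterances.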