Low Daniel M, Rao Vishwanatha, Randolph Gregory, Song Phillip C, Ghosh Satrajit S
Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America.
McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America.
PLOS Digit Health. 2024 May 30;3(5):e0000516. doi: 10.1371/journal.pdig.0000516. eCollection 2024 May.
Detecting voice disorders from voice recordings could allow for frequent, remote, and low-cost screening before costly clinical visits and a more invasive laryngoscopy examination. Our goals were to detect unilateral vocal fold paralysis (UVFP) from voice recordings using machine learning, to identify which acoustic variables were important for prediction in order to increase trust, and to determine model performance relative to clinician performance. Patients with UVFP confirmed by endoscopic examination (N = 77) and controls with normal voices matched for age and sex (N = 77) were included. Voice samples were elicited by reading the Rainbow Passage and sustaining phonation of the vowel "a". Four machine learning models of differing complexity were used, and SHapley Additive exPlanations (SHAP) was used to identify important features. The highest median bootstrapped ROC AUC score was 0.87, exceeding clinicians' performance (range: 0.74-0.81) on the same recordings. However, recording durations differed between UVFP recordings and controls because of how the data were originally processed for storage, and we show that duration alone can classify the two groups. Counterintuitively, many UVFP recordings also had higher intensity than controls, even though UVFP patients tend to have weaker voices, revealing a dataset-specific bias that we mitigate in an additional analysis. We demonstrate that recording biases in audio duration and intensity created dataset-specific differences between patients and controls, which the models exploited to improve classification. Clinicians' ratings provide further evidence that patients were over-projecting their voices and were recorded at a higher signal amplitude than controls. Notably, after matching audio durations and removing variables associated with intensity to mitigate these biases, the models still achieved similarly high performance.
We provide a set of recommendations to avoid bias when building and evaluating machine learning models for screening in laryngology.
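One recommended bias check the abstract describes can be illustrated with a short sketch: if a trivially confounded variable such as recording duration alone can separate patients from controls, the dataset carries a recording bias that any model may exploit. The sketch below is illustrative only (it is not the authors' code); it uses synthetic durations, hypothetical group means, and scikit-learn, and evaluates with a bootstrapped ROC AUC analogous to the paper's evaluation scheme.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 77  # matches the paper's group sizes

# Hypothetical, synthetic durations (seconds): patient files stored
# slightly longer on average, mimicking the storage-processing bias.
dur_patients = rng.normal(12.0, 2.0, n)
dur_controls = rng.normal(9.0, 2.0, n)

X = np.concatenate([dur_patients, dur_controls]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])

# A classifier trained on duration ALONE should be near chance (AUC ~0.5)
# in an unbiased dataset; a high AUC here flags a recording bias.
clf = LogisticRegression().fit(X, y)

# Bootstrapped ROC AUC: resample cases with replacement and re-score.
aucs = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))
    if len(np.unique(y[idx])) < 2:  # need both classes to compute AUC
        continue
    aucs.append(roc_auc_score(y[idx], clf.predict_proba(X[idx])[:, 1]))

median_auc = float(np.median(aucs))
print(f"median bootstrapped AUC from duration alone: {median_auc:.2f}")
```

With the synthetic separation above, duration alone yields a high AUC, signaling a bias to mitigate (e.g., by trimming all recordings to a common duration) before interpreting model performance.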