Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings.

Authors

Low Daniel M, Rao Vishwanatha, Randolph Gregory, Song Phillip C, Ghosh Satrajit S

Affiliations

Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA.

McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.

Publication information

medRxiv. 2024 Mar 20:2020.11.23.20235945. doi: 10.1101/2020.11.23.20235945.

DOI:10.1101/2020.11.23.20235945
PMID:33501466
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7836138/
Abstract

INTRODUCTION

Detecting voice disorders from voice recordings could allow for frequent, remote, and low-cost screening before costly clinical visits and a more invasive laryngoscopy examination. Our goals were to detect unilateral vocal fold paralysis (UVFP) from voice recordings using machine learning, to identify which acoustic variables were important for prediction to increase trust, and to determine model performance relative to clinician performance.

METHODS

Patients with UVFP confirmed by endoscopic examination (N=77) and controls with normal voices matched for age and sex (N=77) were included. Voice samples were elicited by reading the Rainbow Passage and sustaining phonation of the vowel "a". Four machine learning models of differing complexity were used. SHapley Additive exPlanations (SHAP) was used to identify important features.
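The abstract does not say how SHAP was configured. As a rough illustration of the quantity SHAP estimates, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. The function `shapley_values`, the toy linear model, and the use of a single baseline vector to stand in for "absent" features are assumptions made for illustration; they are not the authors' pipeline or the `shap` library's API.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    predict  : callable taking a list of feature values, returning a number
    x        : feature values of the sample being explained
    baseline : reference values used when a feature is treated as absent
               (a simplifying assumption; SHAP typically averages over a
               background dataset instead)
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]              # switch feature i on
            value = predict(current)
            phi[i] += value - previous     # marginal contribution of feature i
            previous = value
    return [p / len(orderings) for p in phi]


# Toy model with three hypothetical acoustic features.
def toy_model(v):
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

Exhaustive enumeration costs n! model evaluations, so it is only practical for a handful of features; it is shown here because it makes the definition explicit. The contributions sum to the difference between the prediction for `x` and the prediction for the baseline.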

RESULTS

The highest median bootstrapped ROC AUC score was 0.87, which exceeded clinicians' performance (range: 0.74 - 0.81) based on the recordings. Recording durations differed between UVFP recordings and controls because of how the data were originally processed for storage, and we show that this difference alone can be used to classify the two groups. Counterintuitively, many UVFP recordings had higher intensity than controls even though UVFP patients tend to have weaker voices, revealing a dataset-specific bias that we mitigate in an additional analysis.
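A simple way to probe for the kind of duration bias described above is to score each recording by a single nuisance variable and see how well that alone separates the groups. The sketch below computes ROC AUC from the Mann-Whitney statistic and takes the median over bootstrap resamples. The function names, the resampling scheme, and the example durations are assumptions for illustration; the abstract does not describe the authors' exact bootstrap procedure.

```python
import random
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_median_auc(labels, scores, n_boot=1000, seed=0):
    """Median ROC AUC over bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    indices = range(len(labels))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    return median(aucs)


# Hypothetical data: 1 = UVFP, 0 = control; score = recording duration in seconds.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
durations = [41.0, 38.5, 44.2, 30.1, 29.0, 31.5, 28.7, 33.0]
print(bootstrap_median_auc(labels, durations, n_boot=200))
```

An AUC well above 0.5 for a variable such as duration, which has no plausible clinical link to the diagnosis, suggests the dataset lets a model separate the groups for the wrong reason.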

CONCLUSION

We demonstrate that recording biases in audio duration and intensity created dataset-specific differences between patients and controls, which the models exploited to improve classification. In addition, clinicians' ratings provide further evidence that patients were over-projecting their voices and were recorded at a higher signal amplitude than controls. Interestingly, after matching audio duration and removing variables associated with intensity to mitigate the biases, the models achieved similarly high performance. We provide a set of recommendations to avoid bias when building and evaluating machine learning models for screening in laryngology.
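The two mitigation steps named above, matching audio duration and removing intensity-related variables, could look roughly like the following. This is a minimal sketch under stated assumptions: recordings are plain lists of samples, matching is done by trimming to the shortest recording, and intensity-related features are identified by hypothetical keywords in their names. The abstract does not specify how the authors carried out either step.

```python
def match_duration(recordings, sample_rate):
    """Trim every recording to the length of the shortest one.

    Returns the trimmed recordings and the common duration in seconds.
    """
    n_samples = min(len(r) for r in recordings)
    return [r[:n_samples] for r in recordings], n_samples / sample_rate


def drop_intensity_features(rows, keywords=("intensity", "loudness", "energy")):
    """Remove feature columns whose names contain an intensity-related keyword.

    The keyword list is a hypothetical example, not the authors' criterion.
    """
    def keep(name):
        return not any(k in name.lower() for k in keywords)

    return [{k: v for k, v in row.items() if keep(k)} for row in rows]


# Hypothetical inputs: three recordings at 2 samples per second, one feature row.
recordings = [[0.1] * 10, [0.2] * 6, [0.3] * 8]
trimmed, seconds = match_duration(recordings, sample_rate=2)
features = [{"jitter": 0.1, "mean_intensity_db": 70.0, "Loudness_sma3": 1.2}]
print(seconds, drop_intensity_features(features))
```

After such a step, rerunning the same evaluation shows whether performance depended on the nuisance variables or on the remaining acoustic features.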

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/4e585ec770ca/nihpp-2020.11.23.20235945v8-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/b3c732b46f00/nihpp-2020.11.23.20235945v8-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/9260f3130a41/nihpp-2020.11.23.20235945v8-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/8c7673fa336b/nihpp-2020.11.23.20235945v8-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/02564b4bee2a/nihpp-2020.11.23.20235945v8-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/7c433fbcd594/nihpp-2020.11.23.20235945v8-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/0c20e023f08a/nihpp-2020.11.23.20235945v8-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/954a/10958572/c0875e994ce2/nihpp-2020.11.23.20235945v8-f0008.jpg

Similar articles

1
Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings.
medRxiv. 2024 Mar 20:2020.11.23.20235945. doi: 10.1101/2020.11.23.20235945.
2
Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings.
PLOS Digit Health. 2024 May 30;3(5):e0000516. doi: 10.1371/journal.pdig.0000516. eCollection 2024 May.
3
Relating Cepstral Peak Prominence to Cyclical Parameters of Vocal Fold Vibration from High-Speed Videoendoscopy Using Machine Learning: A Pilot Study.
J Voice. 2021 Sep;35(5):703-716. doi: 10.1016/j.jvoice.2020.01.026. Epub 2020 Mar 12.
4
Perceptual ratings of vocal characteristics and voicing features in untreated patients with unilateral vocal fold paralysis.
J Commun Disord. 2005 May-Jun;38(3):163-85. doi: 10.1016/j.jcomdis.2004.08.001.
5
Comparison of voice therapy and selective electrical stimulation of the larynx in early unilateral vocal fold paralysis after thyroid surgery: A retrospective data analysis.
Clin Otolaryngol. 2021 May;46(3):530-537. doi: 10.1111/coa.13703. Epub 2021 Jan 27.
6
Analysis of vocal fold function from acoustic data simultaneously recorded with high-speed endoscopy.
J Voice. 2012 Nov;26(6):726-33. doi: 10.1016/j.jvoice.2012.02.001. Epub 2012 May 25.
7
Longitudinal Voice Outcomes After Voice Therapy in Unilateral Vocal Fold Paralysis.
J Voice. 2016 Nov;30(6):767.e9-767.e15. doi: 10.1016/j.jvoice.2015.10.018. Epub 2015 Dec 3.
8
Phonatory aerodynamics in connected speech.
Laryngoscope. 2015 Dec;125(12):2764-71. doi: 10.1002/lary.25458. Epub 2015 Jul 21.
9
Glottal Stop Production in Controls and Patients With Unilateral Vocal Fold Paresis/Paralysis.
J Speech Lang Hear Res. 2022 Sep 12;65(9):3392-3404. doi: 10.1044/2022_JSLHR-21-00599. Epub 2022 Aug 31.
10
Effect of intralaryngeal muscle synkinesis on perception of voice handicap in patients with unilateral vocal fold paralysis.
Laryngoscope. 2017 Jul;127(7):1628-1632. doi: 10.1002/lary.26390. Epub 2017 Jan 20.
