Differentiability of voice disorders through explainable AI.

Author information

Özcan Fatma

Affiliation

Biophysics Department in Faculty of Medicine, Kahramanmaras Sutcu Imam University, 46100, Kahramanmaraş, Turkey.

Publication information

Sci Rep. 2025 May 25;15(1):18250. doi: 10.1038/s41598-025-03444-3.

Abstract

The voice can be affected by various types of pathology. The phoniatric medical examination includes acoustic analysis, which evaluates characteristic parameters extracted from the vocal signal. Computer-assisted decision-making systems can help specialists detect vocal pathologies using only the patient's voice. In this study, transfer learning techniques are used to perform the acoustic analysis: a fine-tuned OpenL3 model predicts whether a signal contains a pathology by classifying it into one of 8 classes (7 pathologies plus Healthy). A publicly available dataset is used, with the categories hyperkinetic dysphonia, hypokinetic dysphonia, reflux laryngitis, vocal fold nodules, prolapse, glottic insufficiency and vocal fold paralysis, in addition to the Healthy class. The results are very convincing: OpenL3 with transfer learning reached an accuracy of 99.44%. In addition, explainable decision support systems (XDSS) provide an in-depth understanding of the decision-making process. Averaging all the occlusion sensitivity maps yields an image that reveals the spatio-temporal characteristics of the disordered voices used for classification. Thanks to explainability methods, a new term, differentiability, can be introduced to explain the black-box operation of deep networks. For rapid diagnosis and prevention, this work could provide more detail on disordered voices by enabling explainable diagnosis.
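The idea of averaging occlusion sensitivity maps can be illustrated with a small NumPy sketch. This is not the author's implementation: it assumes a generic `predict` callable that maps a 2-D time-frequency input (for example a spectrogram) to a vector of class scores, and the patch size, stride and fill value are hypothetical choices. For each input, a patch is slid over the input and blanked out, and the drop in the target class score is recorded. The per-input maps are then averaged into a single image.

```python
import numpy as np


def occlusion_sensitivity(predict, x, target, patch=(8, 8), stride=(4, 4), fill=0.0):
    """Occlusion sensitivity map for one 2-D input (hypothetical parameters).

    predict: assumed callable, array of shape x.shape -> 1-D array of class scores.
    Each pixel receives the mean score drop over all patches that covered it.
    """
    h, w = x.shape
    ph, pw = patch
    sh, sw = stride
    base = predict(x)[target]  # unoccluded score for the target class
    heat = np.zeros((h, w), dtype=float)
    counts = np.zeros((h, w), dtype=float)
    for i in range(0, h - ph + 1, sh):
        for j in range(0, w - pw + 1, sw):
            occluded = x.copy()
            occluded[i:i + ph, j:j + pw] = fill  # blank out one patch
            drop = base - predict(occluded)[target]
            heat[i:i + ph, j:j + pw] += drop
            counts[i:i + ph, j:j + pw] += 1
    # Average overlapping contributions; pixels never covered stay at 0.
    return np.divide(heat, counts, out=np.zeros_like(heat), where=counts > 0)


def mean_occlusion_map(predict, inputs, targets, **kwargs):
    """Average the occlusion maps of many inputs into one image."""
    maps = [occlusion_sensitivity(predict, x, t, **kwargs) for x, t in zip(inputs, targets)]
    return np.mean(maps, axis=0)


if __name__ == "__main__":
    # Toy stand-in for a trained classifier: class 0 depends only on the top 8 rows.
    def toy_predict(x):
        return np.array([x[:8, :].sum(), x[8:, :].sum()])

    inputs = [np.ones((16, 16)), np.ones((16, 16))]
    avg = mean_occlusion_map(toy_predict, inputs, targets=[0, 0])
    print(avg[:, 0])
```

In the toy run, regions that the class-0 score depends on (the top rows) receive large values, while rows far from them receive zero. The overlapping patches also spread some importance onto neighbouring rows, which is a known blurring effect of occlusion maps.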
