Using SincNet for Learning Pathological Voice Disorders.

Affiliations

Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan.

Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, New Taipei City 220, Taiwan.

Publication information

Sensors (Basel). 2022 Sep 2;22(17):6634. doi: 10.3390/s22176634.

Abstract

Deep learning techniques such as convolutional neural networks (CNNs) have been successfully applied to identify pathological voices. However, a major disadvantage of these advanced models is their lack of interpretability: they cannot explain the outcomes they predict. This drawback creates a further bottleneck for the adoption of voice-disorder classification and detection systems, especially during the current pandemic. In this paper, we propose replacing the very first layer of a commonly used CNN with a series of learnable sinc functions, yielding an explainable SincNet system for classifying or detecting pathological voices. The sinc filters, which serve as the front-end signal processor of SincNet, are critical for constructing a meaningful first layer, and they directly extract the acoustic features from which the subsequent network layers generate high-level voice information. We conducted our tests on three different voice datasets from Far Eastern Memorial Hospital. In our evaluations, the proposed approach improved accuracy by up to 7% and sensitivity by up to 9% over conventional methods, demonstrating the superior performance of the SincNet system in identifying pathological input waveforms. More importantly, based on the evaluation results, we offer possible explanations of the relationship between the system output and the speech features extracted by the first layer.
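
The abstract does not spell out the filter equations, so the following is a minimal NumPy sketch of the band-pass parameterization used in the original SincNet design, in which each first-layer kernel is the difference of two windowed sinc low-pass filters, g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n). The helper names, the 251-tap kernel length, the 16 kHz sampling rate, and the example cutoffs are assumptions for illustration, not values taken from this paper. In an actual SincNet the cutoffs are trainable parameters learned by backpropagation; here they are fixed so the example stays self-contained.

```python
import numpy as np

def sinc_bandpass(f1_hz, f2_hz, kernel_size=251, sample_rate=16000):
    """Hypothetical helper: Hamming-windowed band-pass FIR kernel passing f1_hz..f2_hz."""
    # Convert cutoffs to normalized frequency (cycles per sample)
    f1 = f1_hz / sample_rate
    f2 = f2_hz / sample_rate
    # Symmetric tap indices centred on zero, e.g. -125..125 for 251 taps
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) = sin(pi*x)/(pi*x), so 2f*np.sinc(2f*n) is an ideal low-pass with cutoff f;
    # subtracting two low-pass filters gives a band-pass filter
    band = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Hamming window tapers the truncated sinc to reduce ripple and spectral leakage
    return band * np.hamming(kernel_size)

def sinc_filter_bank(low_hz, band_hz, kernel_size=251, sample_rate=16000):
    """Hypothetical helper: one kernel per (low cutoff, bandwidth) pair."""
    # SincNet-style constraints: f1 = |low| and f2 = f1 + |band| keep f2 above f1
    f1 = np.abs(low_hz)
    f2 = f1 + np.abs(band_hz)
    return np.stack([sinc_bandpass(a, b, kernel_size, sample_rate) for a, b in zip(f1, f2)])

def first_layer_features(waveform, bank):
    """Hypothetical helper: convolve the raw waveform with every kernel (one channel per band)."""
    return np.stack([np.convolve(waveform, k, mode="same") for k in bank])

# Usage: a 1-second, 440 Hz test tone through two bands (300-600 Hz and 2000-3000 Hz)
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
bank = sinc_filter_bank(np.array([300.0, 2000.0]), np.array([300.0, 1000.0]), sample_rate=sr)
feats = first_layer_features(x, bank)
print(feats.shape)  # (2, 16000)
print(np.mean(feats ** 2, axis=1))  # energy per band; the first band should dominate
```

Because each output channel corresponds to a known frequency band defined by just two cutoffs, the first-layer activations can be read directly in acoustic terms, which is the kind of interpretability the abstract refers to. In this toy input, the 440 Hz tone should appear almost entirely in the 300 to 600 Hz channel.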
