Multitask and Transfer Learning Approach for Joint Classification and Severity Estimation of Dysphonia.

Affiliations

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, 1117 Budapest, Hungary.

Publication information

IEEE J Transl Eng Health Med. 2023 Dec 7;12:233-244. doi: 10.1109/JTEHM.2023.3340345. eCollection 2024.

DOI: 10.1109/JTEHM.2023.3340345
PMID: 38196819
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10776101/
Abstract

OBJECTIVE

Despite speech being the primary communication medium, it carries valuable information about a speaker's health, emotions, and identity. Various conditions can affect the vocal organs, leading to speech difficulties. Extensive research has been conducted by voice clinicians and academia in speech analysis. Previous approaches primarily focused on one particular task, such as differentiating between normal and dysphonic speech, classifying different voice disorders, or estimating the severity of voice disorders.

METHODS AND PROCEDURES

This study proposes an approach that combines transfer learning and multitask learning (MTL) to perform dysphonia classification and severity estimation simultaneously. Both tasks use a shared representation, and the network learns task-specific outputs from these shared features. We employed five computer vision models and modified their architectures to support multitask learning. Additionally, we conducted binary "healthy vs. dysphonia" and multiclass "healthy vs. organic and functional dysphonia" classification using multitask learning, with the speaker's sex as an auxiliary task.
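The setup described above — a shared backbone feeding a classification head, a severity-estimation head, and a weighted auxiliary sex-prediction head — can be sketched as a toy forward pass with a combined loss. This is an illustrative sketch only: the layer shapes, the single ReLU backbone, and the alpha/beta loss weights are assumptions for demonstration, not the paper's actual architecture (which adapts five pretrained computer vision models).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared backbone with three task heads, mirroring the multitask idea:
# main tasks (dysphonia class + severity) plus an auxiliary sex-prediction task.
n_samples, n_features, n_hidden, n_classes = 8, 16, 32, 3

X = rng.normal(size=(n_samples, n_features))        # input features (illustrative)
W_shared = rng.normal(size=(n_features, n_hidden))  # shared backbone weights
W_cls = rng.normal(size=(n_hidden, n_classes))      # dysphonia-classification head
W_sev = rng.normal(size=(n_hidden, 1))              # severity-regression head
W_aux = rng.normal(size=(n_hidden, 2))              # auxiliary sex-prediction head

h = np.maximum(X @ W_shared, 0.0)                   # shared representation (ReLU)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy labels for the three tasks.
y_cls = rng.integers(0, n_classes, size=n_samples)
y_sev = rng.normal(size=(n_samples, 1))
y_sex = rng.integers(0, 2, size=n_samples)

p_cls = softmax(h @ W_cls)
p_sex = softmax(h @ W_aux)
sev_hat = h @ W_sev

loss_cls = -np.log(p_cls[np.arange(n_samples), y_cls]).mean()  # cross-entropy
loss_sev = ((sev_hat - y_sev) ** 2).mean()                     # mean squared error
loss_aux = -np.log(p_sex[np.arange(n_samples), y_sex]).mean()  # cross-entropy

# beta weights the auxiliary task, analogous to the beta values the paper sweeps;
# alpha balancing the severity term is likewise an assumed hyperparameter.
alpha, beta = 1.0, 0.3
total_loss = loss_cls + alpha * loss_sev + beta * loss_aux
print(float(total_loss))
```

Training would backpropagate this single combined loss through the shared backbone, which is what lets the auxiliary task regularize the representation used by the two main tasks.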

RESULTS

The proposed method achieved improved performance across all classification metrics compared to single-task learning (STL), which performs only classification or only severity estimation. Specifically, the model achieved F1 scores of 93% and 90% with MTL and STL, respectively. Moreover, we observed considerable improvements in both classification tasks when evaluating the beta values that weight the sex-prediction auxiliary task: MTL achieved an accuracy of 77% compared to 73.2% for STL. However, the performance of severity estimation with MTL was comparable to STL.

CONCLUSION

Our goal is to improve how voice pathologists and clinicians understand patients' conditions, make it easier to track their progress, and enhance the monitoring of vocal quality and treatment procedures. Clinical and Translational Impact Statement: By integrating both classification and severity estimation of dysphonia using multitask learning, we aim to enable clinicians to gain a better understanding of the patient's situation, effectively monitor their progress and voice quality.

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca9e/10776101/127435c2b50c/aziz1-3340345.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca9e/10776101/be98765087ad/aziz2-3340345.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca9e/10776101/159dd5c14236/aziz3-3340345.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca9e/10776101/ddb3222d8ce1/aziz4-3340345.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca9e/10776101/9dbfc453d7af/aziz5-3340345.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca9e/10776101/f3fd29faf819/aziz6-3340345.jpg
