End-to-end deep learning classification of vocal pathology using stacked vowels.

Authors

Liu George S, Hodges Jordan M, Yu Jingzhi, Sung C Kwang, Erickson-DiRenzo Elizabeth, Doyle Philip C

Affiliations

Department of Otolaryngology Head and Neck Surgery Stanford University School of Medicine, Stanford University Stanford California USA.

Division of Laryngology Stanford University School of Medicine, Stanford University Stanford California USA.

Publication information

Laryngoscope Investig Otolaryngol. 2023 Aug 31;8(5):1312-1318. doi: 10.1002/lio2.1144. eCollection 2023 Oct.

DOI:10.1002/lio2.1144
PMID:37899847
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10601590/
Abstract

OBJECTIVES

Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.

METHODS

Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1-dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.
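The abstract does not specify how the recordings are combined, so the following is only a minimal sketch of one plausible reading: each recording becomes one input channel of a 1-dimensional convolution. The helper names (`stack_recordings`, `conv1d`), the fixed length, the kernel sizes, and the random waveforms are all hypothetical stand-ins, not the authors' pipeline or data.

```python
import numpy as np


def stack_recordings(recordings, length):
    """Crop or zero-pad each 1-D waveform to `length`, then stack as channels."""
    channels = []
    for wav in recordings:
        wav = np.asarray(wav, dtype=np.float32)[:length]
        padded = np.zeros(length, dtype=np.float32)
        padded[: wav.shape[0]] = wav
        channels.append(padded)
    return np.stack(channels, axis=0)  # shape: (n_channels, length)


def conv1d(x, kernels):
    """Valid 1-D cross-correlation.

    x: (c_in, t); kernels: (c_out, c_in, k); returns (c_out, t - k + 1).
    """
    c_out, c_in, k = kernels.shape
    assert x.shape[0] == c_in, "kernel and input channel counts must match"
    # windows has shape (c_in, t - k + 1, k)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    return np.einsum("ctk,ock->ot", windows, kernels)


rng = np.random.default_rng(0)

# Synthetic stand-ins for the /a/, /i/, /u/ recordings of one speaker
# (assumption: real recordings would be loaded from audio files).
vowel_a = rng.standard_normal(1000)
vowel_i = rng.standard_normal(900)
vowel_u = rng.standard_normal(1100)

# Baseline-style input: one vowel, one channel.
baseline_input = stack_recordings([vowel_a], length=1000)

# Stacked-vowel-style input: three vowels, three channels.
stacked_input = stack_recordings([vowel_a, vowel_i, vowel_u], length=1000)

# One convolutional layer with 4 filters of width 9 over the 3 channels.
kernels = rng.standard_normal((4, 3, 9)).astype(np.float32)
features = conv1d(stacked_input, kernels)
```

Under this reading, the stacked pitch model would differ only in what is stacked: the low, neutral, and high recordings of /a/ would replace the three vowels as channels.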

RESULTS

For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class-specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).
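For readers unfamiliar with the metric, the sketch below shows how a per-class F1 score and an overall F1 score can be computed for a three-class problem. The abstract does not say which averaging produced the overall score, so the support-weighted average here is an assumption, and the toy labels are invented for illustration and are not the study's data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, where F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores, weighted by true class counts."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores[label] * support[label] for label in labels) / len(y_true)


labels = ["healthy", "hyperfunctional", "laryngitis"]

# Invented toy labels (not from the study): 6 healthy, 2 hyperfunctional,
# 2 laryngitis recordings.
y_true = ["healthy"] * 6 + ["hyperfunctional"] * 2 + ["laryngitis"] * 2
y_pred = (
    ["healthy"] * 5 + ["hyperfunctional"]   # one healthy sample misclassified
    + ["hyperfunctional", "healthy"]        # one hyperfunctional sample missed
    + ["laryngitis"] * 2                    # both laryngitis samples correct
)

class_scores = per_class_f1(y_true, y_pred, labels)
overall = weighted_f1(y_true, y_pred, labels)
```

In the toy data the minority class scores well below the overall average, the same kind of gap the abstract reports between the overall F1 score and the hyperfunctional dysphonia F1 score.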

CONCLUSIONS

This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI-driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.

LAY SUMMARY

AI analysis of multiple vowel recordings can improve classification of voice pathologies compared with models using a single sustained vowel and offer a strategy to enhance AI-driven screening of voice disorders.

LEVEL OF EVIDENCE

Level 3

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f511/10601590/ff3295db061e/LIO2-8-1312-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f511/10601590/7b74c014dd05/LIO2-8-1312-g001.jpg
