Laryngeal disease classification using voice data: Octave-band vs. mel-frequency filters.

Authors

Song Jaemin, Kim Hyunbum, Lee Yong Oh

Affiliations

Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea.

Department of Otolaryngology-Head and Neck Surgery, The Catholic University of Korea, Seoul, South Korea.

Publication information

Heliyon. 2024 Nov 30;10(24):e40748. doi: 10.1016/j.heliyon.2024.e40748. eCollection 2024 Dec 30.

DOI:10.1016/j.heliyon.2024.e40748
PMID:39720068
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11667598/
Abstract

INTRODUCTION

Laryngeal cancer diagnosis relies on specialist examinations, but non-invasive methods using voice data are emerging with artificial intelligence (AI) advancements. Mel Frequency Cepstral Coefficients (MFCCs) are widely used for voice analysis, but Octave Frequency Spectrum Energy (OFSE) may offer better accuracy in detecting subtle voice changes.

PROBLEM STATEMENT

Accurate early diagnosis of laryngeal cancer from voice data remains challenging with current methods such as MFCC.

OBJECTIVES

This study compares the effectiveness of MFCC and OFSE in classifying voice data into healthy, laryngeal cancer, benign mucosal disease, and vocal fold paralysis categories.

METHODS

Voice samples from 363 patients were analyzed using convolutional neural network (CNN) models, with MFCC and OFSE (computed with 1/3 octave band filters) as input features. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize key voice features.
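The abstract does not spell out how the OFSE features were computed, so the following is only a minimal sketch of one common way to obtain 1/3-octave band energies from a voice signal: frame the signal, take a power spectrum, and sum the power inside each band. The function names, the 1 kHz reference frequency, the base-2 band spacing, the frame length, and the hop size are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def third_octave_bands(f_min, f_max, f_ref=1000.0):
    """Return (center, lower, upper) edges of base-2 1/3-octave bands.

    Centers are f_ref * 2**(k/3); each band spans center * 2**(-1/6) to
    center * 2**(+1/6). f_ref = 1 kHz is an assumed convention.
    """
    k_lo = int(np.ceil(3 * np.log2(f_min / f_ref)))
    k_hi = int(np.floor(3 * np.log2(f_max / f_ref)))
    k = np.arange(k_lo, k_hi + 1)
    centers = f_ref * 2.0 ** (k / 3.0)
    return centers, centers * 2.0 ** (-1 / 6), centers * 2.0 ** (1 / 6)

def band_energies(signal, sample_rate, frame_len=2048, hop=512, f_min=50.0):
    """Compute per-frame 1/3-octave band energies in dB.

    Returns (centers, energies_db), where energies_db has shape
    (n_bands, n_frames). Frame and hop sizes are illustrative defaults.
    """
    # Keep the upper edge of the highest band below the Nyquist frequency.
    f_max = (sample_rate / 2.0) / 2.0 ** (1 / 6)
    centers, lower, upper = third_octave_bands(f_min, f_max)

    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    n_frames = 1 + (len(signal) - frame_len) // hop

    energies = np.empty((len(centers), n_frames))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b, (lo, hi) in enumerate(zip(lower, upper)):
            # Sum the spectral power of all FFT bins inside this band.
            energies[b, t] = power[(freqs >= lo) & (freqs < hi)].sum()

    # A small floor avoids log(0) for empty or silent bands.
    return centers, 10.0 * np.log10(energies + 1e-12)
```

As a sanity check, a pure 1 kHz tone sampled at 16 kHz should concentrate its energy in the band centered at 1000 Hz. The resulting band-by-frame matrix is the kind of two-dimensional input a CNN could consume in place of an MFCC matrix.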

RESULTS

OFSE with 1/3 octave band filters outperformed MFCC in classification accuracy, especially in multi-class classification that included the laryngeal cancer, benign mucosal disease, and vocal fold paralysis groups (accuracy 0.9398 ± 0.0232 vs. 0.7061 ± 0.0561). Grad-CAM analysis revealed that OFSE with 1/3 octave band filters effectively distinguished laryngeal cancer from healthy voices by focusing on increased noise in the over-formant area and changes in the fundamental frequency. The analysis also showed that specific narrow frequency areas were critical for classification, particularly in vocal fold paralysis. Benign mucosal diseases occasionally resembled healthy voices, which makes AI differentiation between benign conditions and laryngeal cancer a significant challenge.
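To make the Grad-CAM step concrete, here is a minimal NumPy sketch of the standard Grad-CAM heatmap computation: average the gradients of the class score over each feature map to get one weight per channel, form the weighted sum of the feature maps, and apply a ReLU. The abstract does not describe the CNN architecture or framework, so the inputs here are hypothetical arrays standing in for the last convolutional layer's activations and the gradients of the class score with respect to them.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one convolutional layer.

    activations: array of shape (channels, height, width), the feature maps.
    gradients:   array of the same shape, d(class score) / d(activations).
    Returns a (height, width) map scaled to [0, 1] when any value is positive.
    """
    # Channel weights: global average of the gradients over each feature map.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

When the input is a band-by-time feature matrix, the rows of the heatmap map back to frequency bands, which is how a highlighted region can be read as a frequency area such as the fundamental frequency or the range above the formants.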

CONCLUSION

OFSE with 1/3 octave band filters provides superior accuracy in diagnosing laryngeal diseases including laryngeal cancer, showing potential for non-invasive, AI-driven early detection.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/7df9c2564f3b/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/a7ddf8fcd2a9/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/ddcc1389ac09/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/f0e34560e45a/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/a35d82dd054a/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/a892f566a560/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/b18a4e60afbe/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/1477c23da023/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/d0d788836533/gr9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/8387a70bf395/gr10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/fb5b206d3bfe/gr11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/62ace52ee416/gr12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/6db12e96438f/gr13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/af6675e8792a/gr14.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/c1193a149ab1/gr15.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/624d1b4ee558/gr16.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/0aab0373d89f/gr17.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93b7/11667598/363a7d9e1b99/gr18.jpg
