Kim Hyun-Bum, Song Jaemin, Park Seho, Lee Yong Oh
Department of Otolaryngology-Head and Neck Surgery, The Catholic University of Korea, Seoul, South Korea.
Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea.
Sci Rep. 2024 Apr 23;14(1):9297. doi: 10.1038/s41598-024-58817-x.
Voice change is often the first sign of laryngeal cancer and typically leads to diagnosis by laryngoscopy in a hospital. Screening for laryngeal cancer based on voice alone could improve early detection. However, identifying voice indicators specific to laryngeal cancer is challenging, especially when differentiating it from other laryngeal conditions. This study presents an artificial intelligence model designed to distinguish between healthy voices, laryngeal cancer voices, and voices affected by other laryngeal conditions. We gathered voice samples from individuals with laryngeal cancer, vocal cord paralysis, or benign mucosal diseases, and from healthy participants. Comprehensive testing was conducted to determine the best mel-frequency cepstral coefficient conversion and machine learning techniques, and the results were analyzed in depth. In our tests, distinguishing laryngeal diseases from healthy voices achieved an accuracy of 0.85 to 0.97. In multiclass classification, however, accuracy ranged from 0.75 to 0.83. These findings highlight the challenges that artificial intelligence-driven, voice-based diagnosis faces because of overlap with benign conditions, but they also underscore its potential.
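The abstract names mel-frequency cepstral coefficient (MFCC) conversion as the feature step but does not give its settings. As an illustration only, the following is a minimal pure-Python sketch of a textbook MFCC computation, not the authors' pipeline. The frame length, hop size, filter count, number of coefficients, sample rate, and the synthetic test tone are all assumptions chosen for the example; in practice a library routine such as `librosa.feature.mfcc` would normally be used, and the resulting features would then be passed to a classifier.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft  # center frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_len=256, hop=128):
    """Slide a window over the signal and compute MFCCs for every full frame."""
    return [mfcc_frame(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]


# Hypothetical input: a 220 Hz tone sampled at 8 kHz stands in for a voice recording.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(1024)]
features = mfcc(signal, SAMPLE_RATE)
print(len(features), "frames x", len(features[0]), "coefficients")
```

Each row of `features` describes one short frame of audio. Averaging the rows, or stacking them into a fixed-size matrix, gives the kind of input that a classifier could be trained on to separate the voice categories described above.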