Maskeliūnas Rytis, Kulikajevas Audrius, Damaševičius Robertas, Pribuišis Kipras, Ulozaitė-Stanienė Nora, Uloza Virgilijus
Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania.
Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania.
Cancers (Basel). 2022 May 11;14(10):2366. doi: 10.3390/cancers14102366.
Laryngeal carcinoma is the most common malignant tumor of the upper respiratory tract. Total laryngectomy completely and permanently separates the upper and lower airways. This causes loss of voice and leaves the patient unable to communicate verbally in the postoperative period. This paper applies recent deep learning methods to objectively classify, extract and measure substitution voicing after laryngeal oncosurgery from the audio signal. We propose analyzing the voice audio signal with well-known convolutional neural networks (CNNs) originally designed for image classification. Our approach feeds Mel-frequency cepstral coefficients (MFCC) to the deep neural network as its input. A database of digital speech recordings from 367 male subjects (279 normal and 88 pathological speech samples) was used. Our approach achieved the best true-positive rate of all the compared state-of-the-art approaches, with an overall accuracy of 89.47%.
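The abstract turns a 1-D voice recording into a 2-D time-frequency representation that an image-classification CNN can consume. The NumPy sketch below shows one common way to compute a log-Mel spectrogram and MFCCs from a waveform. It is illustrative only and assumes a 16 kHz mono signal, a 512-point FFT with a Hann window, a 160-sample (10 ms) hop, 40 Mel bands and 13 coefficients. These settings and the helper names (`mel_filterbank`, `log_mel_spectrogram`, `dct_ii`, `mfcc`) are hypothetical and do not reflect the paper's actual settings or code. The CNN itself is not shown.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumed choice; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters evenly spaced on the Mel scale; shape (n_mels, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Hypothetical parameters; returns an array of shape (n_mels, n_frames)
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal must be at least n_fft samples long")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T  # (n_frames, n_mels)
    return np.log(mel_energy + 1e-10).T  # small offset avoids log(0)


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # (n_out, n)
    out = x @ basis.T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out


def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # MFCCs = DCT of the log-Mel energies; returns shape (n_mfcc, n_frames)
    log_mel = log_mel_spectrogram(signal, sr, n_fft, hop, n_mels).T  # (n_frames, n_mels)
    return dct_ii(log_mel, n_mfcc).T
```

Either output can be treated as a single-channel image (frequency or coefficient index by time frame). It would typically be resized or padded to a fixed size before being passed to an image-classification CNN. How the authors shaped their input is not stated here.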