Kim HyunBum, Jeon Juhyeong, Han Yeon Jae, Joo YoungHoon, Lee Jonghwan, Lee Seungchul, Im Sun
Department of Otolaryngology-Head and Neck Surgery, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea.
Department of Mechanical Engineering, Pohang University of Science and Technology (POSTECH), Pohang 37673, Korea.
J Clin Med. 2020 Oct 25;9(11):3415. doi: 10.3390/jcm9113415.
Voice changes may be the earliest sign of laryngeal cancer. We investigated whether automated voice signal analysis can distinguish patients with laryngeal cancer from healthy subjects. We extracted features using PRAAT, a software package for speech analysis in phonetics, and calculated the Mel-frequency cepstral coefficients (MFCCs) from voice samples of the sustained vowel /a:/. The proposed method was tested with six algorithms: support vector machine (SVM), extreme gradient boosting (XGBoost), light gradient boosted machine (LGBM), artificial neural network (ANN), one-dimensional convolutional neural network (1D-CNN), and two-dimensional convolutional neural network (2D-CNN). Their performance was evaluated in terms of accuracy, sensitivity, and specificity, and the results were compared with human performance: four volunteers, two of whom were trained laryngologists, rated the same files. The 1D-CNN showed the highest accuracy, 85%, with sensitivity and specificity of 78% and 93%, respectively. The two laryngologists achieved an accuracy of 69.9% but a sensitivity of 44%. Automated analysis of voice signals could differentiate patients with laryngeal cancer from healthy subjects with better diagnostic performance than the four volunteers.
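The three evaluation measures named above can be illustrated with a minimal sketch. It assumes binary labels in which 1 marks laryngeal cancer and 0 marks a healthy subject; the function name `diagnostic_metrics` and the example labels are hypothetical and are not taken from the study's code or data.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumed label convention (not from the paper): 1 = laryngeal cancer, 0 = healthy.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        # Fraction of all files classified correctly.
        "accuracy": (tp + tn) / len(y_true),
        # Fraction of cancer files that were flagged as cancer.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of healthy files that were classified as healthy.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Illustrative labels only; these are not the study's data.
example_true = [1, 1, 1, 1, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 0, 0, 0, 1]
example_metrics = diagnostic_metrics(example_true, example_pred)
print(example_metrics)
```

Reporting sensitivity and specificity alongside accuracy matters in a screening setting: a rater can reach a moderate accuracy while still missing many cancer cases, which is the pattern the abstract describes for the human raters.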