Uloza Virgilijus, Padervinskis Evaldas, Vegiene Aurelija, Pribuisiene Ruta, Saferis Viktoras, Vaiciukynas Evaldas, Gelzinis Adas, Verikas Antanas
Department of Otolaryngology, Lithuanian University of Health Sciences, Eiveniu 2, 50009, Kaunas, Lithuania.
Department of Physics, Mathematics and Biophysics, Lithuanian University of Health Sciences, Kaunas, Lithuania.
Eur Arch Otorhinolaryngol. 2015 Nov;272(11):3391-9. doi: 10.1007/s00405-015-3708-4. Epub 2015 Jul 11.
The objective of this study was to evaluate the reliability of acoustic voice parameters obtained using smartphone (SP) microphones and to investigate the utility of SP voice recordings for voice screening. Voice samples of the sustained vowel /a/ from 118 subjects (34 normal and 84 pathological voices) were recorded simultaneously through two microphones: an oral AKG Perception 220 microphone and a Samsung Galaxy Note3 SP microphone. Fundamental frequency, jitter, shimmer, normalized noise energy (NNE), signal-to-noise ratio and harmonic-to-noise ratio were measured from the acoustic voice signals using Dr. Speech software. The discriminant analysis-based Correct Classification Rate (CCR) and the Random Forest Classifier (RFC)-based Equal Error Rate (EER) were used to evaluate how well the acoustic voice parameters classify voices as normal or pathological. The Lithuanian version of the Glottal Function Index (LT_GFI) questionnaire was used for self-assessment of the severity of the voice disorder. The correlations between acoustic voice parameters obtained with the two types of microphones were statistically significant and strong (r = 0.73-1.0) across all measurements. When classifying voices as normal or pathological, Oral-NNE yielded a CCR of 73.7%, and the pair of SP-NNE and SP-shimmer parameters yielded a CCR of 79.5%. Fusing the results obtained from SP voice recordings with GFI data gave a CCR of 84.60%, and the RFC gave an EER of 7.9%. In conclusion, measurements of acoustic voice parameters using the SP microphone proved reliable in clinical settings, showing a high CCR and a low EER when distinguishing normal from pathological voices, which validates the suitability of the SP microphone signal for automatic voice analysis and screening.
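The two evaluation metrics named in the abstract, CCR and EER, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the labels and scores are made-up toy values rather than study data, and the EER is only approximated by sweeping a decision threshold over the observed scores. Labels are assumed to be 1 for pathological and 0 for normal, with a higher score meaning "more likely pathological".

```python
def correct_classification_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def equal_error_rate(true_labels, scores):
    """Approximate the EER: the error rate at the threshold where the
    false-accept rate (normal called pathological) and the false-reject
    rate (pathological called normal) are closest to each other."""
    positives = [s for t, s in zip(true_labels, scores) if t == 1]
    negatives = [s for t, s in zip(true_labels, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # A sample is called pathological when its score >= threshold.
        false_accept = sum(s >= threshold for s in negatives) / len(negatives)
        false_reject = sum(s < threshold for s in positives) / len(positives)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (false_accept + false_reject) / 2
    return eer


# Toy data (assumed for illustration only, not taken from the study).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
predicted = [1 if s >= 0.5 else 0 for s in scores]

ccr = correct_classification_rate(labels, predicted)
eer = equal_error_rate(labels, scores)
print(f"CCR = {ccr:.2%}, EER = {eer:.2%}")
```

A higher CCR and a lower EER both indicate better separation of the two voice classes. CCR depends on one fixed decision rule, whereas EER summarizes the trade-off between the two error types across thresholds.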