Lei Zhengdong, Kennedy Evan, Fasanella Laura, Li-Jessen Nicole Yee-Key, Mongeau Luc
Department of Mechanical Engineering, McGill University, Montreal, QC H3A 0G4, Canada.
School of Communication Sciences and Disorders, McGill University, Montreal, QC H3A 0G4, Canada.
Appl Sci (Basel). 2019 Apr;9(7). doi: 10.3390/app9071505. Epub 2019 Apr 11.
The purpose of this study was to investigate the feasibility of using neck-surface acceleration signals to discriminate between modal, breathy and pressed voice. Voice data for five single English vowels were collected from 31 female native speakers of Canadian English using a portable neck surface accelerometer (NSA) and a condenser microphone. First, five clinically certified speech-language pathologists (SLPs) performed auditory-perceptual ratings of the audio recordings to categorize voice type. Intra- and inter-rater analyses were used to determine the SLPs' reliability in this perceptual categorization task. Mixed-type samples were screened out, and only samples with congruent ratings were kept for the subsequent classification task. Second, features including spectral harmonics, jitter, shimmer and spectral entropy were extracted from the NSA data, and supervised learning algorithms were used to map the resulting feature vectors to voice type categories. A feature wrapper strategy was used to evaluate the contribution of each feature, and of combinations of features, to the classification of the different voice types. The highest classification accuracy on the full set was 82.5%. The classification accuracy for breathy voice was notably higher, by approximately 12%, than that for the other two voice types. Shimmer and spectral entropy were the metrics most strongly correlated with classification accuracy.
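The abstract does not specify the exact feature definitions, classifier, or wrapper procedure used in the study. As a rough illustration only, the sketch below uses commonly cited local definitions of jitter and shimmer, a normalized spectral entropy, and an exhaustive feature-subset search scored by the classifier itself (one simple form of wrapper). All function names are hypothetical. The leave-one-out nearest-centroid classifier is a stand-in, not the authors' learner. Per-cycle periods and peak amplitudes are assumed to have already been extracted from the NSA signal, and cycle detection is not shown.

```python
# Illustrative sketch only: hypothetical helpers, not the study's pipeline.
# Assumes per-cycle periods/amplitudes were already extracted from the NSA signal.
import itertools

import numpy as np


def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive cycle
    periods, divided by the mean period (dimensionless)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def local_shimmer(amplitudes):
    """Local shimmer: the same ratio computed on per-cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1].

    Near 0 for a single spectral line, 1 for a perfectly flat spectrum."""
    power = np.abs(np.fft.rfft(np.asarray(frame, dtype=float))) ** 2
    p = power / np.sum(power)
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return float(-np.sum(p * np.log2(p)) / np.log2(len(power)))


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for the (unspecified) supervised learner; features
    are z-scored with statistics from each training fold only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        X_tr, y_tr = X[train], y[train]
        mu = X_tr.mean(axis=0)
        sd = X_tr.std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero for constant features
        Z_tr = (X_tr - mu) / sd
        z_test = (X[i] - mu) / sd
        classes = np.unique(y_tr)
        centroids = np.array([Z_tr[y_tr == c].mean(axis=0) for c in classes])
        predicted = classes[np.argmin(np.linalg.norm(centroids - z_test, axis=1))]
        correct += int(predicted == y[i])
    return correct / len(y)


def wrapper_search(X, y, names):
    """Wrapper-style evaluation: score every non-empty feature subset with the
    classifier itself and return (subset, accuracy) pairs, best first."""
    X = np.asarray(X, dtype=float)
    results = []
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(range(len(names)), k):
            acc = loo_nearest_centroid_accuracy(X[:, list(subset)], y)
            results.append(([names[j] for j in subset], acc))
    # sorted() is stable, so ties keep the smaller subsets first
    return sorted(results, key=lambda r: r[1], reverse=True)
```

In use, each sample's feature vector might be `[local_jitter(periods), local_shimmer(amps), spectral_entropy(frame)]`, and `wrapper_search` then ranks individual features and their combinations by classification accuracy. An exhaustive search is only practical for a handful of features. With larger feature sets, greedy forward or backward selection is the usual wrapper alternative.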