Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland.
Sensors (Basel). 2024 Aug 19;24(16):5360. doi: 10.3390/s24165360.
Speech disorders are significant barriers to a child's balanced development. Many children in Poland are affected by lisps (sigmatism), the incorrect articulation of sibilants. Because speech therapy diagnostics is complex and multifaceted, developing computer-assisted methods is crucial. This paper assesses the usefulness of hybrid feature vectors extracted from multimodal (video and audio) data for evaluating the place of articulation in the sibilants /s/ and /ʂ/. We used acoustic features and, new in this field, visual parameters describing the texture and shape of selected articulators. Statistical tests on the hybrid feature vectors revealed differences between sibilant realizations with different articulation patterns. For /s/, 35 variables differentiated dental and interdental pronunciation, 24 of which were visual (textural and shape). For /ʂ/, we found 49 statistically significant variables whose distributions differed between speaker groups (alveolar, dental, and postalveolar articulation); the dominant feature type was noise-band acoustic. Our study suggests that hybridizing the acoustic description with video processing provides richer diagnostic information.
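To make the per-variable screening concrete, the following is a minimal sketch of how features in a hybrid (acoustic plus visual) vector might be tested one by one for a difference between two speaker groups. The abstract does not name the statistical tests used, so a simple two-sided permutation test on the difference of group means stands in here as an assumption. The feature names, the function names, the significance level of 0.05, and all numeric values are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    This is a stand-in test chosen for illustration; it is an assumption,
    not the test reported by the authors.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign speakers to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def significant_features(features_a, features_b, alpha=0.05):
    """Return the names of features whose group means differ at level alpha.

    features_a and features_b map a feature name to a list of per-speaker
    values for the respective group.
    """
    return [
        name
        for name in features_a
        if permutation_p_value(features_a[name], features_b[name]) < alpha
    ]


# Synthetic values for illustration only (not data from the study).
# Each dict is a tiny hybrid feature set: one acoustic and one visual feature.
dental = {
    "acoustic_noise_band_energy": [0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.50, 0.51],
    "visual_lip_shape_ratio": [1.10, 1.12, 1.08, 1.11, 1.09, 1.13, 1.10, 1.12],
}
interdental = {
    "acoustic_noise_band_energy": [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.49, 0.51],
    "visual_lip_shape_ratio": [1.40, 1.38, 1.42, 1.41, 1.39, 1.43, 1.40, 1.37],
}

selected = significant_features(dental, interdental)
print(selected)
```

In this toy setup only the visual feature separates the two groups, which mirrors the kind of count the abstract reports for /s/ (variables that differentiate dental from interdental pronunciation). A three-group comparison such as the one described for /ʂ/ would need a different statistic, and screening many variables at once would normally call for a multiple-comparison correction; neither is covered by this sketch.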