Sage Agata
Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, Zabrze, 41-800, Silesia, Poland.
Comput Methods Programs Biomed. 2025 Jun;264:108716. doi: 10.1016/j.cmpb.2025.108716. Epub 2025 Mar 21.
Sigmatism is a speech disorder concerning sibilants, and it is diagnosed in many Polish children of preschool age. The success of therapy often depends on early and accurate diagnosis. This paper presents research findings on using 2D and 3D (time-related) visual features to analyze the place of articulation, sibilance (the character of the gap between the teeth that allows the articulation of sibilant sounds), and tongue positioning in four of the twelve Polish sibilants: /s/, /z/, /ʦ/, and /dz/.
A dedicated data acquisition system captured the stereovision stream during the speech therapy examination (201 speakers aged 4-8). The material contains 23 words and four logatomes. This study introduces 3D texture and shape features extracted for the mouth, lips, and tongue. The third dimension is the time of articulation, so the resulting volumes reflect the movements of the speech organs. The research compares the usability of the 3D mode with a 2D approach (mouth texture features; mouth, lips, and tongue shape parameters) described in previous works. The statistical analysis uses the Mann-Whitney U test to identify significant differences (p<0.05) between selected articulation patterns for each sibilant and pronunciation aspect.
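The group comparison described above can be illustrated with a minimal sketch of a two-sided Mann-Whitney U test. It is written in plain Python with a tie-corrected normal approximation and a continuity correction. This is one common formulation and not necessarily the variant used in the study. The function names and the sample values are hypothetical and serve illustration only.

```python
import math


def average_ranks(values):
    """Return 1-based average ranks and the size of each group of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in ties)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical
        return u1, 1.0
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u1, p


# Hypothetical values of one visual feature for two articulation patterns.
pattern_a = [1.0, 2.0, 3.0, 4.0, 5.0]
pattern_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = mann_whitney_u(pattern_a, pattern_b)
significant = p_value < 0.05  # threshold stated in the abstract
```

For small samples an exact test is usually preferable to the normal approximation. The approximation is used here only to keep the sketch free of dependencies.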
Overall outcomes suggest the dominance of statistically significant 3D time-related features, especially those describing the shape of the tongue. Analysis restricted to features with at least medium effect size showed that 3D features differentiate dental and interdental articulation in the case of /s/, /z/, and /ʦ/, while in the case of /dz/ the significant parameters were 2D. The 3D mode also prevails in terms of sibilance: analysis of the sounds /z/ and /ʦ/ yields 3D features only, but for /s/ and /dz/ the outcomes include both 3D and 2D parameters. Analysis of tongue positioning during articulation, again restricted to at least medium effect size, yields features only for the affricates: /ʦ/ (3D features) and /dz/ (2D features). All parameters with at least medium effect size describe the shape of the tongue.
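The abstract does not state which effect size measure or thresholds were used. As a hedged illustration of filtering features by "at least medium effect size", the sketch below assumes the rank-biserial correlation together with the conventional cut-offs of 0.1, 0.3, and 0.5 for small, medium, and large effects. The feature names and values are invented.

```python
def rank_biserial(x, y):
    """Rank-biserial correlation: (pairs with x > y minus pairs with x < y) / all pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def effect_label(r, small=0.1, medium=0.3, large=0.5):
    """Map |r| to a verbal label; the thresholds are assumed conventions."""
    magnitude = abs(r)
    if magnitude >= large:
        return "large"
    if magnitude >= medium:
        return "medium"
    if magnitude >= small:
        return "small"
    return "negligible"


# Hypothetical feature values for two articulation patterns.
features = {
    "tongue_shape_3d": ([0.2, 0.4, 0.5, 0.7], [0.8, 0.9, 1.1, 1.2]),
    "lips_shape_2d": ([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 2.2]),
}

# Keep only the features whose effect size is at least medium.
kept = [
    name
    for name, (group_a, group_b) in features.items()
    if effect_label(rank_biserial(group_a, group_b)) in ("medium", "large")
]
```

In this invented example only `tongue_shape_3d` is kept, because its two groups do not overlap at all, while the `lips_shape_2d` groups overlap symmetrically.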
This research demonstrates the potential of visual data for building computer-aided speech diagnosis systems that use non-contact recording tools. It highlights the usability of the 3D approach introduced in this paper. The results also emphasize the importance of tongue movement analysis.