School of Foreign Languages, Hunan University, Changsha, China.
School of Foreign Languages, Hunan First Normal University, Changsha, China.
J Speech Lang Hear Res. 2023 Nov 9;66(11):4363-4379. doi: 10.1044/2023_JSLHR-23-00356. Epub 2023 Oct 20.
Precisely capturing phonation types such as breathy, modal, and pressed voice can facilitate the recognition of human emotions. However, little is known about exactly how phonation types and decoders' gender influence the perception of emotional speech. Based on the modified Brunswikian lens model, this study examines the roles of phonation types and decoders' gender in Mandarin emotional speech recognition using articulatory speech synthesis.
Fifty-five participants (28 male and 27 female) completed a Mandarin emotional speech recognition task with 200 stimuli representing five emotional categories (happiness, anger, fear, sadness, and neutrality) and five stimulus types (original, copied, breathy, modal, and pressed). Repeated-measures analyses of variance were performed on recognition accuracy and confusion data.
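The analysis described above (a repeated-measures ANOVA over recognition accuracy, with emotion category and stimulus type as within-subject factors) can be sketched as follows. This is a minimal illustration using synthetic data; the column names, factor coding, and use of `statsmodels` are assumptions, not details from the study.

```python
# Hypothetical sketch of a two-way repeated-measures ANOVA on
# per-cell recognition accuracy. Synthetic data only; column names
# and design coding are assumptions, not the study's actual setup.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
emotions = ["happiness", "anger", "fear", "sadness", "neutrality"]
phonations = ["original", "copied", "breathy", "modal", "pressed"]

rows = []
for subj in range(1, 56):  # 55 participants
    for emo in emotions:
        for pho in phonations:
            rows.append({
                "participant": subj,
                "emotion": emo,
                "phonation": pho,
                # placeholder accuracy (proportion correct per cell)
                "accuracy": rng.uniform(0.2, 1.0),
            })
df = pd.DataFrame(rows)

# Two within-subject factors: emotion (5 levels) x phonation (5 levels)
res = AnovaRM(df, depvar="accuracy", subject="participant",
              within=["emotion", "phonation"]).fit()
print(res)
```

The resulting table reports F values and degrees of freedom for each main effect and the emotion-by-phonation interaction, which is the kind of output the reported accuracy analyses would be based on.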
For male and female decoders, the recognition accuracy of anger from pressed stimuli and fear from breathy stimuli was high; across all phonation-type stimuli, the recognition accuracy of sadness was also high, but that of happiness was low. The confusion data revealed that in recognizing fear from all phonation-type stimuli, female decoders chose fear responses more frequently and neutral responses less frequently than male decoders. In recognizing neutrality from breathy stimuli, female decoders significantly reduced their choice of neutral responses and misidentified neutrality as anger, while male decoders mistook neutrality from pressed stimuli for anger.
This study revealed that, in Mandarin, phonation types play a crucial role in recognizing anger, fear, and neutrality, whereas the recognition of sadness and happiness seems not to depend heavily on phonation type. Moreover, decoders' gender affects their recognition of neutrality and fear. These findings support the modified Brunswikian lens model and have implications for diagnosis and intervention among clinical populations with hearing impairment or gender-related psychiatric disorders.