Perception of emotional valences and activity levels from vowel segments of continuous speech.

Affiliation

Department of Speech Communication and Voice Research, University of Tampere, Tampere, Finland.

Publication information

J Voice. 2010 Jan;24(1):30-8. doi: 10.1016/j.jvoice.2008.04.004. Epub 2008 Dec 25.

Abstract

This study aimed to investigate the role of the voice source and formant frequencies in the perception of emotional valence and psychophysiological activity level from short vowel samples (approximately 150 milliseconds). Nine professional actors (five males and four females) read a prose passage simulating joy, tenderness, sadness, anger, and a neutral emotional state. The stress-carrying vowel [a:] was extracted from the Finnish word [ta:k:ahan] in continuous speech and analyzed for duration, fundamental frequency (F0), equivalent sound level (L_eq), alpha ratio, and formant frequencies F1-F4. Alpha ratio was calculated by subtracting the L_eq (dB) in the range 50 Hz-1 kHz from the L_eq in the range 1-5 kHz. The samples were inverse filtered with Iterative Adaptive Inverse Filtering (IAIF), and the resulting glottal flow estimates were parameterized with the normalized amplitude quotient, NAQ = f_AC/(d_peak·T), where f_AC is the AC amplitude of the glottal flow, d_peak is the magnitude of the negative peak of the flow derivative, and T is the period length. Fifty listeners (mean age 28.5 years) identified the emotional valences from the randomized samples. Multinomial logistic regression analysis was used to study how the acoustic parameters were related to perception. It appeared to be possible to identify valences from vowel samples of short duration (approximately 150 milliseconds). NAQ tended to differentiate between the perceived valences and activity levels in both genders. The voice source may not only reflect variations of F0 and L_eq but may also have an independent role in expression, reflecting phonation types. To some extent, formant frequencies appeared to be related to valence perception, but no clear patterns could be identified. Coding of valence tends to be a complicated multiparameter phenomenon with wide individual variation.
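The two derived measures in the abstract, the alpha ratio and NAQ, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline. The function names are hypothetical. Band levels are taken from a Hann-windowed FFT power spectrum rather than calibrated L_eq measurements; the calibration offset cancels in the dB difference. The NAQ function assumes that a single-period glottal flow estimate is already available (for example, from IAIF, which is not implemented here).

```python
import numpy as np


def band_level_db(signal, fs, f_lo, f_hi):
    """Energy in the band [f_lo, f_hi) Hz, in dB re an arbitrary reference.

    Assumption: a Hann-windowed FFT power spectrum stands in for calibrated
    L_eq; absolute calibration is ignored because the alpha ratio only
    needs a level difference between two bands.
    """
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    power = np.sum(np.abs(spectrum[in_band]) ** 2)
    return 10.0 * np.log10(power + 1e-20)  # small offset avoids log(0)


def alpha_ratio(signal, fs):
    """Alpha ratio (dB): level in 1-5 kHz minus level in 50 Hz-1 kHz."""
    high = band_level_db(signal, fs, 1000.0, 5000.0)
    low = band_level_db(signal, fs, 50.0, 1000.0)
    return high - low


def naq(glottal_flow, fs, f0):
    """Normalized amplitude quotient NAQ = f_AC / (d_peak * T).

    glottal_flow: one period of an estimated glottal flow (hypothetical
    input, e.g. the output of an inverse filter such as IAIF).
    fs: sampling rate in Hz; f0: fundamental frequency in Hz.
    """
    period = 1.0 / f0                                    # T in seconds
    f_ac = np.max(glottal_flow) - np.min(glottal_flow)   # AC (peak-to-peak) flow amplitude
    flow_derivative = np.diff(glottal_flow) * fs         # first difference scaled to units per second
    d_peak = -np.min(flow_derivative)                    # magnitude of the negative derivative peak
    return f_ac / (d_peak * period)
```

For example, a 1-second signal sampled at 16 kHz that contains a 500 Hz sine plus a 2 kHz sine at one tenth of the amplitude should give an alpha ratio of about -20 dB, because the power in the upper band is 0.01 times the power in the lower band. Because f_AC is divided by d_peak·T, NAQ is dimensionless, which is what makes it comparable across voices with different F0.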
