University of California, Los Angeles, USA.
J Speech Lang Hear Res. 2011 Jun;54(3):803-12. doi: 10.1044/1092-4388(2010/10-0083). Epub 2010 Nov 16.
Interrater disagreements in ratings of quality plague the study of voice. This study compared 2 methods for handling this variability.
Listeners provided multiple breathiness ratings for 2 sets of pathological voices, one including 20 male and 20 female voices unselected for quality and one including 20 breathy female voices. Each listener's ratings of a voice were averaged together, the mean ratings were z-transformed, and the likelihood that 2 listeners would agree exactly in their ratings was calculated as a function of averaging and standardization condition. Data were also multidimensionally scaled to examine similarities among listeners in perceptual strategy. Results were compared with parallel analyses of existing breathiness ratings of the same voices gathered using a method-of-adjustment task.
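The averaging, standardizing, and agreement steps described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The data layout, the function names (`listener_means`, `z_transform`, `exact_agreement`), the use of the population standard deviation, and above all the rule that two scores "agree exactly" when they round to the same multiple of a bin width `step` are assumptions. Averaged and z-transformed scores are continuous, and the abstract does not say how exact agreement was defined for them.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean, pstdev

# Assumed layout: ratings[listener][voice] -> list of that listener's repeated ratings.


def listener_means(ratings: dict, n: int | None = None) -> dict:
    """Average each listener's first n ratings of each voice (all ratings if n is None)."""
    return {
        listener: {voice: mean(rs[:n]) for voice, rs in voices.items()}
        for listener, voices in ratings.items()
    }


def z_transform(scores: dict) -> dict:
    """Standardize each listener's scores across voices (population SD).

    Assumes every listener's scores vary across voices; a constant set would divide by zero.
    """
    out = {}
    for listener, voices in scores.items():
        mu, sd = mean(voices.values()), pstdev(voices.values())
        out[listener] = {voice: (x - mu) / sd for voice, x in voices.items()}
    return out


def exact_agreement(scores: dict, step: float) -> dict:
    """For each voice, the proportion of listener pairs whose scores round to the
    same multiple of `step` (a hypothetical operationalization of exact agreement)."""
    listeners = sorted(scores)
    pairs = list(combinations(listeners, 2))
    result = {}
    for voice in scores[listeners[0]]:
        agree = sum(
            round(scores[a][voice] / step) == round(scores[b][voice] / step)
            for a, b in pairs
        )
        result[voice] = agree / len(pairs)
    return result


# Toy data (invented for illustration): 2 listeners, 3 voices, 2 ratings each.
ratings = {
    "L1": {"v1": [2, 4], "v2": [5, 7], "v3": [1, 1]},
    "L2": {"v1": [3, 3], "v2": [6, 6], "v3": [2, 2]},
}
raw_means = listener_means(ratings)
print(exact_agreement(raw_means, step=1.0))
print(exact_agreement(z_transform(raw_means), step=0.5))
```

Calling `listener_means` with increasing `n`, and comparing `exact_agreement` on the raw means against the `z_transform` output, mirrors the averaging-by-standardization comparison. The per-voice output can then be related to each voice's mean rating. The choice of `step` directly controls how strict "exact" agreement is, so any such analysis should report it.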
Three-way interactions between the mean rating for a voice, standardization condition, and the number of ratings averaged together were observed, but no main effect of averaging condition emerged. Multidimensional scaling revealed significant residual differences in perceptual strategy across listeners after averaging and standardizing. Ratings from the method-of-adjustment task showed both high agreement levels and consistent perceptual strategies across listeners, as theoretically predicted.
Averaging multiple ratings and standardizing the mean are inadequate for addressing variations in voice quality perception.