Kreiman J, Gerratt B R
Division of Head and Neck Surgery, UCLA School of Medicine 90095-1794, USA.
J Acoust Soc Am. 1998 Sep;104(3 Pt 1):1598-608. doi: 10.1121/1.424372.
The validity of perceptual measures of vocal quality has been neglected in studies of voice, which focus more commonly on rater reliability. Validity depends in part on reliability, because an unreliable test does not measure what it is intended to measure. However, traditional measures of rating reliability only partially represent interrater agreement, because they cannot reflect variations or patterns of agreement for specific voice samples. In this paper the likelihood that two raters would agree in their ratings of a single voice is examined, for each voice in five previously gathered data sets. Results do not support the continued assumption that traditional rating procedures produce useful indices of listeners' perceptions. Listeners agreed very poorly in the midrange of scales for breathiness and roughness, and mean ratings in the midrange of such scales did not represent the extent to which a voice possesses a quality, but served only to indicate that listeners disagreed. Techniques like analysis by synthesis or judgment of similarity avoid decomposing quality into constituent dimensions, and do not require a listener to compare an external stimulus to an unstable internal representation, thus decreasing the error in measures of quality. Modeling individual differences in perception can increase the variance accounted for in models of quality, further reducing the error in perceptual measures. Thus such techniques may provide valid alternatives to current approaches.
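The per-voice analysis described above, in which the likelihood that two raters agree is computed separately for each voice, can be sketched in a few lines. The abstract does not state what counts as agreement. This sketch therefore assumes integer ratings on an equal-appearing-interval scale and treats two ratings as agreeing when they differ by at most a chosen `tolerance`. The function names and the example ratings are hypothetical illustrations, not the authors' code or data.

```python
from itertools import combinations
from typing import Dict, Sequence


def pairwise_agreement(ratings: Sequence[int], tolerance: int = 0) -> float:
    """Probability that two distinct raters, drawn at random, agree on one voice.

    Assumption (not from the paper): two ratings "agree" when they differ by
    at most `tolerance` scale points; tolerance=0 means exact agreement.
    """
    if len(ratings) < 2:
        raise ValueError("need ratings from at least two raters")
    pairs = list(combinations(ratings, 2))  # every unordered rater pair
    agreeing = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return agreeing / len(pairs)


def agreement_by_voice(
    data: Dict[str, Sequence[int]], tolerance: int = 0
) -> Dict[str, float]:
    """Apply pairwise_agreement to each voice's ratings separately."""
    return {voice: pairwise_agreement(r, tolerance) for voice, r in data.items()}


# Hypothetical 7-point breathiness ratings from five raters per voice.
ratings = {
    "voice_A": [1, 1, 1, 2, 1],  # clustered near the scale end
    "voice_B": [2, 4, 6, 3, 5],  # spread across the midrange
}
print(agreement_by_voice(ratings))               # {'voice_A': 0.6, 'voice_B': 0.0}
print(agreement_by_voice(ratings, tolerance=1))  # {'voice_A': 1.0, 'voice_B': 0.4}
```

In this toy example voice_B has a mean rating of 4.0, the exact midpoint of the scale. However, no two raters give it the same rating, so the mean reflects disagreement rather than a "moderate" amount of the quality.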