Kreiman Jody, Gerratt Bruce R, Ito Mika
Division of Head and Neck Surgery, UCLA School of Medicine, 31-24 Rehab Center, Los Angeles, California 90095, USA.
J Acoust Soc Am. 2007 Oct;122(4):2354-64. doi: 10.1121/1.2770547.
Modeling sources of listener variability in voice quality assessment is the first step in developing reliable, valid protocols for measuring quality, and provides insight into the reasons that listeners disagree in their quality assessments. This study examined the adequacy of one such model by quantifying the contributions of four factors to interrater variability: instability of listeners' internal standards for different qualities, difficulties isolating individual attributes in voice patterns, scale resolution, and the magnitude of the attribute being measured. One hundred twenty listeners in six experiments assessed vocal quality in tasks that differed in scale resolution, in the presence/absence of comparison stimuli, and in the extent to which the comparison stimuli (if present) matched the target voices. These factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their assessments. Providing listeners with comparison stimuli that matched the target voices doubled the likelihood that they would agree exactly. Listeners also agreed significantly better when assessing quality on continuous versus six-point scales. These results indicate that interrater variability is an issue of task design, not of listener unreliability.
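The "likelihood that listeners would agree exactly" can be illustrated as the proportion of listener pairs who give the same rating to the same voice. The sketch below is only one plausible reading of that measure, not the authors' analysis: the function name `exact_agreement_rate`, the `tolerance` parameter, and the sample ratings are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def exact_agreement_rate(ratings, tolerance=0.0):
    """Proportion of listener pairs whose ratings of the same voice match.

    ratings: one inner list per voice, holding one rating per listener.
    tolerance: hypothetical allowance for continuous scales; 0 means the
        two ratings must be identical.
    """
    agreeing_pairs = 0
    total_pairs = 0
    for voice_ratings in ratings:
        # Compare every unordered pair of listeners on this voice.
        for a, b in combinations(voice_ratings, 2):
            total_pairs += 1
            if abs(a - b) <= tolerance:
                agreeing_pairs += 1
    if total_pairs == 0:
        raise ValueError("need at least one voice rated by two or more listeners")
    return agreeing_pairs / total_pairs


# Made-up six-point ratings: three voices, three listeners each.
sample = [[3, 3, 4], [2, 2, 2], [5, 4, 6]]
rate = exact_agreement_rate(sample)
```

With the made-up data above, four of the nine listener pairs agree exactly, so `rate` is 4/9. Comparing this proportion across task designs (with and without comparison stimuli, for example) would mirror the kind of contrast the abstract reports.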