Onslow M, Adams R, Ingham R
School of Communication Disorders, University of Sydney, Australia.
J Speech Hear Res. 1992 Oct;35(5):994-1001. doi: 10.1044/jshr.3505.994.
This study evaluated the reliability with which relatively sophisticated and unsophisticated judges used a 9-point scale to rate the naturalness of speech samples from 10 clients in a treatment program for stuttering that employed prolonged speech. Judges rated repeated speech samples from different speakers during various phases of the program. Different groups of sophisticated and unsophisticated judges made ratings at 15-, 30-, or 60-sec intervals while listening to the samples. Among the reliability indices, intraclass correlations were significantly higher for sophisticated judges, although the consistency and agreement of unsophisticated judges were generally equivalent to those of sophisticated judges. Both agreement scores and intraclass correlations were higher when ratings were made at 60-sec rather than 30-sec intervals. The predominant variable influencing judgement reliability appeared to be differences among the subjects. The methodology partially replicated Martin, Haroldson, and Triden's (1984) initial investigation of the use of this scale. However, the levels of intrajudge and interjudge reliability in this study were lower than those achieved by Martin et al.'s judges. Important differences between the Martin et al. study and this one may account for these findings, and they are discussed.
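The abstract names intraclass correlations and agreement scores as reliability indices but does not say which forms were used. The sketch below is therefore only an illustration under stated assumptions: it computes a one-way random-effects intraclass correlation, ICC(1,1), and a simple pairwise agreement proportion that counts two judges as agreeing when their ratings differ by no more than one scale point. The function names `icc_oneway` and `agreement_within`, the one-point tolerance, and the example ratings are all hypothetical and are not taken from the study.

```python
from itertools import combinations
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1).

    Rows are rated speech samples, columns are judges.
    Assumption: the study's actual ICC form is not stated in the abstract.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 samples and 2 judges, with no missing ratings")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean square between samples and mean square within samples.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_between - ms_within) / denominator


def agreement_within(ratings: Sequence[Sequence[float]], tolerance: float = 1) -> float:
    """Proportion of judge pairs, pooled over samples, whose ratings
    differ by at most `tolerance` scale points (hypothetical criterion)."""
    hits = 0
    total = 0
    for row in ratings:
        for a, b in combinations(row, 2):
            total += 1
            hits += abs(a - b) <= tolerance
    if total == 0:
        raise ValueError("need at least one pair of ratings")
    return hits / total


if __name__ == "__main__":
    # Made-up 9-point naturalness ratings: 4 samples rated by 3 judges.
    example = [
        [2, 3, 2],
        [5, 6, 4],
        [7, 7, 8],
        [9, 8, 9],
    ]
    print("ICC(1,1):", round(icc_oneway(example), 3))
    print("Agreement within 1 point:", round(agreement_within(example), 3))
```

A one-way model is chosen here only because it needs the fewest assumptions about how judges were assigned to samples; a two-way model would be the more natural choice if every judge rated every sample.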