Swartz M H, Colliver J A, Bardes C L, Charon R, Fried E D, Moroff S
Mount Sinai School of Medicine, New York, New York, USA.
Acad Med. 1997 Jul;72(7):619-26. doi: 10.1097/00001888-199707000-00014.
To test the criterion validity of existing standardized-patient (SP)-examination scores using global ratings by a panel of faculty-physician observers as the gold-standard criterion; to determine whether such ratings can provide a reliable gold-standard criterion to be used for validity-related research; and to encourage the use of these gold-standard ratings for validation research and examination development, including scoring and standard setting, and for enhancing understanding of the clinical competence construct.
Five faculty physicians independently observed and rated videotaped performances of 44 students from one medical school on the seven SP cases that make up the fourth-year assessment administered at The Morchand Center of Mount Sinai School of Medicine to students in the eight member schools in the New York City Consortium.
The validity coefficients showed correlations between scores on the examination and the overall ratings ranging from .60 to .70. The reliability coefficients for ratings of overall examination performance reached the commonly recommended .80 level and came very close to it at the case level, with interrater reliabilities generally in the .70 to .80 range.
The results are encouraging, with validity coefficients high enough to warrant optimism about the possibility of increasing them to the recommended .80 level, based on further studies to identify those measurable performance characteristics that most reflect the gold-standard ratings. The high interrater reliabilities indicate that faculty-physician ratings of performance on SP cases and examinations can, or may be able to, provide a reliable gold standard for validating and refining SP assessment.
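As an illustration of the two kinds of coefficient reported above, the following is a minimal sketch of how a validity coefficient and an interrater reliability might be computed. It assumes the validity coefficient is a Pearson correlation between examination scores and the mean panel rating, and it uses Cronbach's alpha across raters as a stand-in reliability index; the abstract does not state which reliability statistic the authors used. The function names and the toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(ratings):
    """Cronbach's alpha treating raters as items.

    ratings: one row per student, one column per rater.
    Assumption: alpha is used here only as an illustrative
    reliability index, not as the statistic used in the study.
    """
    k = len(ratings[0])
    rater_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    total_var = pvariance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


# Hypothetical toy data: 5 students, 3 raters, global ratings on a 1-5 scale.
panel_ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
# Hypothetical SP-examination scores (percent) for the same 5 students.
exam_scores = [78, 61, 70, 85, 58]

mean_rating = [mean(row) for row in panel_ratings]
print("validity coefficient:", round(pearson(exam_scores, mean_rating), 2))
print("interrater reliability (alpha):", round(cronbach_alpha(panel_ratings), 2))
```

In this sketch the panel's mean rating plays the role of the gold-standard criterion, so the correlation shows how closely the examination scores track it, while alpha shows how consistently the raters agree with one another.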