Downing Steven M, Haladyna Thomas M
University of Illinois at Chicago, College of Medicine, Department of Medical Education, Chicago, Illinois 60612-7309, USA.
Med Educ. 2004 Mar;38(3):327-33. doi: 10.1046/j.1365-2923.2004.01777.x.
Factors that interfere with the ability to interpret assessment scores or ratings in the proposed manner threaten validity. To be interpreted in a meaningful manner, all assessments in medical education require sound, scientific evidence of validity.
The purpose of this essay is to discuss two major threats to validity: construct under-representation (CU) and construct-irrelevant variance (CIV). Examples of each type of threat are provided for written, performance and clinical performance examinations.
The CU threat to validity refers to undersampling the content domain. Using too few items, cases or clinical performance observations to adequately generalise to the domain represents CU. Variables that systematically (rather than randomly) interfere with the ability to meaningfully interpret scores or ratings represent CIV. Issues such as flawed test items written at inappropriate reading levels or statistically biased questions represent CIV in written tests. For performance examinations, such as standardised patient examinations, flawed cases or cases that are too difficult for student ability contribute CIV to the assessment. For clinical performance data, systematic rater error, such as halo or central tendency error, represents CIV. The term face validity is rejected as representative of any type of legitimate validity evidence, although the fact that the appearance of the assessment may be an important characteristic other than validity is acknowledged.
There are multiple threats to validity in all types of assessment in medical education. Methods to eliminate or control validity threats are suggested.