van der Vleuten Cees P M, Schuwirth Lambert W T
Department of Educational Development and Research, University of Maastricht, Maastricht, The Netherlands.
Med Educ. 2005 Mar;39(3):309-17. doi: 10.1111/j.1365-2929.2005.02094.x.
We use a utility model to illustrate, firstly, that selecting an assessment method involves context-dependent compromises and, secondly, that assessment is not a measurement problem but an instructional design problem, comprising educational, implementation and resource aspects. In the model, assessment characteristics are weighted differently depending on the purpose and context of the assessment.
Of the characteristics in the model, we focus on reliability, validity and educational impact and argue that they are not inherent qualities of any instrument. Reliability depends not on structuring or standardisation but on sampling. Key issues concerning validity are authenticity and integration of competencies. Assessment in medical education addresses complex competencies and thus requires quantitative and qualitative information from different sources as well as professional judgement. Adequate sampling across judges, instruments and contexts can ensure both validity and reliability. Despite recognition that assessment drives learning, this relationship has been little researched, possibly because of its strong context dependence.
When assessment should stimulate learning and requires adequate sampling, in authentic contexts, of the performance of complex competencies that cannot be broken down into simple parts, we need to make a shift from individual methods to an integral programme, intertwined with the education programme. Therefore, we need an instructional design perspective.
Programmatic instructional design hinges on a careful description and justification of the choices made, whose effectiveness should be measured against the intended outcomes. We should not evaluate individual methods, but provide evidence of the utility of the assessment programme as a whole.