Medical Education Research Unit, Imperial College London, London, UK.
Division of Diabetes, Endocrinology and Metabolism, Imperial College London, London, UK.
Med Educ. 2018 Apr;52(4):447-455. doi: 10.1111/medu.13504. Epub 2018 Feb 1.
Single-best-answer questions (SBAQs) have been widely used to test knowledge because they are easy to mark and demonstrate high reliability. However, SBAQs have been criticised for being subject to cueing.
We used a novel assessment tool that facilitates efficient marking of open-ended very-short-answer questions (VSAQs). We compared VSAQs with SBAQs with regard to reliability, discrimination and student performance, and evaluated the acceptability of VSAQs.
Medical students were randomised to sit a 60-question assessment administered in either VSAQ and then SBAQ format (Group 1, n = 155) or the reverse (Group 2, n = 144). The VSAQs were delivered on a tablet; responses were computer-marked and subsequently reviewed by two examiners. The standard error of measurement (SEM) across the ability spectrum was estimated using item response theory.
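As an illustration of how a standard error of measurement can vary across the ability spectrum under item response theory, the following is a minimal sketch that assumes a two-parameter logistic (2PL) model. The abstract does not state which IRT model or software was used, so the model choice, the function names and the item parameters below are all hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta.

    a = discrimination, b = difficulty (both hypothetical values here).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # probability of a correct answer
    return a * a * p * (1.0 - p)

def sem_at(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

# 60 made-up items (discrimination, difficulty); not the study's parameters.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)] * 20

for theta in (-3.0, 0.0, 3.0):
    print(theta, round(sem_at(theta, items), 3))
```

In a sketch like this the SEM is smallest where item difficulties cluster and grows towards the extremes of ability, which is why a comparison "across the ability spectrum" is more informative than a single overall value.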
The review of machine-marked questions took an average of 1 minute 36 seconds per question, covering the responses of all students. The VSAQs had high reliability (alpha: 0.91), a significantly lower SEM than the SBAQs (p < 0.001) and higher mean item-total point biserial correlations (p < 0.001). The VSAQ scores were significantly lower than the SBAQ scores (p < 0.001). The difference in scores between VSAQs and SBAQs was attenuated in Group 2. Although 80.4% of students found the VSAQs more difficult, 69.2% found them more authentic.
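The reliability and discrimination statistics reported above can be sketched as follows. This is an illustrative calculation on a tiny made-up score matrix, not the study's data or analysis code; the function names are hypothetical, and the abstract does not say whether the item-total correlations were corrected by excluding the item from the total, so that is left as an option.

```python
import math
from statistics import mean, pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a students x items matrix of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation; with a 0/1 item this is the point-biserial.

    Assumes both inputs vary (a constant list would divide by zero).
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_total_correlation(scores, i, corrected=False):
    """Correlation of item i with the total score.

    corrected=True removes the item's own score from the total first.
    """
    item = [row[i] for row in scores]
    total = [sum(row) - (row[i] if corrected else 0) for row in scores]
    return pearson(item, total)

# Made-up responses: 4 students x 3 dichotomous items (1 = correct).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(cronbach_alpha(scores))
print(item_total_correlation(scores, 1))
```

Higher item-total correlations indicate that students who answer an item correctly also tend to score well overall, which is the sense in which the abstract treats them as a measure of discrimination.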
The VSAQ format demonstrated high reliability and discrimination and items were perceived as more authentic. The SBAQ format was associated with significant cueing. The present results suggest the VSAQ format has a higher degree of validity.