Stillman R M
Surgery. 1984 Jul;96(1):97-101.
Because it is difficult to write new, unambiguous, pertinent multiple-choice questions at a level appropriate to medical students, examinations must be compiled from a limited number of items. Furthermore, it is impossible to keep used questions inaccessible to all subsequent students. This study was undertaken to determine whether these realities are compatible with examinations that are both valid and reliable. A pool of 480 multiple-choice questions was distributed to 232 students during the surgical clerkship. At the conclusion of each quarter, a 120-item multiple-choice examination consisting entirely of new questions was administered (group I). The resulting 960 questions (the distributed pool together with the administered examination items) were then made available to the next group of 218 students; each subsequent examination consisted of 50% new questions and 50% questions repeated verbatim from the publicized pool (group II). With the available pool now increased to 1200, the next examination consisted of 20% new and 80% repeat questions (group III). Reliability (internal consistency) was measured by the Kuder-Richardson-21 formula. Validity was measured as the correlation between each student's multiple-choice examination score and the average of that student's evaluation scores from two oral examinations and five faculty members. Despite the expected increase in mean examination score, including even 80% of items repeated from a large pool of multiple-choice questions that had been distributed to the students caused no loss of reliability or validity. Hence, instead of adding irrelevant, trivial, or inappropriate items or trying in vain to hide old examinations from new students, simply preparing examinations from a large pool of questions is recommended. To ensure fairness to all students, this pool should be made public knowledge.
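The abstract names the two statistics but does not show how they were computed. The following is a minimal sketch, assuming the standard textbook form of KR-21, r = (k/(k-1)) * (1 - M(k-M)/(k*s^2)), and a Pearson correlation for validity. The function names, the use of population variance, and the sample scores are illustrative assumptions, not details taken from the study.

```python
from math import sqrt
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson formula 21 for a test of n_items right/wrong items.

    total_scores: one total score (number correct) per student.
    Assumption: population variance is used; some texts use sample variance.
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    m = mean(total_scores)
    var = pvariance(total_scores)
    if var == 0:
        raise ValueError("KR-21 is undefined when all scores are identical")
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * var))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical scores for five students on a 120-item examination.
    mcq_scores = [78, 85, 92, 101, 110]
    # Hypothetical averaged oral and faculty evaluations for the same students.
    evaluation_scores = [3.1, 3.4, 3.3, 3.9, 4.2]
    print("KR-21 reliability:", round(kr21(mcq_scores, 120), 3))
    print("Validity (Pearson r):", round(pearson_r(mcq_scores, evaluation_scores), 3))
```

KR-21 needs only the number of items and the mean and variance of the total scores, so it can be computed without item-level responses. It does, however, treat all items as equally difficult.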