Pamphlett Roger
Department of Pathology, The University of Sydney, Australia.
Med Teach. 2005 Aug;27(5):468-72. doi: 10.1080/01421590500097018.
Multiple true/false (MTF) questions are often used to test medical students. Statistical methods can indicate how many of these questions are needed for a reliable test, but a graphical indication of reliability would also be useful. Therefore, to estimate the smallest number of items needed for reliable testing, students' marks were plotted throughout an examination. A total of 211 medical students were given 60 pathology questions comprising 300 true/false items. The cumulative percentage mark across the 300 items was calculated and graphed for 15 students: five each from the top, middle and bottom of the total scores. For the other 196 students, percentage marks were calculated at 100 and 300 items. The reduction in reliability caused by shortening the test from 300 to 100 items was calculated with the Spearman-Brown formula. The cumulative percentage graphs showed that, after early fluctuations in each student's mark, the total mark stabilized after 100 items. The mark fluctuated slightly up or down after 100 items, but in 96.7% of students it differed by fewer than 10 percentage points between 100 and 300 items. The reliability coefficient fell from 0.94 in the 300-item test to 0.85 in the 100-item test. In conclusion, student marks appear to stabilize after 100 true/false items. If the level of difficulty of an examination remains constant, and items are of high discriminatory value, 100 true/false items appear to be sufficient to assess medical students in the MTF format.
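The two calculations described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names are hypothetical, and scoring each item as 1 (correct) or 0 (incorrect) is an assumption, since the abstract does not state the marking scheme. The Spearman-Brown prediction is used in its standard form, r_new = k·r / (1 + (k − 1)·r), where k is the ratio of new to old test length.

```python
from itertools import accumulate


def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Standard Spearman-Brown prediction formula:
    r_new = k * r / (1 + (k - 1) * r)
    """
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)


def cumulative_percentage(item_scores):
    """Running percentage mark after each item.

    item_scores is a sequence of 1 (correct) or 0 (incorrect); this 1/0
    scoring is an assumption made for illustration only.
    """
    running_totals = accumulate(item_scores)
    return [100 * total / n for n, total in enumerate(running_totals, start=1)]


# Shortening a 300-item test with reliability 0.94 to 100 items (k = 1/3).
predicted = spearman_brown(0.94, 100 / 300)
print(f"Predicted reliability at 100 items: {predicted:.2f}")

# Running mark for a short, made-up sequence of item results.
print(cumulative_percentage([1, 0, 1, 1]))
```

With the rounded value 0.94 as input, the formula gives roughly 0.84, close to the reported 0.85; the small gap is presumably due to rounding of the published coefficients. Plotting the output of `cumulative_percentage` against item number would produce the kind of curve the abstract describes, with early swings that flatten as more items accumulate.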