Anderson J
Med Educ. 1983 Mar;17(2):122-33. doi: 10.1111/j.1365-2923.1983.tb01111.x.
Although computer marking of MCQ papers is common practice and is popular for its accuracy, its speed and the ease with which detailed statistical analysis can be carried out, hand-scoring still has a major role. A computer and computer time are not always immediately available, and some form of data capture (optical mark reading or transfer of responses to punched cards) is a necessary preliminary to computer marking. The use of a computer is an unnecessary extravagance when: (a) the test is a non-critical class or small-group exam; (b) the paper is short (thirty questions or fewer); (c) the number of candidates is small (ten or fewer); or (d) detailed statistical analysis is unnecessary. One-from-five MCQs can be marked by hand easily and rapidly. Multiple true/false questions are most easily hand-scored using grid response sheets and stencil overlays prepared from the answer key. For multiple true/false questions the +1, -1, 0 marking system is strongly recommended. Candidates' total scores, the mean score and standard deviation for the whole group, the rank order and histograms of scores can be obtained with little difficulty. Mean scores and standard deviations for individual questions take more time to calculate, but once they are available, simple indices of discrimination and of internal reliability can be estimated with some extra time and trouble. Examiners may not wish to assess the discriminatory ability of every question. Hand-scoring is of greatest value in non-critical tests when candidate scores are needed rapidly, and it is particularly useful when combined with full feedback discussion of the MCQ paper.
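The +1, -1, 0 marking system and the group statistics mentioned above can be illustrated with a minimal Python sketch. It assumes, as the usual reading of that scheme, that +1 is awarded for a correct response, -1 for an incorrect one and 0 for an omission; the abstract itself does not spell this out. The function name `score_mtf`, the answer key and the three candidate sheets are hypothetical, and the population standard deviation is used on the assumption that the whole group has been scored.

```python
from statistics import mean, pstdev

def score_mtf(key, responses):
    """Score one candidate's multiple true/false sheet.

    Assumed scheme: +1 for a correct response, -1 for an incorrect
    response, 0 for an omitted item (recorded here as None).
    """
    total = 0
    for correct, given in zip(key, responses):
        if given is None:      # omitted item scores 0
            continue
        total += 1 if given == correct else -1
    return total

# Hypothetical answer key and response sheets (illustration only).
key = [True, False, True, True, False]
sheets = {
    "A": [True, False, True, True, False],  # 5 correct
    "B": [True, True, None, True, False],   # 3 correct, 1 wrong, 1 omitted
    "C": [False, None, None, True, True],   # 1 correct, 2 wrong, 2 omitted
}

# Total score per candidate.
totals = {name: score_mtf(key, resp) for name, resp in sheets.items()}

# Group mean and (population) standard deviation of total scores.
group_mean = mean(list(totals.values()))
group_sd = pstdev(list(totals.values()))

# Candidates in rank order, highest score first.
ranked = sorted(totals, key=totals.get, reverse=True)

print(totals, group_mean, round(group_sd, 2), ranked)
```

With these made-up sheets the totals are 5, 2 and -1, which shows how guessing wrongly is penalised relative to omitting an item.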