Veloski JJ, Rabinowitz HK, Robeson MR, Young PR
Center for Research in Medical Education and Health Care, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania 19107, USA.
Acad Med. 1999 May;74(5):539-46. doi: 10.1097/00001888-199905000-00022.
To evaluate an open-ended, computer-scored testing format designed to overcome certain limitations of multiple-choice questions.
Test items covering content in family medicine were administered in two different formats to 7,036 resident physicians in 380 training programs and to 35 experienced, board-certified physicians, in conjunction with the In-Training Examination of the American Board of Family Practice. Examinees completed a booklet of 40 open-ended, uncued (UnQ) test items by selecting the answer to each item from a list of more than 500 responses. Similar items were administered using the standard multiple-choice question (MCQ) format. One year later, another test of 40 UnQ items dealing with core content in family medicine was administered to 7,138 residents.
Examinees completed over 560,000 UnQ responses with high compliance and few errors. Both reliability and validity for the UnQ format were higher than for the MCQ format, and the UnQ items discriminated more accurately among levels of physicians' experience. The UnQ format almost eliminated the possibility that the physicians could answer questions by sight recognition or random guessing, and it was particularly effective in measuring knowledge of core content.
This study supports the feasibility of administering open-ended test items to enhance tests of physicians' competence.
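The claim that the UnQ format nearly eliminates random guessing can be illustrated with a rough calculation. The sketch below is not taken from the study. It assumes a response list of exactly 500 entries (the abstract says only "over 500", so the real chance is slightly lower), a hypothetical five-option MCQ (the abstract does not state the number of options), and blind, uniform guessing. It also checks that the reported "over 560,000" responses is consistent with the two resident cohorts each answering all 40 items, which is again an assumption.

```python
from fractions import Fraction

ITEMS_PER_TEST = 40   # stated in the abstract
UNQ_LIST_SIZE = 500   # abstract says "over 500"; treated here as a lower bound
MCQ_OPTIONS = 5       # assumption: option count is not given in the abstract


def guess_stats(n_items, n_options):
    """Return (chance per item, expected items correct) under uniform blind guessing."""
    p = Fraction(1, n_options)
    return p, n_items * p


unq_p, unq_expected = guess_stats(ITEMS_PER_TEST, UNQ_LIST_SIZE)
mcq_p, mcq_expected = guess_stats(ITEMS_PER_TEST, MCQ_OPTIONS)

# Resident cohorts from the abstract; assumes every resident answered every item.
resident_responses = (7036 + 7138) * ITEMS_PER_TEST

print(f"UnQ: chance per item {float(unq_p):.3%}, expected correct {float(unq_expected):.2f}")
print(f"MCQ: chance per item {float(mcq_p):.0%}, expected correct {float(mcq_expected):.0f}")
print(f"Resident UnQ responses if all items were answered: {resident_responses:,}")
```

Under these assumptions, a blind guesser would expect fewer than one correct answer in ten 40-item UnQ tests, compared with about eight correct answers on a single 40-item, five-option MCQ test.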