HPU Workman School of Dental Medicine, High Point University, One University Parkway, High Point, NC 27268, 336-841-7344, United States; UNC Eshelman School of Pharmacy, The University of North Carolina at Chapel Hill, 301 Pharmacy Lane - Beard Hall 321, Chapel Hill, NC 27599, 919-451-3547, United States.
College of Education, University of Texas at Arlington, 701 Planetarium Place, Arlington, TX 76019, 817-272-5641, United States.
Curr Pharm Teach Learn. 2022 Sep;14(9):1206-1214. doi: 10.1016/j.cptl.2022.07.023. Epub 2022 Jul 30.
Classical test theory (CTT) and item response theory (IRT) are two measurement models used to evaluate results from examinations, questionnaires, and instruments. To illustrate the benefits of IRT, we compared how results from multiple-choice tests can be interpreted using CTT and IRT.
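As a point of reference for the CTT side of the comparison, the sketch below computes two conventional CTT item statistics from a scored multiple-choice test: item difficulty as the proportion of correct responses, and item discrimination as the point-biserial correlation between the item score and the total score. The function name `ctt_item_stats` and the toy response matrix are assumptions made for illustration; they are not taken from the article.

```python
from statistics import mean, pstdev


def ctt_item_stats(responses):
    """Return (difficulty, discrimination) per item under CTT.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Difficulty is the proportion correct; discrimination is the
    (uncorrected) point-biserial correlation with the total score,
    i.e. the item itself is included in the total.
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        p = mean(item)
        sd_item, sd_total = pstdev(item), pstdev(totals)
        if sd_item == 0 or sd_total == 0:
            r = float("nan")  # correlation undefined without variance
        else:
            products = [x * t for x, t in zip(item, totals)]
            cov = mean(products) - p * mean(totals)
            r = cov / (sd_item * sd_total)
        stats.append((p, r))
    return stats


# Hypothetical data: 4 examinees, 3 items (1 = correct, 0 = incorrect).
scores = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
for number, (p, r) in enumerate(ctt_item_stats(scores), start=1):
    print(f"item {number}: difficulty={p:.2f}, discrimination={r:.2f}")
```

Both statistics depend on the particular group of examinees tested, which is one reason the comparison with IRT is informative.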
IRT encompasses a collection of statistical models that estimate the probability of a correct response to a test item. The models are non-linear and generate item characteristic curves that illustrate the relationship between the examinee's ability level and the probability of answering the item correctly. Several models can be used to estimate item parameters such as difficulty, discrimination, and guessing. In addition, IRT can generate item and test information functions that show how precisely ability is estimated across the ability range.
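The relationships described above can be sketched with the three-parameter logistic (3PL) model, one common IRT model. The article does not specify a model or parameterization, so the 3PL form, the function names, and the parameter values below are assumptions chosen for illustration: `a` is discrimination, `b` is difficulty, `c` is the guessing (lower asymptote) parameter, and `theta` is examinee ability.

```python
import math


def icc_3pl(theta, a, b, c=0.0):
    """Item characteristic curve: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c=0.0):
    """Fisher information an item contributes at ability theta (3PL form)."""
    p = icc_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * (q / p)


def total_information(theta, items):
    """Test information: the sum of item information over all items."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)


def standard_error(theta, items):
    """Approximate standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical item bank: (a, b, c) for each item.
bank = [(1.2, -1.0, 0.2), (0.8, 0.0, 0.25), (1.5, 1.0, 0.2)]
for theta in (-2.0, 0.0, 2.0):
    probs = [round(icc_3pl(theta, a, b, c), 2) for a, b, c in bank]
    print(theta, probs, round(standard_error(theta, bank), 2))
```

Setting `c=0` reduces this to the two-parameter model, and additionally fixing `a` at a common value gives a one-parameter form. Higher test information at a given `theta` corresponds to a smaller standard error there, which is what an information function plot conveys.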
Researchers interested in IRT should gather the necessary resources early in the research process and collaborate with those experienced in quantitative methods and advanced statistical models. Researchers should confirm that IRT is the optimal choice and select the model best suited to their needs. Once data are acquired, they should confirm that model assumptions are met and that model fit is appropriate. Lastly, researchers should consider disseminating the findings with accompanying visuals.
IRT can be a valuable approach in assessment design and evaluation. Potential opportunities include supporting the design of computer adaptive tests, creating equivalent test forms that evaluate a range of examinee abilities, and evaluating whether items perform differently for examinee sub-groups. Further, IRT can produce noteworthy visuals such as item characteristic curves and test information functions.