Joshua C. Chang, Julia Porcino, Elizabeth K. Rasch, Larry Tang
Rehabilitation Medicine Department, NIH Clinical Center, Bethesda, Maryland, United States of America.
National Center for Forensic Science, University of Central Florida, Orlando, Florida, United States of America.
PLoS One. 2022 Apr 8;17(4):e0266350. doi: 10.1371/journal.pone.0266350. eCollection 2022.
Item response theory (IRT) is the statistical paradigm underlying a dominant family of generative probabilistic models for test responses, used to quantify traits in individuals relative to target populations. The graded response model (GRM) is a particular IRT model that is used for ordered polytomous test responses. Both the development and the application of the GRM and other IRT models require statistical decisions. For formulating these models (calibration), one needs to decide on methodologies for item selection, inference, and regularization. For applying these models (test scoring), one needs to make similar decisions, often prioritizing computational tractability and/or interpretability. In many applications, such as in the Work Disability Functional Assessment Battery (WD-FAB), tractability implies approximating an individual's score distribution using estimates of mean and variance, and obtaining that score conditional on only point estimates of the calibrated model. In this manuscript, we evaluate the calibration and scoring of models under this common use-case using Bayesian cross-validation. Applied to the WD-FAB responses collected for the National Institutes of Health, we assess the predictive power of implementations of the GRM based on their ability to yield, on validation sets of respondents, ability estimates that are most predictive of patterns of item responses. Our main finding indicates that regularized Bayesian calibration of the GRM outperforms the regularization-free empirical Bayesian procedure of marginal maximum likelihood. We also motivate the use of compactly supported priors in test scoring.
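As background for the abstract's description, the graded response model assigns each ordered response category a probability given a respondent's latent ability, an item discrimination, and ordered category thresholds: the cumulative probability of responding at or above category k is a logistic function of ability, and per-category probabilities are differences of adjacent cumulative terms. The sketch below illustrates this standard GRM form; the parameter names (`theta`, `a`, `b`) are generic IRT notation, not values or identifiers taken from the WD-FAB calibration described in the paper.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Graded response model (Samejima): per-category response probabilities.

    theta : float, latent ability of the respondent
    a     : float, item discrimination
    b     : ordered category thresholds, length K-1 for K response categories
    Returns an array of K probabilities summing to 1.
    (Illustrative sketch; parameters here are generic, not from the paper.)
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probability of responding in category k or higher, k = 1..K-1
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # shape (K-1,)
    upper = np.concatenate(([1.0], cum))           # P(Y >= 0) is always 1
    lower = np.concatenate((cum, [0.0]))           # P(Y >= K) is always 0
    return upper - lower                           # P(Y = k) for k = 0..K-1

# Example: a 4-category item; probabilities are nonnegative and sum to 1
probs = grm_category_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5])
print(probs, probs.sum())
```

Calibration estimates `a` and `b` for each item from response data; scoring inverts the model to estimate `theta` for a respondent, which is where the mean/variance approximation and point-estimate conditioning discussed in the abstract come in.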