Ayasse Nicolai D, Coon Cheryl D
Clinical Outcome Assessment Program, Critical Path Institute, Tucson, AZ, USA.
Qual Life Res. 2025 Apr;34(4):1125-1136. doi: 10.1007/s11136-024-03873-z. Epub 2024 Dec 12.
Item response theory (IRT) models are increasingly used to evaluate clinical outcome assessments (COAs) for clinical trials. Given common constraints in clinical trial design, such as limits on sample size and assessment length, this study examined the appropriateness of two commonly used polytomous IRT models, the graded response model (GRM) and the partial credit model (PCM), as they are typically applied in the psychometric evaluation of COAs in clinical trials.
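For orientation, the two models can be written in their standard textbook forms. The notation here is an assumption and is not taken from the abstract: $\theta$ is the latent trait, $a_j$ the slope of item $j$, $b_{jk}$ the GRM thresholds, $\delta_{jv}$ the PCM step parameters, and $m_j$ the highest response category of item $j$.

```latex
% Graded response model: cumulative logits with an item-specific slope a_j
P(X_j \ge k \mid \theta) = \frac{1}{1 + \exp\{-a_j(\theta - b_{jk})\}}, \qquad k = 1, \dots, m_j
P(X_j = k \mid \theta) = P(X_j \ge k \mid \theta) - P(X_j \ge k + 1 \mid \theta)
% with the conventions P(X_j \ge 0 \mid \theta) = 1 and P(X_j \ge m_j + 1 \mid \theta) = 0

% Partial credit model: adjacent-category logits with a common slope (fixed at 1 here)
P(X_j = k \mid \theta) =
  \frac{\exp \sum_{v=1}^{k} (\theta - \delta_{jv})}
       {\sum_{c=0}^{m_j} \exp \sum_{v=1}^{c} (\theta - \delta_{jv})},
  \qquad k = 0, \dots, m_j
% where the empty sum (k = 0 or c = 0) is defined as 0
```

The absence of an item-specific slope in the PCM is what makes equality of item slopes one of its assumptions.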
Data were simulated under varying sample sizes, measure lengths, response category numbers, and slope strengths, as well as under conditions that violated some model assumptions, namely, unidimensionality and equality of item slopes. Model fit, detection of item local dependence, and detection of item misfit were all examined to identify conditions where one model may be preferable or results may contain a degree of bias.
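As a rough illustration of what one cell of such a simulation design might look like, the sketch below draws item responses from a GRM using only the Python standard library. It is not the authors' simulation code: the function names, the standard normal trait distribution, and every numeric value (250 respondents, four items, four response categories, the slopes and thresholds) are hypothetical choices made for the example.

```python
import math
import random


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in ascending order; K - 1 thresholds give K categories.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1, bracketed by 1 and 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum += [0.0]
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def simulate_grm(n_persons, slopes, thresholds, seed=0):
    """Simulate an n_persons x n_items matrix of responses coded 0..K-1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # assumed standard normal latent trait
        row = []
        for a, b in zip(slopes, thresholds):
            probs = grm_category_probs(theta, a, b)
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        data.append(row)
    return data


# Hypothetical condition: small sample, short measure, unequal slopes.
slopes = [1.0, 1.5, 2.0, 2.5]
thresholds = [[-1.0, 0.0, 1.0] for _ in slopes]
responses = simulate_grm(250, slopes, thresholds, seed=1)
```

Varying `n_persons`, the number of items, the number of thresholds, and the spread of `slopes` would correspond to the sample size, measure length, response category, and slope conditions described above; fitting the GRM and PCM to each simulated data set would require a separate IRT estimation package.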
For unidimensional item sets with equal item slopes, the PCM and GRM performed similarly, and GRM performance remained consistent as slope variability increased. For item sets that were not unidimensional, the PCM was somewhat more sensitive to the violation of unidimensionality. Across conditions, the PCM did not demonstrate a clear advantage over the GRM for small sample sizes or shorter measure lengths.
Overall, the GRM and the PCM each demonstrated advantages and disadvantages depending on underlying data conditions and the model outcome investigated. We recommend careful consideration of the known, or expected, data characteristics when choosing a model and interpreting its results.