Hansen Mark, Cai Li, Monroe Scott, Li Zhen
University of California, Los Angeles, California, USA.
Br J Math Stat Psychol. 2016 Nov;69(3):225-252. doi: 10.1111/bmsp.12074.
Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and for realistic sample sizes, full-information test statistics such as Pearson's X² and the likelihood ratio statistic G² suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied the M₂ statistic to diagnostic classification models. Through a series of simulation studies, we found that M₂ was well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M₂ was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of overall model goodness of fit using M₂, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic X²_LD for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The X²_LD statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising due to specific model misspecifications are illustrated. Finally, we used the M₂ and X²_LD statistics to evaluate a diagnostic model fit to data from the Trends in Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al. (2011, IJT, 11, 144).
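Two points in the abstract can be made concrete with a small sketch: why the full contingency table becomes sparse, and what a pairwise local dependence statistic of the Chen and Thissen (1997) type computes. The Python below is illustrative only and is not the authors' code. The function names `n_response_patterns` and `pairwise_x2_ld` are hypothetical, items are assumed to be dichotomous, and the model-implied expected counts are assumed to be supplied by an already fitted model. The pairwise statistic is written in its Pearson form, summing (observed - expected)² / expected over the cells of the two-way table for one item pair.

```python
def n_response_patterns(n_items, n_categories=2):
    """Number of cells in the full contingency table of response patterns."""
    return n_categories ** n_items


def pairwise_x2_ld(observed, expected):
    """Pearson-type local dependence statistic for one item pair.

    observed, expected: nested lists of the same shape holding the observed
    and model-implied counts for each combination of responses to the pair.
    The expected counts are assumed to come from a fitted model (not shown).
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if e <= 0:
                raise ValueError("expected counts must be positive")
            total += (o - e) ** 2 / e
    return total


# Sparseness: 20 dichotomous items already give over a million cells,
# far more than a sample of, say, 1,000 respondents can populate.
cells = n_response_patterns(20)

# Made-up counts for one item pair (rows: item i = 0/1, columns: item j = 0/1).
observed = [[30, 20], [20, 30]]
expected = [[25, 25], [25, 25]]
x2_ld = pairwise_x2_ld(observed, expected)
```

With these made-up counts every cell contributes 5² / 25 = 1, so the statistic equals 4. Computing it for every item pair gives a matrix of values that can be scanned for clusters of large entries, which is how such a statistic helps locate sources of misfit.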