Chen Changsheng, D'hondt Robbe, Vens Celine, Van den Noortgate Wim
Faculty of Psychology and Educational Sciences, KU Leuven, Campus KULAK, Kortrijk, Belgium.
imec research group itec, KU Leuven, Kortrijk, Belgium.
Educ Psychol Meas. 2025 Jan 4:00131644241306680. doi: 10.1177/00131644241306680.
Multidimensional Item Response Theory (MIRT) is routinely applied in developing educational and psychological assessment tools, for instance, to explore the multidimensional structure of items with exploratory MIRT. A critical decision in exploratory MIRT analyses is the number of factors to retain. Unfortunately, how statistical methods and newer Machine Learning (ML) methods for factor retention compare in exploratory MIRT analyses remains unclear. This study aims to fill this gap by comparing a selection of statistical and ML methods: the Kaiser Criterion (KC), Empirical Kaiser Criterion (EKC), Parallel Analysis (PA), scree plot with optimal coordinates (OC) and acceleration factor (AF), Very Simple Structure (VSS) with complexity 1 and 2 (C1 and C2), Minimum Average Partial (MAP), Exploratory Graph Analysis (EGA), Random Forest (RF), Histogram-based Gradient Boosted Decision Trees (HistGBDT), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). The comparison used 720,000 dichotomous response data sets simulated from MIRT models, covering various between-item and within-item structures and reflecting characteristics of large-scale assessments. The results show that MAP, RF, HistGBDT, XGBoost, and ANN substantially outperform the other methods, and that HistGBDT generally performs best among them. Furthermore, including the statistical methods' results as training features improves the performance of the ML methods. The proportion of data sets in which the methods retain the correct number of factors decreases as missingness increases or sample size decreases. KC, PA, EKC, and the scree plot (OC) tend to over-factor, whereas EGA, the scree plot (AF), and VSS (C1) tend to under-factor. We recommend that practitioners use both MAP and HistGBDT to determine the number of factors when applying exploratory MIRT.