Kutscher Tanja, Eid Michael, Crayen Claudia
Department of Education and Psychology, Freie Universitaet Berlin, Berlin, Germany.
Department of Data Center and Method Development, Leibniz Institute for Educational Trajectories, Bamberg, Germany.
Front Psychol. 2019 Nov 13;10:2494. doi: 10.3389/fpsyg.2019.02494. eCollection 2019.
Mixture models of item response theory (IRT) can be used to detect inappropriate category use. Panel survey data, in which attitudes and traits are typically assessed with short scales that have many response categories, are prone to response styles that indicate inappropriate category use. However, applying mixed IRT models to this type of data can be challenging because of the many threshold parameters within items. To date, little is known about the sample size required for the estimation methods and goodness-of-fit criteria of mixed IRT models to perform adequately in this case. The present Monte Carlo simulation study examined these issues for two mixed IRT models [the restricted mixed generalized partial credit model (rmGPCM) and the mixed partial credit model (mPCM)]. The population parameters of the simulation study were taken from a real application to challenging survey data (a 5-item scale with an 11-point rating scale and three latent classes). Additional data conditions (e.g., long tests, a reduced number of response categories, and a simple latent mixture) were included to improve the generalizability of the results. Under the challenging data condition, data were generated for each model at varying sample sizes (from 500 to 5,000 observations in steps of 500). For the additional conditions, only three sample sizes (1,000, 2,500, and 4,500 observations) were examined. The effects of sample size on estimation problems and on the accuracy of parameter and standard error estimates were evaluated. Results show that the two mixed IRT models require at least 2,500 observations to provide accurate parameter and standard error estimates under the challenging data condition. The rmGPCM produces more estimation problems than the more parsimonious mPCM, mostly because of the sparse tables that arise from the many response categories. The two models exhibit similar trends in estimation accuracy across sample sizes. Under the additional conditions, no estimation problems are observed. Both models perform well with a smaller sample size when long tests are used or when the true latent mixture includes two classes. For model selection, the AIC3 and the SABIC are the most reliable information criteria.
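As a rough illustration of the information criteria named above, the following Python sketch computes the AIC3 and the SABIC (with the AIC and BIC for comparison) from a fitted model's log-likelihood using their standard textbook definitions. The function name `information_criteria` and the input values are hypothetical and are not taken from the study; in practice the log-likelihood and parameter count would come from the software used to estimate the mixed IRT model.

```python
import math


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """Return common information criteria (smaller values indicate better fit).

    log_lik  : maximized log-likelihood of the fitted model
    n_params : number of freely estimated parameters
    n_obs    : sample size
    """
    deviance = -2.0 * log_lik
    return {
        # Akaike information criterion: penalty of 2 per parameter
        "AIC": deviance + 2 * n_params,
        # AIC3: same form as the AIC, but with a penalty of 3 per parameter
        "AIC3": deviance + 3 * n_params,
        # Bayesian information criterion: penalty of ln(N) per parameter
        "BIC": deviance + n_params * math.log(n_obs),
        # Sample-size-adjusted BIC: N is replaced by (N + 2) / 24
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only (not results from the study).
    for name, value in information_criteria(-1000.0, 10, 2500).items():
        print(f"{name}: {value:.2f}")
```

When candidate models with different numbers of latent classes are compared, the model with the smallest value of a given criterion would be preferred under that criterion.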