Li Ziying, Shin Jinnie, Kuang Huan, Huggins-Manley A Corinne
University of Florida, Gainesville, USA.
Florida State University, Tallahassee, USA.
Educ Psychol Meas. 2024 Nov 29:00131644241298975. doi: 10.1177/00131644241298975.
Evaluating differential item functioning (DIF) in assessments plays an important role in achieving measurement fairness across subgroups, such as those defined by gender or native language. However, relying solely on item response scores, as traditional DIF techniques do, makes it difficult for researchers and practitioners to interpret DIF. Recently, response process data, which carry valuable information about examinees' response behaviors, have offered an opportunity to interpret DIF items further by examining differences in response processes. This study investigates the potential of response process data features to improve the interpretability of DIF items, with a focus on gender DIF, using data from the Programme for the International Assessment of Adult Competencies (PIAAC) 2012 computer-based numeracy assessment. We applied random forest and logistic regression with ridge regularization to investigate the association between process data features and DIF items, and identified the features most important for interpreting DIF. In addition, we evaluated model performance across varying percentages of DIF items to reflect the range of scenarios encountered in practice. The results demonstrate that the combination of timing features and action-sequence features is informative in revealing response process differences between groups, thereby enhancing DIF item interpretability. Overall, this study introduces a feasible procedure for leveraging response process data to understand and interpret DIF items, shedding light on potential reasons for the low agreement between DIF statistics and expert reviews and revealing potentially irrelevant factors, with the aim of enhancing measurement equity.
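The abstract names logistic regression with ridge regularization as one of the two models but does not state the outcome definition, the exact features, the hyperparameters, or the software used. The sketch below is therefore only an illustration of the general technique under stated assumptions: the data are synthetic, the outcome is taken to be a binary group label, and the two standardized features (`time_z` for a timing feature, `actions_z` for an action-count feature) are hypothetical names that do not come from the study. It fits the model by plain gradient descent with an L2 penalty on the coefficients and compares a weak and a strong penalty. The random forest model is not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=1.0, lr=0.5, n_iter=500):
    """Minimise mean log-loss + (lam / (2 * n)) * ||w||^2 by gradient descent.

    The intercept b is not penalised. lam, lr and n_iter are illustrative
    defaults, not values taken from the study.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        for j in range(p):
            # Log-loss gradient plus the ridge term lam * w[j].
            w[j] -= lr * (grad_w[j] + lam * w[j]) / n
        b -= lr * grad_b / n
    return w, b


def make_synthetic(n=400, seed=0):
    # Synthetic stand-in for process data (assumption, not PIAAC data):
    # time_z differs between the two groups, actions_z does not.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        group = rng.randint(0, 1)
        time_z = rng.gauss(0.8 * group, 1.0)
        actions_z = rng.gauss(0.0, 1.0)
        X.append([time_z, actions_z])
        y.append(group)
    return X, y


X, y = make_synthetic()
w_weak, _ = fit_ridge_logistic(X, y, lam=0.1)
w_strong, _ = fit_ridge_logistic(X, y, lam=200.0)
print("weak penalty   [time_z, actions_z]:", [round(v, 3) for v in w_weak])
print("strong penalty [time_z, actions_z]:", [round(v, 3) for v in w_strong])
```

In this toy setup the coefficient on `time_z` should dominate because only that feature was generated with a group difference, and the stronger penalty shrinks both coefficients toward zero. Coefficient magnitudes on standardized features are one simple way to rank features; whether the study ranked features this way is not stated in the abstract.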