Using Interpretable Machine Learning for Differential Item Functioning Detection in Psychometric Tests.

Author information

Kraus Elisabeth Barbara, Wild Johannes, Hilbert Sven

Affiliations

LMU Munich, Germany.

University of Regensburg, Germany.

Publication information

Appl Psychol Meas. 2024 Jul;48(4-5):167-186. doi: 10.1177/01466216241238744. Epub 2024 Mar 11.

DOI: 10.1177/01466216241238744

PMID: 39055539

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11268249/

Abstract

This study presents a novel method for investigating test fairness and differential item functioning (DIF) that combines psychometrics and machine learning. Test unfairness manifests as systematic, demographically imbalanced influences of confounding constructs on residual variances in psychometric models. Our method aims to capture the complex relationships between response patterns and demographic attributes that result from these influences. Specifically, when predicting demographic characteristics, it measures the importance of individual test items and latent ability scores relative to a random baseline variable. We conducted a simulation study to examine how the method performs under various conditions, including linear and complex impact, unfairness, and varying numbers of factors, unfair items, and test lengths. We found that our method detects unfair items as reliably as Mantel-Haenszel statistics or logistic regression analyses, but generalizes to multidimensional scales in a straightforward manner. To apply the method, we used random forests to predict migration background from the ability scores and single items of an elementary school reading comprehension test. One item was found to be unfair according to all proposed decision criteria. Further analysis of the item's content provided plausible explanations for this finding. Analysis code is available at: https://osf.io/s57rw/?view_only=47a3564028d64758982730c6d9c6c547.
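The abstract benchmarks the proposed method against Mantel-Haenszel statistics. As a point of reference only, the following is a minimal sketch of the classical Mantel-Haenszel (MH) DIF statistic for a single dichotomous item, with examinees stratified by a matching score such as the total score. It is not the authors' code (their analysis code is at the OSF link above), and it does not implement their random-forest importance method. The `Stratum` type and `mantel_haenszel` function are hypothetical names, and the cell counts in the usage lines are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 table for one level k of the matching score (hypothetical helper type).

    a: reference group, correct    b: reference group, incorrect
    c: focal group, correct        d: focal group, incorrect
    """
    a: int
    b: int
    c: int
    d: int


def mantel_haenszel(strata):
    """Return (alpha_MH, chi2_MH, delta_MH) for one dichotomous item."""
    num = den = 0.0                  # numerator/denominator of the common odds ratio
    obs_a = exp_a = var_a = 0.0      # observed sum, expected sum, variance of the a-cells
    for s in strata:
        n = s.a + s.b + s.c + s.d
        if n < 2:
            continue                 # strata with fewer than 2 examinees add nothing
        n_ref, n_foc = s.a + s.b, s.c + s.d
        m_right, m_wrong = s.a + s.c, s.b + s.d
        num += s.a * s.d / n
        den += s.b * s.c / n
        obs_a += s.a
        exp_a += n_ref * m_right / n
        var_a += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    if num == 0 or den == 0 or var_a == 0:
        raise ValueError("degenerate tables: statistic undefined")
    alpha = num / den
    # Continuity-corrected MH chi-square with 1 degree of freedom.
    chi2 = (abs(obs_a - exp_a) - 0.5) ** 2 / var_a
    # ETS delta scale: negative values mean the item favors the reference group.
    delta = -2.35 * math.log(alpha)
    return alpha, chi2, delta


# Invented counts for two score strata (illustration only).
table = [Stratum(30, 20, 20, 30), Stratum(40, 10, 30, 20)]
alpha, chi2, delta = mantel_haenszel(table)
print(f"alpha_MH={alpha:.3f}, chi2_MH={chi2:.2f}, delta_MH={delta:.2f}")
# prints alpha_MH=2.429, chi2_MH=7.77, delta_MH=-2.09
```

In this made-up example, alpha_MH > 1 and a negative delta_MH indicate that, at matched score levels, the reference group answers the item correctly more often; chi2_MH of about 7.77 exceeds 3.84, the 5% critical value of a chi-square distribution with one degree of freedom.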
