High-Stakes Testing Case Study: A Latent Variable Approach for Assessing Measurement and Prediction Invariance.

Affiliations

Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA.

Department of Psychology, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, USA.

Publication information

Psychometrika. 2019 Mar;84(1):285-309. doi: 10.1007/s11336-018-9649-2. Epub 2019 Jan 22.

DOI:10.1007/s11336-018-9649-2

PMID:30671788

Abstract

The existence of differences in prediction systems involving test scores across demographic groups continues to be a thorny and unresolved scientific, professional, and societal concern. Our case study uses a two-stage least squares (2SLS) estimator to jointly assess measurement invariance and prediction invariance in high-stakes testing. So, we examined differences across groups based on latent as opposed to observed scores with data for 176 colleges and universities from The College Board. Results showed that evidence regarding measurement invariance was rejected for the SAT mathematics (SAT-M) subtest at the 0.01 level for 74.5% and 29.9% of cohorts for Black versus White and Hispanic versus White comparisons, respectively. Also, on average, Black students with the same standing on a common factor had observed SAT-M scores that were nearly a third of a standard deviation lower than for comparable Whites. We also found evidence that group differences in SAT-M measurement intercepts may partly explain the well-known finding of observed differences in prediction intercepts. Additionally, results provided evidence that nearly a quarter of the statistically significant observed intercept differences were not statistically significant at the 0.05 level once predictor measurement error was accounted for using the 2SLS procedure. Our joint measurement and prediction invariance approach based on latent scores opens the door to a new high-stakes testing research agenda whose goal is to not simply assess whether observed group-based differences exist and the size and direction of such differences. Rather, the goal of this research agenda is to assess the causal chain starting with underlying theoretical mechanisms (e.g., contextual factors, differences in latent predictor scores) that affect the size and direction of any observed differences.
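
The abstract's point that some observed intercept differences vanish once predictor measurement error is accounted for can be illustrated with a small simulation. The sketch below is not the authors' procedure or data. It assumes a single latent predictor with two error-prone indicators, uses the second indicator as the instrument for the first, and sets all names, sample sizes, and parameter values hypothetically. Both groups share the same true intercept and slope, so any group coefficient that OLS reports is an artifact of measurement error.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(X, Z, y):
    """2SLS: project X onto the instruments Z, then regress y on the projection."""
    first_stage = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ first_stage
    return ols(X_hat, y)


def compare_estimators(n_per_group=20000, seed=0):
    """Simulate two groups (hypothetical values) and fit OLS and 2SLS."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_group
    group = np.repeat([0.0, 1.0], n_per_group)   # 0 = reference, 1 = focal

    # Latent predictor: the focal group's mean is one unit lower (assumption).
    xi = rng.normal(0.0, 1.0, n) - 1.0 * group

    # Two observed indicators of the same latent score, each with error.
    x1 = xi + rng.normal(0.0, 0.7, n)            # observed predictor
    x2 = xi + rng.normal(0.0, 0.7, n)            # used as the instrument

    # Outcome depends on the latent score only: no true group difference.
    y = 0.5 * xi + rng.normal(0.0, 1.0, n)

    ones = np.ones(n)
    X = np.column_stack([ones, x1, group])       # regressors
    Z = np.column_stack([ones, x2, group])       # instruments

    return ols(X, y), two_stage_least_squares(X, Z, y)


ols_coef, tsls_coef = compare_estimators()
print("OLS  [intercept, slope, group]:", np.round(ols_coef, 3))
print("2SLS [intercept, slope, group]:", np.round(tsls_coef, 3))
```

Under these assumptions, OLS attenuates the slope and reports a negative group coefficient, while 2SLS recovers a slope near 0.5 and a group coefficient near zero. Real applications would also need standard errors and a defensible choice of instruments, which this sketch omits.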

Similar articles

An essay on measurement and factorial invariance.

Med Care. 2006 Nov;44(11 Suppl 3):S69-77. doi: 10.1097/01.mlr.0000245438.73837.89.

Can student self-ratings be compared with peer ratings? A study of measurement invariance of multisource feedback.

Adv Health Sci Educ Theory Pract. 2016 May;21(2):401-13. doi: 10.1007/s10459-015-9638-5. Epub 2015 Sep 19.

The performance of the IES-R for Latinos and non-Latinos: Assessing measurement invariance.

PLoS One. 2018 Apr 3;13(4):e0195229. doi: 10.1371/journal.pone.0195229. eCollection 2018.

The Center for Epidemiological Studies Depression Scale (CES-D): Measurement equivalence across gender groups in Hispanic college students.

J Affect Disord. 2017 Sep;219:112-118. doi: 10.1016/j.jad.2017.05.024. Epub 2017 May 14.

Penalized Best Linear Prediction of True Test Scores.

Psychometrika. 2019 Mar;84(1):186-211. doi: 10.1007/s11336-018-9636-7. Epub 2018 Sep 21.

Investigating the structure and measurement invariance of the Multigroup Ethnic Identity Measure in a multiethnic sample of college students.

J Couns Psychol. 2014 Jul;61(3):437-446. doi: 10.1037/a0036253. Epub 2014 Mar 24.

How the 2SLS/IV estimator can handle equality constraints in structural equation models: a system-of-equations approach.

Br J Math Stat Psychol. 2014 May;67(2):353-69. doi: 10.1111/bmsp.12023. Epub 2013 Aug 23.

The perceived constraints subscale of the Sense of Mastery Scale: dimensionality and measurement invariance.

Qual Life Res. 2017 Jan;26(1):127-138. doi: 10.1007/s11136-016-1359-6. Epub 2016 Jul 6.

Testing measurement invariance of the protective behavioral strategies scale in college men and women.

Psychol Assess. 2014 Mar;26(1):307-13. doi: 10.1037/a0034471. Epub 2013 Sep 23.

Cited by

An introduction to model implied instrumental variables using two stage least squares (MIIV-2SLS) in structural equation models (SEMs).

Psychol Methods. 2022 Oct;27(5):752-772. doi: 10.1037/met0000297. Epub 2021 Jul 29.

Differential Item Functioning Analyses of the Patient-Reported Outcomes Measurement Information System (PROMIS®) Measures: Methods, Challenges, Advances, and Future Directions.

Psychometrika. 2021 Sep;86(3):674-711. doi: 10.1007/s11336-021-09775-0. Epub 2021 Jul 12.
