Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill.
Department of Psychiatry, University of California, San Diego.
Psychol Assess. 2021 Jul;33(7):596-609. doi: 10.1037/pas0000938. Epub 2021 May 17.
Screening measures are used in psychology and medicine to identify respondents who are high or low on a construct. Based on the screening, the evaluator assigns respondents to classes corresponding to different courses of action: Make a diagnosis versus reject a diagnosis; provide services versus withhold services; or conduct further assessment versus conclude the assessment process. When measures are used to classify individuals, it is important that the decisions be consistent and equitable across groups. Ideally, if respondents completed the screening measure repeatedly in quick succession, they would be consistently assigned into the same class each time. In addition, the consistency of the classification should be unrelated to the respondents' background characteristics, such as sex, race, or ethnicity (i.e., the measure is free of measurement bias). Reporting estimates of classification consistency is a common practice in educational testing, but there has been limited application of these estimates to screening in psychology and medicine. In this article, we present two procedures based on item response theory that are used (a) to estimate the classification consistency of a screening measure and (b) to evaluate how classification consistency is impacted by measurement bias across respondent groups. We provide R functions to conduct the procedures, illustrate the procedures with real data, and use Monte Carlo simulations to guide their appropriate use. Finally, we discuss how estimates of classification consistency can help assessment specialists make more informed decisions on the use of a screening measure with protected groups (e.g., groups defined by gender, race, or ethnicity). (PsycInfo Database Record (c) 2021 APA, all rights reserved).
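To make the idea of classification consistency concrete, the following is a minimal sketch and not the authors' R functions. It assumes dichotomous items under a two-parameter logistic (2PL) model, a cut score on the summed score, and a standard normal latent distribution in both groups. All item parameters, the cut score, and the function names are hypothetical. For each latent level, the summed-score distribution is obtained with the Lord-Wingersky recursion; the probability that two independent administrations give the same classification is then averaged over the latent distribution. Shifting the difficulty of one item in the focal group stands in for measurement bias.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def summed_score_dist(theta, items):
    """Lord-Wingersky recursion: P(summed score = x | theta) for x = 0..n."""
    dist = [1.0]
    for a, b in items:
        p = p_endorse(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - p)   # item not endorsed
            new[score + 1] += prob * p       # item endorsed
        dist = new
    return dist

def classification_consistency(items, cut, n_points=121, lo=-6.0, hi=6.0):
    """Probability that two independent administrations agree on the class.

    Assumes theta ~ N(0, 1), approximated on an equally spaced grid.
    A respondent screens positive when the summed score is >= cut.
    """
    step = (hi - lo) / (n_points - 1)
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(n_points):
        theta = lo + i * step
        w = math.exp(-0.5 * theta * theta)  # unnormalized normal density
        p_pos = sum(summed_score_dist(theta, items)[cut:])
        # Same class twice: positive both times or negative both times.
        weighted_sum += w * (p_pos ** 2 + (1.0 - p_pos) ** 2)
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical (discrimination, difficulty) pairs for a 5-item screener.
reference_items = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.3), (0.8, 0.8), (1.3, 1.0)]
# Hypothetical bias: the third item is harder to endorse in the focal group.
focal_items = list(reference_items)
focal_items[2] = (1.5, 0.9)

cut_score = 3  # hypothetical cut on the summed score
cc_reference = classification_consistency(reference_items, cut_score)
cc_focal = classification_consistency(focal_items, cut_score)
print(f"reference group consistency: {cc_reference:.3f}")
print(f"focal group consistency:     {cc_focal:.3f}")
```

Comparing the two printed values shows how a difference in item parameters between groups can change the consistency of the screening decision even when the cut score is held fixed.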