Estimating classification consistency of screening measures and quantifying the impact of measurement bias.

Affiliations

Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill.

Department of Psychiatry, University of California, San Diego.

Publication information

Psychol Assess. 2021 Jul;33(7):596-609. doi: 10.1037/pas0000938. Epub 2021 May 17.

DOI:10.1037/pas0000938

PMID:33998821

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8412438/

Abstract

Screening measures are used in psychology and medicine to identify respondents who are high or low on a construct. Based on the screening, the evaluator assigns respondents to classes corresponding to different courses of action: Make a diagnosis versus reject a diagnosis; provide services versus withhold services; or conduct further assessment versus conclude the assessment process. When measures are used to classify individuals, it is important that the decisions be consistent and equitable across groups. Ideally, if respondents completed the screening measure repeatedly in quick succession, they would be consistently assigned into the same class each time. In addition, the consistency of the classification should be unrelated to the respondents' background characteristics, such as sex, race, or ethnicity (i.e., the measure is free of measurement bias). Reporting estimates of classification consistency is a common practice in educational testing, but there has been limited application of these estimates to screening in psychology and medicine. In this article, we present two procedures based on item response theory that are used (a) to estimate the classification consistency of a screening measure and (b) to evaluate how classification consistency is impacted by measurement bias across respondent groups. We provide R functions to conduct the procedures, illustrate the procedures with real data, and use Monte Carlo simulations to guide their appropriate use. Finally, we discuss how estimates of classification consistency can help assessment specialists make more informed decisions on the use of a screening measure with protected groups (e.g., groups defined by gender, race, or ethnicity). (PsycInfo Database Record (c) 2021 APA, all rights reserved).
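To make the idea of classification consistency concrete, the following is a minimal sketch, not the authors' R functions. It assumes binary items that follow a two-parameter logistic (2PL) model, a summed-score cutoff, and a normally distributed latent trait approximated on an evenly spaced grid. The item parameters, the cutoff, and all function names are hypothetical and chosen only for illustration. Consistency is computed as the probability that two independent administrations place the same respondent in the same class.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def summed_score_dist(theta, items):
    """P(summed score = s | theta) for binary items, built one item at a time
    (the Lord-Wingersky recursion)."""
    dist = [1.0]  # before any item, the score is 0 with probability 1
    for a, b in items:
        p = p_endorse(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)   # item not endorsed: score unchanged
            new[s + 1] += mass * p       # item endorsed: score goes up by 1
        dist = new
    return dist

def classification_consistency(items, cutoff, mean=0.0, sd=1.0, n_points=81):
    """Probability that two independent administrations give the same class,
    where scores >= cutoff count as screen-positive."""
    # Evenly spaced grid over mean +/- 4 sd, weighted by the normal density.
    grid = [mean + sd * (-4.0 + 8.0 * k / (n_points - 1)) for k in range(n_points)]
    weights = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in grid]
    total = sum(weights)
    cc = 0.0
    for t, w in zip(grid, weights):
        p_pos = sum(summed_score_dist(t, items)[cutoff:])
        # Same class twice: positive both times or negative both times.
        cc += (w / total) * (p_pos ** 2 + (1.0 - p_pos) ** 2)
    return cc

# Hypothetical (a, b) item parameters. The focal group differs from the
# reference group only in the difficulty of the third item, an assumed
# example of measurement bias.
reference_items = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.3), (0.8, 0.8), (1.3, 1.0)]
focal_items = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.9), (0.8, 0.8), (1.3, 1.0)]

print(classification_consistency(reference_items, cutoff=3))
print(classification_consistency(focal_items, cutoff=3))
```

Comparing the two printed values gives a rough sense of how group-specific item parameters could change consistency at a fixed cutoff. The article's second procedure may quantify the impact of measurement bias differently, so this comparison should be read as an illustration only.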
