Assessing the influence of rater and subject characteristics on measures of agreement for ordinal ratings.

Author information

Nelson Kerrie P, Mitani Aya A, Edwards Don

Affiliations

Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, MA, 02118, U.S.A.

Department of Statistics, University of South Carolina, Columbia, SC, 29208, U.S.A.

Publication information

Stat Med. 2017 Sep 10;36(20):3181-3199. doi: 10.1002/sim.7323. Epub 2017 Jun 13.

DOI:10.1002/sim.7323

PMID:28612356

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5540881/

Abstract

Widespread inconsistencies are commonly observed between physicians' ordinal classifications of screening test results such as mammography. These discrepancies have motivated large-scale agreement studies in which many raters contribute ratings. The primary goal of these studies is to identify factors related to physicians and to patients' test results that may lead to stronger consistency between raters' classifications. While ordered categorical scales are frequently used to classify screening test results, very few statistical approaches exist to model agreement between multiple raters. Here we develop a flexible and comprehensive approach to assess the influence of rater and subject characteristics on agreement between multiple raters' ordinal classifications in large-scale agreement studies. Our approach is based upon the class of generalized linear mixed models. Novel model-based summary measures are proposed to assess agreement between all raters or a subgroup of raters, such as experienced physicians. Hypothesis tests are described to formally identify factors, such as physicians' level of experience, that play an important role in improving consistency of ratings between raters. We demonstrate how unique characteristics of individual raters can be assessed via conditional modes generated during the modeling process. Simulation studies are presented to demonstrate the performance of the proposed methods and summary measure of agreement. The methods are applied to a large-scale mammography agreement study to investigate the effects of rater and patient characteristics on the strength of agreement between radiologists. Copyright © 2017 John Wiley & Sons, Ltd.
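The abstract does not give the model or the agreement measure in formulas. As a rough illustration only, the sketch below assumes a cumulative probit mixed model with crossed random effects, P(Y_ij <= c | u_i, v_j) = Phi(alpha_c - u_i - v_j), where u_i ~ N(0, sigma_u^2) is a subject effect and v_j ~ N(0, sigma_v^2) is a rater effect. It then computes a Cohen's-kappa-style quantity (p_o - p_c) / (1 - p_c), with p_o the probability that two randomly chosen raters agree on a randomly chosen subject and p_c the agreement expected from the marginal category probabilities. The function name `model_based_kappa`, the cutpoints and the variance values are hypothetical; this is not necessarily the authors' exact measure, and in practice the model parameters would first be estimated from the rating data.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def _norm_cdf(x):
    """Standard normal CDF applied elementwise (avoids a SciPy dependency)."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def model_based_kappa(alphas, sigma_u, sigma_v, n_nodes=60):
    """Kappa-style agreement implied by an ASSUMED ordinal probit GLMM.

    Assumed model (illustrative, hypothetical):
        P(Y_ij <= c | u_i, v_j) = Phi(alpha_c - u_i - v_j),
        u_i ~ N(0, sigma_u^2) subject effect, v_j ~ N(0, sigma_v^2) rater effect.
    """
    alphas = np.sort(np.asarray(alphas, dtype=float))

    # Probabilists' Gauss-Hermite rule: E[f(Z)] ~= sum(w * f(z)) / sqrt(2*pi).
    z, w = hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)
    u = sigma_u * z  # subject effects evaluated at the quadrature nodes

    # Integrate the rater effect out analytically:
    # E_v[Phi(a - v)] = Phi(a / sqrt(1 + sigma_v^2)).
    scale = math.sqrt(1.0 + sigma_v ** 2)
    cum = _norm_cdf((alphas[None, :] - u[:, None]) / scale)  # (nodes, C-1)
    ones = np.ones((len(u), 1))
    cum = np.hstack([0.0 * ones, cum, ones])
    p_cat_given_u = np.diff(cum, axis=1)  # (nodes, C): P(Y = c | u)

    # Two raters scoring the same subject are independent given u_i,
    # so they agree on category c with probability P(Y = c | u)^2.
    p_obs = float(np.sum(w[:, None] * p_cat_given_u ** 2))

    # Chance agreement uses the marginal category probabilities P(Y = c).
    p_marg = w @ p_cat_given_u
    p_chance = float(np.sum(p_marg ** 2))
    return (p_obs - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    cuts = [-1.0, 0.0, 1.0]  # hypothetical cutpoints for a 4-category scale
    print(model_based_kappa(cuts, sigma_u=2.0, sigma_v=0.3))
    print(model_based_kappa(cuts, sigma_u=2.0, sigma_v=1.0))  # more rater spread
```

In this sketch, agreement rises with subject heterogeneity (sigma_u) and falls as raters diverge (sigma_v); with sigma_u = 0 the measure is zero because every subject looks the same to the raters. Comparing values computed with different (hypothetical) rater variances is one simple way to mimic the idea of agreement within a subgroup of raters.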

Similar articles

Evaluating the effects of rater and subject factors on measures of association.

Biom J. 2018 May;60(3):639-656. doi: 10.1002/bimj.201700078. Epub 2018 Jan 19.

A measure of association for ordered categorical data in population-based studies.

Stat Methods Med Res. 2018 Mar;27(3):812-831. doi: 10.1177/0962280216643347. Epub 2016 May 16.

Summary measures of agreement and association between many raters' ordinal classifications.

Ann Epidemiol. 2017 Oct;27(10):677-685.e4. doi: 10.1016/j.annepidem.2017.09.001. Epub 2017 Sep 22.

Measures of agreement between many raters for ordinal classifications.

Stat Med. 2015 Oct 15;34(23):3116-32. doi: 10.1002/sim.6546. Epub 2015 Jun 21.

Measuring intrarater association between correlated ordinal ratings.

Biom J. 2020 Nov;62(7):1687-1701. doi: 10.1002/bimj.201900177. Epub 2020 Jun 11.

Improving the reliability of diagnostic tests in population-based agreement studies.

Stat Med. 2010 Mar 15;29(6):617-26. doi: 10.1002/sim.3819.

Measuring rater bias in diagnostic tests with ordinal ratings.

Stat Med. 2021 Jul 30;40(17):4014-4033. doi: 10.1002/sim.9011. Epub 2021 May 9.

Quantifying rater variation for ordinal data using a rating scale model.

Stat Med. 2018 Jun 30;37(14):2223-2237. doi: 10.1002/sim.7639. Epub 2018 Apr 16.

Validity and reliability of exposure assessors' ratings of exposure intensity by type of occupational questionnaire and type of rater.

Ann Occup Hyg. 2011 Jul;55(6):601-11. doi: 10.1093/annhyg/mer019. Epub 2011 Apr 21.

Cited by

Homogeneity score test of AC statistics and estimation of common AC in multiple or stratified inter-rater agreement studies.

BMC Med Res Methodol. 2020 Feb 5;20(1):20. doi: 10.1186/s12874-019-0887-5.

Assessing method agreement for paired repeated binary measurements administered by multiple raters.

Stat Med. 2020 Feb 10;39(3):279-293. doi: 10.1002/sim.8398. Epub 2019 Dec 1.

How do clinicians rate patient's performance status using the ECOG performance scale? A mixed-methods exploration of variability in decision-making in oncology.

Ecancermedicalscience. 2019 Mar 28;13:913. doi: 10.3332/ecancer.2019.913. eCollection 2019.

Tongue Image Database Construction Based on the Expert Opinions: Assessment for Individual Agreement and Methods for Expert Selection.

Evid Based Complement Alternat Med. 2018 Oct 2;2018:8491057. doi: 10.1155/2018/8491057. eCollection 2018.

Summary measures of agreement and association between many raters' ordinal classifications.

Ann Epidemiol. 2017 Oct;27(10):677-685.e4. doi: 10.1016/j.annepidem.2017.09.001. Epub 2017 Sep 22.
