Assessing agreement between multiple raters with missing rating information, applied to breast cancer tumour grading.

Author information

Fanshawe Thomas R, Lynch Andrew G, Ellis Ian O, Green Andrew R, Hanka Rudolf

Affiliation

Department of Medicine, Lancaster University, Lancaster, United Kingdom.

Publication information

PLoS One. 2008 Aug 13;3(8):e2925. doi: 10.1371/journal.pone.0002925.

Abstract

BACKGROUND

We consider the problem of assessing inter-rater agreement when there are missing data and a large number of raters. Previous studies have shown only 'moderate' agreement between pathologists in grading breast cancer tumour specimens. We analyse a large but incomplete data-set consisting of 24,177 grades, on a discrete 1-3 scale, provided by 732 pathologists for 52 samples.
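To give a sense of how incomplete the data set is, the short calculation below uses only the three counts stated in the background. It assumes that each pathologist graded a given sample at most once, so that the grades fill cells of a 732 × 52 rater-by-sample matrix; that assumption is not stated in the abstract.

```python
# Counts taken from the abstract; the rest is arithmetic.
# Assumption: each pathologist graded a given sample at most once.
N_GRADES = 24_177   # grades observed
N_RATERS = 732      # pathologists
N_SAMPLES = 52      # tumour samples

possible = N_RATERS * N_SAMPLES           # cells in a complete rater-by-sample matrix
coverage = N_GRADES / possible            # fraction of cells actually observed
grades_per_rater = N_GRADES / N_RATERS    # mean number of samples graded per pathologist
grades_per_sample = N_GRADES / N_SAMPLES  # mean number of grades received per sample

print(possible)                    # 38064
print(f"{coverage:.3f}")           # 0.635
print(f"{grades_per_rater:.1f}")   # 33.0
print(f"{grades_per_sample:.1f}")  # 464.9
```

Under that assumption roughly a third of the possible rater-sample pairs are missing, while each sample still receives several hundred grades.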

METHODOLOGY/PRINCIPAL FINDINGS

We review existing methods for analysing inter-rater agreement for multiple raters and demonstrate two further methods. Firstly, we examine a simple non-chance-corrected agreement score based on the observed proportion of agreements with the consensus for each sample, which makes no allowance for missing data. Secondly, treating grades as lying on a continuous scale representing tumour severity, we use a Bayesian latent trait method to model cumulative probabilities of assigning grade values as functions of the severity and clarity of the tumour and of rater-specific parameters representing boundaries between grades 1-2 and 2-3. We simulate from the fitted model to estimate, for each rater, the probability of agreement with the majority. Both methods suggest that there are differences between raters in terms of rating behaviour, most often caused by consistent over- or under-estimation of the grade boundaries, and also considerable variability in the distribution of grades assigned to many individual samples. The Bayesian model addresses the tendency of the agreement score to be biased upwards for raters who, by chance, see a relatively 'easy' set of samples.
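The abstract describes the first method only briefly, so the sketch below adopts one plausible reading of it rather than the authors' exact definition. Assumptions: the consensus for a sample is its modal grade among the pathologists who graded it (the rater's own grade included), samples with a tied mode get no consensus, and a rater's score is the fraction of their consensus-bearing samples on which their grade matches the consensus. The toy ratings are hypothetical; missing rater-sample pairs are simply absent.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (rater, sample) -> grade on the 1-3 scale.
# Missing rater-sample pairs are simply absent, as in an incomplete design.
ratings = {
    ("r1", "s1"): 1, ("r1", "s2"): 2, ("r1", "s3"): 3,
    ("r2", "s1"): 1, ("r2", "s2"): 3,
    ("r3", "s2"): 2, ("r3", "s3"): 3, ("r3", "s4"): 2,
    ("r4", "s1"): 2, ("r4", "s3"): 3, ("r4", "s4"): 2,
}


def consensus_grades(ratings):
    """Modal grade per sample; a tied mode gives no consensus (an assumption)."""
    by_sample = defaultdict(list)
    for (_, sample), grade in ratings.items():
        by_sample[sample].append(grade)
    consensus = {}
    for sample, grades in by_sample.items():
        counts = Counter(grades).most_common()
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            consensus[sample] = counts[0][0]
    return consensus


def agreement_scores(ratings):
    """Per rater: fraction of their consensus-bearing samples graded as the consensus.

    Only the samples a rater actually saw enter their score, so nothing
    adjusts for which (or how difficult) samples happen to be missing.
    """
    consensus = consensus_grades(ratings)
    hits, seen = Counter(), Counter()
    for (rater, sample), grade in ratings.items():
        if sample in consensus:
            seen[rater] += 1
            hits[rater] += int(grade == consensus[sample])
    return {rater: hits[rater] / seen[rater] for rater in seen}


for rater, score in sorted(agreement_scores(ratings).items()):
    print(rater, round(score, 3))
# r1 1.0
# r2 0.5
# r3 1.0
# r4 0.667
```

In this toy example r3 scores 1.0 having seen s3 and s4, on which every rater who graded them agreed; the score cannot separate a rater's behaviour from the ease of the samples they happened to see, which is the upward bias the abstract attributes to it.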

CONCLUSIONS/SIGNIFICANCE

Latent trait models can be adapted to provide novel information about the nature of inter-rater agreement when the number of raters is large and there are missing data. In this large study there is substantial variability between pathologists and uncertainty in the identity of the 'true' grade of many of the breast cancer tumours, a fact often ignored in clinical studies.
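The abstract does not give the latent trait model's equations, so the following is only a minimal sketch of the simulation step under stated assumptions. It assumes a cumulative-logit form, P(Y ≤ k) = logistic((b_k − severity) / scale) for k = 1, 2, where b1 < b2 are a rater's 1-2 and 2-3 boundaries and a smaller `scale` stands in for a 'clearer' sample. Fixed hypothetical parameter values replace posterior draws; fitting the Bayesian model (for example by MCMC) is not shown. Every rater is simulated on every sample, and the function names are illustrative only.

```python
import math
import random
from collections import Counter


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def grade_probs(severity, scale, b1, b2):
    """Probabilities of grades 1, 2, 3 under an ASSUMED cumulative-logit model.

    P(Y <= k) = logistic((b_k - severity) / scale), with rater boundaries b1 < b2.
    """
    f1 = logistic((b1 - severity) / scale)
    f2 = logistic((b2 - severity) / scale)
    return [f1, f2 - f1, 1.0 - f2]


def prob_agree_with_majority(samples, raters, n_sims=2000, seed=1):
    """Monte Carlo estimate, per rater, of P(rater's grade == majority grade).

    samples: list of (severity, scale); raters: list of (b1, b2).
    Every rater is simulated on every sample, so no rater benefits from having
    seen an 'easy' subset. Majority ties are broken at random (an assumption).
    """
    rng = random.Random(seed)
    agree = [0] * len(raters)
    for _ in range(n_sims):
        for severity, scale in samples:
            grades = [
                rng.choices((1, 2, 3), weights=grade_probs(severity, scale, b1, b2))[0]
                for b1, b2 in raters
            ]
            counts = Counter(grades)
            top = max(counts.values())
            majority = rng.choice([g for g, c in counts.items() if c == top])
            for j, g in enumerate(grades):
                agree[j] += int(g == majority)
    total = n_sims * len(samples)
    return [a / total for a in agree]


# Hypothetical parameter values, NOT estimates from the study's data.
samples = [(-2.0, 0.5), (0.0, 1.0), (2.5, 0.5), (0.8, 2.0)]  # (severity, scale)
raters = [(-1.0, 1.0)] * 3 + [(-0.5, 1.5), (-1.5, 0.5)]       # (b1, b2)

for j, p in enumerate(prob_agree_with_majority(samples, raters)):
    print(f"rater {j}: P(agree with majority) ~ {p:.3f}")
```

Raters 3 and 4 have both boundaries shifted up and down respectively, mimicking the consistent over- or under-estimation of grade boundaries described in the abstract; the sample with a large `scale` plays the role of an ambiguous tumour whose simulated grades are widely spread.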
