Test Evaluation Research Group, Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom.
NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, United Kingdom.
Br J Radiol. 2023 Aug;96(1148):20220972. doi: 10.1259/bjr.20220972. Epub 2023 Jun 29.
To review the methodology of interobserver variability studies, including current practice and the quality of conducting and reporting such studies.
Interobserver variability studies between January 2019 and January 2020 were included; extracted data comprised study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error.
Seventy-nine full-text studies were included, covering various imaging tests and clinical areas. The median number of patients was 47 (IQR: 23-88) and the median number of observers was 4 (IQR: 2-7); sample size was justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), and in most studies all observers interpreted images for all patients (n = 67, 85%). Intraclass correlation coefficients (ICC) (n = 41, 52%), Kappa (κ) statistics (n = 31, 39%) and percentage agreement (n = 15, 19%) were most commonly used. Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good/adequate rating for 52 studies (66%), including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating.
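Two of the agreement measures named above, percentage agreement and Cohen's κ, can be illustrated for the simplest case of two observers rating the same patients on a categorical scale. The following is a minimal sketch, not a method taken from the reviewed studies: the function names and the ratings are hypothetical, and multi-observer designs or continuous measurements would call for other statistics (for example Fleiss' κ or an ICC).

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of patients on whom the two observers give the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of patients")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    p_observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of each observer's marginal proportions, summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both observers use a single identical category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings (1 = finding present, 0 = absent) for 10 patients.
observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

print(f"percentage agreement: {percent_agreement(observer_1, observer_2):.2f}")
print(f"Cohen's kappa:        {cohens_kappa(observer_1, observer_2):.2f}")
```

With these made-up ratings the observers agree on 8 of 10 patients (80%), yet κ is only about 0.6 because half of that agreement would be expected by chance. This shows how the two measures can support different readings of the same data.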
Interobserver variability studies have diverse study designs and methods, the impact of which requires further evaluation. Sample size for patients and observers was often small without justification. Most studies reported ICC and κ values, which did not always coincide with the study conclusion. High ratings were assigned to many studies using the COSMIN risk of bias tool, with certain standards scored 'not applicable' when static images were used.
The sample size for both patients and observers was often small without justification. For most studies, observers interpreted static images and did not evaluate the process of acquiring the imaging test, meaning it was not possible to assess many COSMIN risk of bias standards for studies with this design. Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.