Interobserver variability studies in diagnostic imaging: a methodological systematic review.

Author information

Test Evaluation Research Group, Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom.

NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, United Kingdom.

Publication information

Br J Radiol. 2023 Aug;96(1148):20220972. doi: 10.1259/bjr.20220972. Epub 2023 Jun 29.

DOI:10.1259/bjr.20220972

PMID:37399082

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10392644/

Abstract

OBJECTIVES

To review the methodology of interobserver variability studies, including current practice and the quality of study conduct and reporting.

METHODS

Interobserver variability studies published between January 2019 and January 2020 were included; extracted data comprised study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error.

RESULTS

Seventy-nine full-text studies were included, covering various imaging tests and clinical areas. The median number of patients was 47 (IQR: 23-88) and the median number of observers was 4 (IQR: 2-7), with sample size justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), and in most studies all observers interpreted images for all patients (n = 67, 85%). Intraclass correlation coefficients (ICC) (n = 41, 52%), kappa (κ) statistics (n = 31, 39%) and percentage agreement (n = 15, 19%) were the most commonly used measures. Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good/adequate rating for 52 studies (66%), including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating.
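To make two of the measures named above concrete, the following is a minimal sketch of percentage agreement and Cohen's κ for two observers rating the same patients on a categorical scale. It is an illustration only and not the review's own analysis: the function names and the example ratings are hypothetical, and the two-observer, unweighted form of κ is an assumption, since the abstract does not say which κ variants the included studies used.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Proportion of cases on which two observers give the same rating."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("both observers must rate the same non-empty set of cases")
    return sum(a == b for a, b in zip(obs_a, obs_b)) / len(obs_a)


def cohen_kappa(obs_a, obs_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(obs_a)
    p_observed = percent_agreement(obs_a, obs_b)
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    # Chance agreement: product of each observer's marginal proportions, summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in set(obs_a) | set(obs_b)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both observers use a single identical category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings (1 = finding present, 0 = absent) for 10 patients.
observer_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
observer_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

print(f"percentage agreement: {percent_agreement(observer_a, observer_b):.2f}")
print(f"Cohen's kappa:        {cohen_kappa(observer_a, observer_b):.2f}")
```

In this made-up example the observers agree on 8 of 10 patients (0.80), yet κ is only about 0.60 once chance agreement of 0.50 is removed. The two measures can support quite different verbal conclusions from the same data.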

CONCLUSIONS

Interobserver variability studies have diverse study designs and methods, the impact of which requires further evaluation. Sample size for patients and observers was often small without justification. Most studies reported ICC and κ values, which did not always coincide with the study conclusion. High ratings were assigned to many studies using the COSMIN risk of bias tool, with certain standards scored 'not applicable' when static images were used.

ADVANCES IN KNOWLEDGE

The sample size for both patients and observers was often small without justification. For most studies, observers interpreted static images and did not evaluate the process of acquiring the imaging test, meaning it was not possible to assess many COSMIN risk of bias standards for studies with this design. Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.
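For continuous measurements, the intraclass correlation coefficient mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the method of any included study: it computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), for a fully crossed design in which every observer rates every patient. The abstract does not state which ICC forms the studies used, and the function name and ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a patients-by-observers table of ratings.

    `ratings` is a list of rows, one per patient, each holding one
    rating per observer (fully crossed design, no missing values).
    """
    n = len(ratings)                      # number of patients
    k = len(ratings[0]) if ratings else 0  # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 observers, with no missing ratings")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    patient_means = [sum(row) / k for row in ratings]
    observer_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_patients = k * sum((m - grand_mean) ** 2 for m in patient_means)
    ss_observers = n * sum((m - grand_mean) ** 2 for m in observer_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_patients - ss_observers

    ms_patients = ss_patients / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_patients - ms_error) / (
        ms_patients + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )


# Hypothetical measurements: 6 patients (rows) rated by 4 observers (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

print(f"ICC(2,1): {icc_2_1(example_ratings):.2f}")
```

With these illustrative numbers the ICC is roughly 0.29. The observers rank patients in a similar order but differ systematically in level, which the absolute-agreement form penalises. A different ICC form could give a noticeably higher value on the same data, so a reported ICC is hard to interpret unless the form is stated.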

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9081/10392644/05e90290f176/bjr.20220972.g001.jpg
