Establishing a gold standard for test sets: variation in interpretive agreement of expert mammographers.

Affiliation

Department of Community & Family Medicine, Norris Cotton Cancer Center, Lebanon, NH, USA.

Publication information

Acad Radiol. 2013 Jun;20(6):731-9. doi: 10.1016/j.acra.2013.01.012.

DOI:10.1016/j.acra.2013.01.012

PMID:23664400

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3741406/

Abstract

RATIONALE AND OBJECTIVES

Test sets for assessing and improving radiologic image interpretation have been used for decades and typically evaluate performance relative to gold standard interpretations by experts. To assess test sets for screening mammography, a gold standard for whether a woman should be recalled for additional workup is needed, given that interval cancers may be occult on mammography and some findings ultimately determined to be benign require additional imaging to determine if biopsy is warranted. Using experts to set a gold standard assumes little variation occurs in their interpretations, but this has not been explicitly studied in mammography.

MATERIALS AND METHODS

Using digitized films from 314 screening mammography exams (n = 143 cancer cases) performed in the Breast Cancer Surveillance Consortium, we evaluated interpretive agreement among three expert radiologists who independently assessed whether each examination should be recalled, and the lesion location, finding type (mass, calcification, asymmetric density, or architectural distortion), and interpretive difficulty in the recalled images.

RESULTS

Agreement among the three expert pairs for recall/no recall was higher for cancer cases (mean 74.3% ± 6.5%) than for noncancer cases (mean 62.6% ± 7.1%). Complete agreement on recall, lesion location, finding type, and difficulty ranged from 36.4% to 42.0% for cancer cases and from 43.9% to 65.6% for noncancer cases. Two of three experts agreed on recall and lesion location for 95.1% of cancer cases and 91.8% of noncancer cases, but all three experts agreed on only 55.2% of cancer cases and 42.1% of noncancer cases.
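The agreement measures reported above (pairwise agreement across the three expert pairs, and agreement by all three experts) can be illustrated with a minimal sketch. This is not the authors' analysis code: the reader names, the recall decisions, and the helper functions pairwise_agreement, unanimous_rate, and majority_labels are all hypothetical and were made up for illustration. The sketch also considers the recall decision only. With a binary decision and three readers, at least two always agree on recall alone, so the study's two-of-three figures fall below 100% only because they also require agreement on lesion location.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical recall decisions (1 = recall, 0 = no recall) for 8 exams.
# These values are invented for illustration; they are NOT study data.
ratings = {
    "expert_a": [1, 1, 0, 1, 0, 1, 0, 0],
    "expert_b": [1, 0, 0, 1, 1, 1, 0, 1],
    "expert_c": [1, 1, 0, 0, 0, 1, 1, 0],
}


def pairwise_agreement(ratings):
    """Percent of exams on which each pair of readers gave the same decision."""
    result = {}
    for (name_x, x), (name_y, y) in combinations(ratings.items(), 2):
        matches = sum(a == b for a, b in zip(x, y))
        result[(name_x, name_y)] = 100.0 * matches / len(x)
    return result


def unanimous_rate(ratings):
    """Percent of exams on which all readers gave the same decision."""
    per_exam = list(zip(*ratings.values()))
    unanimous = sum(len(set(votes)) == 1 for votes in per_exam)
    return 100.0 * unanimous / len(per_exam)


def majority_labels(ratings):
    """Majority-vote recall label per exam (assumes an odd number of readers)."""
    return [int(2 * sum(votes) > len(votes)) for votes in zip(*ratings.values())]


pairs = pairwise_agreement(ratings)
pair_values = list(pairs.values())
print(pairs)
print(f"mean pairwise agreement: {mean(pair_values):.1f} +/- {stdev(pair_values):.1f}")
print(f"unanimous agreement: {unanimous_rate(ratings):.1f}%")
print("majority-vote labels:", majority_labels(ratings))
```

In this sketch the mean and standard deviation are taken across the three reader pairs, which is one plausible reading of the "mean ± SD" figures in the abstract. The majority-vote label is shown only as one simple way to combine three independent readings. The abstract itself recommends combining independent experts with a consensus step.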

CONCLUSION

Variability in expert interpretation is notable. A minimum of three independent experts combined with a consensus should be used for establishing any gold standard interpretation for test sets, especially for noncancer cases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc20/3741406/6432bb81ed2e/nihms467611f1.jpg
