

Establishing a gold standard for test sets: variation in interpretive agreement of expert mammographers.

Affiliation

Department of Community & Family Medicine, Norris Cotton Cancer Center, Lebanon, NH, USA.

Publication information

Acad Radiol. 2013 Jun;20(6):731-9. doi: 10.1016/j.acra.2013.01.012.

Abstract

RATIONALE AND OBJECTIVES

Test sets for assessing and improving radiologic image interpretation have been used for decades and typically evaluate performance relative to gold standard interpretations by experts. To assess test sets for screening mammography, a gold standard for whether a woman should be recalled for additional workup is needed, given that interval cancers may be occult on mammography and some findings ultimately determined to be benign require additional imaging to determine if biopsy is warranted. Using experts to set a gold standard assumes little variation occurs in their interpretations, but this has not been explicitly studied in mammography.

MATERIALS AND METHODS

Using digitized films from 314 screening mammography exams (n = 143 cancer cases) performed in the Breast Cancer Surveillance Consortium, we evaluated interpretive agreement among three expert radiologists who independently assessed whether each examination should be recalled and, for recalled examinations, the lesion location, finding type (mass, calcification, asymmetric density, or architectural distortion), and interpretive difficulty.

RESULTS

Pairwise agreement on recall/no recall, averaged over the three expert pairs, was higher for cancer cases (mean 74.3 ± 6.5) than for noncancer cases (mean 62.6 ± 7.1). Complete agreement on recall, lesion location, finding type, and difficulty ranged from 36.4% to 42.0% for cancer cases and from 43.9% to 65.6% for noncancer cases. At least two of the three experts agreed on recall and lesion location for 95.1% of cancer cases and 91.8% of noncancer cases, but all three agreed for only 55.2% of cancer cases and 42.1% of noncancer cases.
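The three kinds of agreement reported above (pairwise recall/no-recall agreement, at-least-two-of-three agreement, and all-three agreement) can be illustrated with the minimal Python sketch below. It is not the authors' analysis code. The case data are hypothetical toy values, each reader's interpretation is assumed to be encoded as None (no recall) or a made-up lesion-location string (recall), and the mean ± SD is assumed to summarize the three pairwise percentages.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Optional

# Assumed encoding: None = no recall; a string = recall at that lesion location.
Label = Optional[str]
Case = tuple[Label, Label, Label]  # one label per expert reader

def pairwise_recall_agreement(cases: list[Case]) -> list[float]:
    """Percent of cases on which each reader pair made the same recall/no-recall call."""
    percents = []
    for i, j in combinations(range(3), 2):  # pairs (0, 1), (0, 2), (1, 2)
        same = sum((c[i] is not None) == (c[j] is not None) for c in cases)
        percents.append(100.0 * same / len(cases))
    return percents

def all_three_agree(cases: list[Case]) -> float:
    """Percent of cases where all three readers gave the same recall/location label."""
    return 100.0 * sum(c[0] == c[1] == c[2] for c in cases) / len(cases)

def at_least_two_agree(cases: list[Case]) -> float:
    """Percent of cases where some pair of readers gave the same recall/location label."""
    return 100.0 * sum(len(set(c)) < 3 for c in cases) / len(cases)

# Hypothetical toy data (not from the study); location codes are made up.
cases: list[Case] = [
    (None, None, None),
    ("L-UOQ", "L-UOQ", None),
    ("R-LIQ", "L-UOQ", None),
    ("L-UOQ", "L-UOQ", "L-UOQ"),
]

pairwise = pairwise_recall_agreement(cases)
print(pairwise)                                           # [100.0, 50.0, 50.0]
print(f"{mean(pairwise):.1f} ± {stdev(pairwise):.1f}")    # 66.7 ± 28.9
print(all_three_agree(cases), at_least_two_agree(cases))  # 50.0 75.0
```

With only a binary recall/no-recall label, at least two of three readers would always agree, so the at-least-two measure is informative only when recall is combined with a second attribute such as lesion location.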

CONCLUSION

Variability in expert interpretation is notable. A minimum of three independent experts, combined with a consensus process, should be used to establish any gold standard interpretation for test sets, especially for noncancer cases.
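One possible way to operationalize this recommendation is sketched in Python below. This is an assumed workflow, not the authors' protocol: a case enters the gold standard directly only when all three experts give the same label, and every other case is sent to a consensus review, with the majority label (when two experts agree) passed along as a starting proposal. The label encoding (None for no recall, a hypothetical lesion-location string for recall) is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumed encoding: None = no recall; a string = recall at a (hypothetical) lesion location.
Label = Optional[str]

@dataclass
class Adjudication:
    proposed: Label        # majority label; only meaningful when has_majority is True
    has_majority: bool     # True when at least two of the three readers agree
    needs_consensus: bool  # True unless all three readers gave the same label

def adjudicate(reads: tuple[Label, Label, Label]) -> Adjudication:
    """Assumed rule: unanimous cases are final; all others go to consensus review."""
    label, count = Counter(reads).most_common(1)[0]
    return Adjudication(
        proposed=label if count >= 2 else None,
        has_majority=count >= 2,
        needs_consensus=count < 3,
    )

print(adjudicate((None, None, None)))           # unanimous "no recall": final
print(adjudicate(("L-UOQ", "L-UOQ", None)))     # 2-of-3 majority: still reviewed
print(adjudicate(("R-LIQ", "L-UOQ", None)))     # no majority: reviewed from scratch
```

Triggering review on anything short of unanimity is a conservative choice; a looser variant could accept any 2-of-3 majority as final, at the cost of resting the gold standard on cases where one expert dissented.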



