Bryan Frederick W, Xu Zhoubing, Asman Andrew J, Allen Wade M, Reich Daniel S, Landman Bennett A
Electrical Engineering, Vanderbilt University, Nashville, Tennessee 37235.
Translational Neuroradiology Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892.
Med Phys. 2014 Mar;41(3):031903. doi: 10.1118/1.4864236.
Purpose: Expert manual labeling is the gold standard for image segmentation, but the process is difficult, time-consuming, and prone to inter-rater variability. While fully automated methods have successfully targeted many anatomies, automated methods have not yet been developed for numerous essential structures (e.g., the internal structure of the spinal cord as seen on magnetic resonance imaging). Collaborative labeling is a new paradigm that offers a robust alternative, one that may realize both the throughput of automation and the guidance of experts. Yet distributing manual labeling expertise across individuals and sites introduces potential human factors concerns (e.g., training, software usability) and statistical considerations (e.g., fusion of information, assessment of confidence, bias) that must be further explored. During the labeling process, it is simple to ask raters to self-assess the confidence of their labels, but this is rarely done and had not previously been studied quantitatively. Herein, the authors explore the utility of self-assessment in relation to automated assessment of rater performance in the context of statistical fusion.
Methods: The authors conducted a study of 66 volumes manually labeled by 75 minimally trained human raters recruited from the university undergraduate population. Raters received 15 min of training, during which they were shown examples of correct segmentations and given a demonstration of the online segmentation tool. The volumes were labeled slice-by-slice in 2D, with slices presented unordered. Raters produced a self-assessed quality metric for each slice by marking a confidence bar superimposed on the slice. Volumes produced by both voting and statistical fusion algorithms were compared against a set of expert segmentations of the same volumes.
Results: Labels for 8825 distinct slices were obtained. Simple majority voting resulted in statistically poorer performance than voting weighted by self-assessed confidence. Statistical fusion resulted in performance statistically indistinguishable from that of self-assessment-weighted voting. The authors developed a new theoretical basis for using self-assessed performance within the statistical fusion framework and demonstrated that combining the two sources of information (statistical assessment and self-assessment) yielded a statistically significant improvement over either method considered separately.
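To make the comparison above concrete, the following sketch contrasts simple majority voting with voting weighted by each rater's self-assessed confidence for a single slice. It is an illustration of the general technique, not the authors' implementation; the array shapes, the [0, 1] confidence scale, and all function names are assumptions.

```python
# Illustrative sketch (not the published implementation): fusing binary
# slice labels from multiple raters by simple majority voting versus
# voting weighted by each rater's self-assessed confidence.
import numpy as np

def majority_vote(labels):
    """labels: (n_raters, H, W) binary masks for one slice."""
    return (labels.mean(axis=0) > 0.5).astype(np.uint8)

def confidence_weighted_vote(labels, confidences):
    """confidences: (n_raters,) self-assessed quality in [0, 1],
    e.g., read off the confidence bar each rater marked on the slice."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one
    weighted = np.tensordot(w, labels.astype(float), axes=1)  # (H, W)
    return (weighted > 0.5).astype(np.uint8)

# Example: three raters labeling one 4x4 slice
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(3, 4, 4))
fused_mv = majority_vote(labels)
fused_wv = confidence_weighted_vote(labels, [0.9, 0.5, 0.4])
```

In this toy form, a rater who marked high confidence simply contributes more to each voxel's vote; with equal confidences, the weighted vote reduces to the majority vote.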
Conclusions: The authors present the first systematic characterization of self-assessed performance in manual labeling. They demonstrate that self-assessment and statistical fusion yield similar, but complementary, benefits for label fusion. Finally, they present a new theoretical basis for combining self-assessments with statistical label fusion.
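As a rough illustration of how self-assessments can enter a statistical fusion framework, the sketch below seeds a STAPLE-style expectation-maximization loop with self-assessed confidences as the initial rater performance estimates. This is a generic binary-label EM fusion written under stated assumptions, not the authors' published estimator; the initialization scheme, parameter names, and iteration count are all assumptions.

```python
# Hedged sketch: STAPLE-style EM fusion of binary labels, initialized
# with each rater's self-assessed confidence (an assumption, used here
# as the starting estimate of that rater's sensitivity and specificity).
import numpy as np

def staple_binary(labels, self_conf, n_iter=20, prior=0.5):
    """labels: (n_raters, n_voxels) binary decisions.
    self_conf: (n_raters,) self-assessed confidence in [0, 1]."""
    d = labels.astype(float)
    p = np.clip(np.asarray(self_conf, float), 0.01, 0.99).copy()  # sensitivity
    q = p.copy()                                                  # specificity
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel's true label is 1
        a = prior * np.prod(np.where(d == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(d == 0, q[:, None], 1 - q[:, None]), axis=0)
        w = a / (a + b + 1e-12)
        # M-step: re-estimate each rater's performance from the posterior
        p = (d * w).sum(axis=1) / (w.sum() + 1e-12)
        q = ((1 - d) * (1 - w)).sum(axis=1) / ((1 - w).sum() + 1e-12)
    return (w > 0.5).astype(np.uint8), p, q

# Example: three raters, 100 voxels, confidences seeding the EM loop
rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=(3, 100))
fused, sens, spec = staple_binary(d, [0.9, 0.5, 0.4])
```

The design choice illustrated here is only that self-assessment supplies the starting point for the performance parameters; the EM iterations then refine them statistically, so the two sources of information are combined rather than used separately.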