Gallas Brandon D, Brown David G
NIBIB/CDRH Laboratory for the Assessment of Medical Imaging Systems, FDA, Silver Spring, MD 20993-0002, United States.
Neural Netw. 2008 Mar-Apr;21(2-3):387-97. doi: 10.1016/j.neunet.2007.12.013. Epub 2007 Dec 23.
Evaluation of computational intelligence (CI) systems designed to improve the performance of a human operator is complicated by the need to include the effect of human variability. In this paper we consider human (reader) variability in the context of medical imaging computer-assisted diagnosis (CAD) systems, and we outline how to compare the detection performance of readers with and without the CAD. An effective and statistically powerful comparison can be accomplished with a receiver operating characteristic (ROC) experiment, summarized by the reader-averaged area under the ROC curve (AUC). The comparison requires sophisticated yet well-developed methods for multi-reader multi-case (MRMC) variance analysis. MRMC variance analysis accounts for random readers, random cases, and correlations in the experiment. In this paper, we extend the methods available for estimating this variability. Specifically, we present a method that can treat arbitrary study designs. Most methods treat only the fully-crossed study design, where every reader reads every case in two experimental conditions. We demonstrate our method with a computer simulation, and we assess the statistical power of a variety of study designs.
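The summary statistic named in the abstract, the reader-averaged AUC, can be illustrated with a minimal Python sketch. It covers only the point estimate, not the MRMC variance analysis the paper develops. The function names, the boolean `design` matrix marking which reader read which case, and the equal weighting of readers are assumptions made for illustration and are not taken from the paper.

```python
def auc_mann_whitney(neg_scores, pos_scores):
    """Empirical AUC: fraction of (non-diseased, diseased) score pairs
    ranked correctly, with ties counted as one half."""
    if not neg_scores or not pos_scores:
        raise ValueError("need at least one case of each class")
    total = 0.0
    for x in neg_scores:
        for y in pos_scores:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(neg_scores) * len(pos_scores))


def reader_averaged_auc(scores, truth, design):
    """Average the per-reader empirical AUCs.

    scores[r][c] : score reader r assigned to case c (ignored if unread)
    truth[c]     : 1 if case c is diseased, 0 otherwise
    design[r][c] : True if reader r read case c (hypothetical encoding;
                   all True corresponds to a fully crossed design)
    Readers are weighted equally, which is an assumption of this sketch.
    """
    aucs = []
    for r in range(len(scores)):
        read = [c for c in range(len(truth)) if design[r][c]]
        neg = [scores[r][c] for c in read if truth[c] == 0]
        pos = [scores[r][c] for c in read if truth[c] == 1]
        aucs.append(auc_mann_whitney(neg, pos))
    return sum(aucs) / len(aucs)


def cad_effect(scores_without, scores_with, truth, design):
    """Difference in reader-averaged AUC between the two conditions
    (with CAD minus without CAD), using the same design in both."""
    return (reader_averaged_auc(scores_with, truth, design)
            - reader_averaged_auc(scores_without, truth, design))
```

Setting every entry of `design` to `True` reproduces the fully crossed case; setting some entries to `False` gives a design in which readers see different subsets of cases. Judging whether an observed `cad_effect` is statistically meaningful requires the variance estimate that accounts for random readers, random cases, and their correlations, which this sketch does not attempt.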