Obuchowski Nancy A, Reeves Anthony P, Huang Erich P, Wang Xiao-Feng, Buckler Andrew J, Kim Hyun J Grace, Barnhart Huiman X, Jackson Edward F, Giger Maryellen L, Pennello Gene, Toledano Alicia Y, Kalpathy-Cramer Jayashree, Apanasovich Tatiyana V, Kinahan Paul E, Myers Kyle J, Goldgof Dmitry B, Barboriak Daniel P, Gillies Robert J, Schwartz Lawrence H, Sullivan Daniel C
Cleveland Clinic Foundation, Cleveland, OH, USA.
Cornell University, Ithaca, NY, USA.
Stat Methods Med Res. 2015 Feb;24(1):68-106. doi: 10.1177/0962280214537390. Epub 2014 Jun 11.
Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research.
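As an illustration of the kind of comparison the abstract describes for designs with a known true value (e.g. phantoms), the following minimal sketch computes two common summary metrics for a single algorithm: bias against the true value and a repeatability coefficient from replicate measurements. The choice of these two metrics, the function names, and the data are assumptions made for illustration only; the abstract does not specify which statistics the paper recommends.

```python
import math
from statistics import mean


def bias(replicates, truth):
    """Mean signed error: average of (per-case mean measurement - true value)."""
    return mean(mean(reps) - t for reps, t in zip(replicates, truth))


def within_case_sd(replicates):
    """Pooled within-case standard deviation across cases.

    Each entry of `replicates` holds two or more repeated measurements
    of the same case under identical conditions.
    """
    sum_sq = 0.0
    dof = 0
    for reps in replicates:
        case_mean = mean(reps)
        sum_sq += sum((r - case_mean) ** 2 for r in reps)
        dof += len(reps) - 1
    return math.sqrt(sum_sq / dof)


def repeatability_coefficient(replicates):
    """1.96 * sqrt(2) * within-case SD: an approximate 95% bound on the
    absolute difference between two repeated measurements of one case."""
    return 1.96 * math.sqrt(2) * within_case_sd(replicates)


# Hypothetical phantom data: three cases with known true values,
# each measured twice by the same algorithm.
truth = [10.0, 20.0, 30.0]
algo_a = [[10.5, 10.7], [20.4, 20.8], [30.3, 30.9]]

print("bias:", bias(algo_a, truth))
print("within-case SD:", within_case_sd(algo_a))
print("repeatability coefficient:", repeatability_coefficient(algo_a))
```

Running the same functions on a second algorithm's measurements of the same cases would give a simple side-by-side comparison. A formal comparison would also need confidence intervals or hypothesis tests, which this sketch omits.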