Department of Radiology, Hôpitaux Universitaires de Strasbourg, Hôpital de Hautepierre, 67200, Strasbourg, France; Breast and Thyroid Imaging Unit, Institut de Cancérologie Strasbourg Europe, 67200, Strasbourg, France; IGBMC, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67400, Illkirch, France.
Inria, Epione Team, Sophia Antipolis, Université Côte d'Azur, 06902, Nice, France.
Diagn Interv Imaging. 2024 Feb;105(2):65-73. doi: 10.1016/j.diii.2023.08.001. Epub 2023 Aug 21.
The purpose of this study was to investigate inter-reader variability in manual prostate contour segmentation on magnetic resonance imaging (MRI) examinations and to determine the optimal number of readers required to establish a reliable reference standard.
Seven radiologists with varying levels of experience independently performed manual segmentation of the prostate contour (whole-gland [WG] and transition zone [TZ]) on 40 prostate MRI examinations obtained in 40 patients. Inter-reader variability in prostate contour delineations was estimated using standard metrics (Dice similarity coefficient [DSC], Hausdorff distance and volume-based metrics). The impact of the number of readers (from two to seven) on segmentation variability was assessed using pairwise metrics (consistency) and metrics with respect to a reference segmentation (conformity), obtained with either majority voting or the simultaneous truth and performance level estimation (STAPLE) algorithm.
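As an illustration of two of the building blocks named above, the following is a minimal sketch of the Dice similarity coefficient and a voxel-wise majority vote on binary masks. It is not the study's code: the function names `dice` and `majority_vote`, the strict-majority rule, and the toy 2 x 4 masks are assumptions made for the example.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def majority_vote(masks):
    """Consensus mask: a voxel is kept if more than half of the readers marked it."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) * 2 > stack.shape[0]


# Hypothetical toy masks from three readers (not data from the study).
reader_1 = [[0, 1, 1, 0], [0, 1, 1, 0]]
reader_2 = [[0, 1, 1, 1], [0, 1, 1, 0]]
reader_3 = [[0, 0, 1, 0], [0, 1, 1, 0]]

consensus = majority_vote([reader_1, reader_2, reader_3])

# Consistency: pairwise agreement between two readers.
pairwise_dsc = dice(reader_1, reader_2)
# Conformity: agreement of one reader with the consensus reference.
conformity_dsc = dice(reader_2, consensus)
```

STAPLE differs from majority voting in that it weights each reader by an estimated performance level; it is not reproduced in this sketch.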
The average segmentation DSC for two readers in pairwise comparison was 0.919 for WG and 0.876 for TZ. Variability decreased with the number of readers: the interquartile ranges of the DSC were 0.076 (WG) / 0.021 (TZ) for configurations with two readers, 0.005 (WG) / 0.012 (TZ) for configurations with three readers, and 0.002 (WG) / 0.0037 (TZ) for configurations with six readers. The interquartile range decreased slightly faster between two and three readers than between three and six readers. When using consensus methods, variability often reached its minimum with three readers (with STAPLE, DSC = 0.96 [range: 0.945-0.971] for WG and DSC = 0.94 [range: 0.912-0.957] for TZ, and the interquartile range was minimal for configurations with three readers).
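One plausible reading of "interquartile range for configurations with k readers" is sketched below: enumerate every subset of k readers, average the pairwise DSC within each subset, and take the interquartile range of those averages across subsets. This is an assumption about the procedure, not a description of the study's implementation; the function name `consistency_iqr` and the 4 x 4 matrix of pairwise DSC values are hypothetical and are not results from the study.

```python
from itertools import combinations

import numpy as np


def consistency_iqr(pairwise, k):
    """IQR, across all k-reader configurations, of the mean pairwise DSC.

    `pairwise` is a symmetric matrix where pairwise[i, j] is the DSC
    between reader i and reader j.
    """
    n = pairwise.shape[0]
    scores = []
    for readers in combinations(range(n), k):
        # Mean DSC over every pair of readers inside this configuration.
        pair_values = [pairwise[i, j] for i, j in combinations(readers, 2)]
        scores.append(float(np.mean(pair_values)))
    q25, q75 = np.percentile(scores, [25, 75])
    return q75 - q25, scores


# Hypothetical pairwise DSC values for four readers (illustrative only).
pairwise_dsc = np.array([
    [1.00, 0.90, 0.92, 0.88],
    [0.90, 1.00, 0.91, 0.89],
    [0.92, 0.91, 1.00, 0.93],
    [0.88, 0.89, 0.93, 1.00],
])

iqr_2, scores_2 = consistency_iqr(pairwise_dsc, 2)
iqr_3, scores_3 = consistency_iqr(pairwise_dsc, 3)
```

With seven readers the same enumeration would yield 21 configurations of two readers and 35 configurations of three readers, so the number of configurations varies with k and should be kept in mind when comparing interquartile ranges.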
The number of readers affects inter-reader variability, in terms of both inter-reader consistency and conformity to a reference. Depending on the metric, variability is either minimal for three readers or three readers represent a tipping point in the evolution of variability; this holds for both pairwise metrics and metrics computed with respect to a reference. Accordingly, three readers may represent an optimal number to determine references for artificial intelligence applications.