The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, 9190401, Jerusalem, Israel.
Department of Radiology, Hadassah Hebrew University Medical Center, Jerusalem, Israel.
Eur Radiol. 2019 Mar;29(3):1391-1399. doi: 10.1007/s00330-018-5695-5. Epub 2018 Sep 7.
To quantify the inter-observer variability of manual delineations of lesion and organ contours in CT, in order to establish a reference standard for volumetric measurements used in clinical decision-making and in the evaluation of automatic segmentation algorithms.
Eleven radiologists manually delineated 3193 contours of liver tumours (896), lung tumours (1085), kidneys (434) and brain hematomas (497) on 490 slices of clinical CT scans. The delineations were then compared to quantify inter-observer delineation variability, using standard volume metrics and new group-wise metrics for delineations produced by groups of observers.
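As a rough illustration of a pairwise volume metric, the sketch below compares two observers' delineations, each represented as a set of voxel coordinates. The abstract does not define its metric, so this assumes the common volume overlap error, 100 × (1 − |A∩B| / |A∪B|). The representation and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical representation: a delineation is the set of (row, col) voxels it covers.
Voxel = tuple[int, int]


def volume_overlap_error(a: set[Voxel], b: set[Voxel]) -> float:
    """Volume overlap error in %, assumed here as 100 * (1 - |A & B| / |A | B|)."""
    union = a | b
    if not union:
        return 0.0  # two empty delineations are treated as identical
    return 100.0 * (1.0 - len(a & b) / len(union))


def mean_pairwise_voe(delineations: list[set[Voxel]]) -> float:
    """Mean volume overlap error over all pairs of observers (needs at least two)."""
    pairs = list(combinations(delineations, 2))
    return sum(volume_overlap_error(a, b) for a, b in pairs) / len(pairs)


obs_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
obs_b = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(volume_overlap_error(obs_a, obs_b))  # 2 shared voxels out of 6 -> about 66.7
```

Averaging over all pairs, as `mean_pairwise_voe` does, is one way to summarize many observers with a two-observer metric. Real CT volumes would normally be handled as boolean arrays rather than Python sets.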
The mean volume overlap variability values and ranges (in %) between the delineations of two observers were: liver tumours 17.8 [-5.8,+7.2]%, lung tumours 20.8 [-8.8,+10.2]%, kidney contours 8.8 [-0.8,+1.2]% and brain hematomas 18 [-6.0,+6.0]%. For any two randomly selected observers, the mean delineation volume overlap variability ranged from 5% to 57%. The mean variability captured by groups of two, three and five observers was 37%, 53% and 72%, respectively; eight observers accounted for 75-94% of the total variability. Across all cases, 38.5% of the delineation non-agreement was due to parts of a single observer's delineation disagreeing with the others. No statistically significant difference in delineation variability was found between observers with different levels of expertise.
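The group-wise measures can be sketched under explicit assumptions. This sketch defines a group's disagreement region as the union of its members' delineations minus their intersection. The variability captured by groups of k observers is taken as that region's size relative to the all-observer disagreement region, averaged over every group of size k. A voxel counts as a single-observer disagreement when exactly one observer includes or excludes it. These definitions and all function names are illustrative assumptions, not necessarily the paper's exact formulation.

```python
from itertools import combinations
from statistics import mean

# Hypothetical representation: a delineation is the set of (row, col) voxels it covers.
Voxel = tuple[int, int]


def disagreement_region(group: tuple[set[Voxel], ...]) -> set[Voxel]:
    """Voxels marked by some but not all observers in the group (union minus intersection)."""
    return set().union(*group) - set.intersection(*group)


def captured_variability(delineations: list[set[Voxel]], k: int) -> float:
    """Mean % of the all-observer disagreement region covered by groups of k observers."""
    total = disagreement_region(tuple(delineations))
    if not total:
        return 100.0  # full agreement: nothing left to capture
    return mean(
        100.0 * len(disagreement_region(group)) / len(total)
        for group in combinations(delineations, k)
    )


def single_observer_share(delineations: list[set[Voxel]]) -> float:
    """% of disagreement voxels where exactly one observer differs from all the others."""
    n = len(delineations)
    total = disagreement_region(tuple(delineations))
    if not total:
        return 0.0
    lone = sum(
        1 for v in total if sum(v in d for d in delineations) in (1, n - 1)
    )
    return 100.0 * lone / len(total)


observers = [{(0, 0), (0, 1)}, {(0, 0), (0, 1)}, {(0, 0)}, {(0, 0), (1, 0)}]
print(captured_variability(observers, 2))  # pairs capture about 58.3% on average
print(single_observer_share(observers))    # 50.0: only voxel (1, 0) is a lone disagreement
```

Because a subgroup's union can only shrink and its intersection can only grow, its disagreement region is always contained in the full one. The captured fraction therefore rises toward 100% as k approaches the number of observers. With three or fewer observers, every disagreement voxel trivially counts as a single-observer one, so the last measure is informative only for larger groups.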
The variability in manual delineations for different structures and observers is large and spans a wide range across a variety of structures and pathologies. Two and even three observers may not be sufficient to establish the full range of inter-observer variability.
• This study quantifies the inter-observer variability of manual delineation of lesions and organ contours in CT.
• The variability of manual delineations between two observers can be significant. Two and even three observers capture only a fraction of the full range of inter-observer variability observed in common practice.
• Quantifying inter-observer manual delineation variability is necessary to establish a reference standard for radiologist training and evaluation and for the evaluation of automatic segmentation algorithms.