Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey.
Eur J Radiol. 2023 Aug;165:110893. doi: 10.1016/j.ejrad.2023.110893. Epub 2023 May 26.
To evaluate the reliability of consensus-based segmentation in terms of reproducibility of radiomic features.
In this retrospective study, three tumor data sets were investigated: breast cancer (n = 30), renal cell carcinoma (n = 30), and pituitary macroadenoma (n = 30). MRI was used for the breast and pituitary data sets, and CT for the renal data set. Twelve readers participated in the segmentation process. Each consensus segmentation was created by having a reader correct a previously drawn region or volume of interest. Four experiments were designed to evaluate the reproducibility of radiomic features. Reliability was assessed with the intraclass correlation coefficient (ICC) using two cut-off values: 0.75 and 0.9.
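The abstract does not state which ICC form was used. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)), the point estimate for one radiomic feature could be computed as below. The function name `icc_2_1` and the toy values are illustrative and not taken from the study, and the confidence interval used in the study's analysis is not computed here.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) point estimate (assumed form, not confirmed by the abstract).

    `ratings` holds one row per lesion and one column per reader (or per
    consensus segmentation); each cell is the value of a single radiomic
    feature extracted from that segmentation.
    """
    n = len(ratings)      # number of lesions (subjects)
    k = len(ratings[0])   # number of readers (raters)

    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical feature values: 4 lesions, 3 segmentations each.
toy = [
    [10.1, 10.4, 9.8],
    [15.2, 14.9, 15.5],
    [7.3, 7.9, 7.1],
    [12.0, 12.6, 11.7],
]
print(round(icc_2_1(toy), 3))
```

In practice a statistics package would normally be used instead, since it also reports the 95% confidence interval that the reproducibility decision relies on.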
Based on the lower bound of the 95% confidence interval and an ICC threshold of 0.90, at least 61% of the radiomic features were not reproducible in the inter-consensus analysis. In the susceptibility experiment, at least half (54%) became non-reproducible when the first reader was replaced with a different reader. In the intra-consensus analysis, at least about one-third (32%) were non-reproducible when the same second reader corrected the same first reader's segmentation two weeks later. Compared with the inter-reader analysis based on independent single readers, the inter-consensus analysis yielded no statistically significant improvement in the rate of reproducible features across all data sets and analyses.
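The decision rule behind these percentages can be sketched as follows: a feature counts as non-reproducible when the lower bound of its ICC 95% confidence interval falls below the chosen cut-off. The function name, the strict "less than" comparison, and the example bounds are assumptions made for illustration and are not taken from the study.

```python
def non_reproducible_share(ci_lower_bounds, threshold=0.90):
    """Fraction of features whose ICC 95% CI lower bound is below `threshold`.

    `ci_lower_bounds` holds one lower bound per radiomic feature. Treating a
    bound exactly equal to the threshold as reproducible is an assumption.
    """
    if not ci_lower_bounds:
        raise ValueError("no features supplied")
    failing = sum(1 for lb in ci_lower_bounds if lb < threshold)
    return failing / len(ci_lower_bounds)


# Hypothetical lower bounds for four features.
bounds = [0.95, 0.80, 0.91, 0.60]
share_strict = non_reproducible_share(bounds, threshold=0.90)   # cut-off 0.90
share_lenient = non_reproducible_share(bounds, threshold=0.75)  # cut-off 0.75
```

Using the lower bound rather than the point estimate is the more conservative choice, so the same feature set yields a higher non-reproducible share than a point-estimate rule would.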
Despite the positive connotation of the word "consensus", it is essential to remember that consensus-based segmentation has significant reproducibility issues. Therefore, consensus-based segmentation should not be used on its own unless a reliability analysis is performed, even if such an analysis is not practical in clinical settings.