Beichel Reinhard R, Smith Brian J, Bauer Christian, Ulrich Ethan J, Ahmadvand Payam, Budzevich Mikalai M, Gillies Robert J, Goldgof Dmitry, Grkovski Milan, Hamarneh Ghassan, Huang Qiao, Kinahan Paul E, Laymon Charles M, Mountz James M, Muzi John P, Muzi Mark, Nehmeh Sadek, Oborski Matthew J, Tan Yongqiang, Zhao Binsheng, Sunderland John J, Buatti John M
Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA.
Department of Internal Medicine, The University of Iowa, Iowa City, IA, USA.
Med Phys. 2017 Feb;44(2):479-496. doi: 10.1002/mp.12041.
Radiomics utilizes a large number of image-derived features for quantifying tumor characteristics that can in turn be correlated with response and prognosis. Unfortunately, extraction and analysis of such image-based features are subject to measurement variability and bias. The challenge for radiomics is particularly acute in Positron Emission Tomography (PET), where limited resolution, a high noise component related to the limited stochastic nature of the raw data, and the wide variety of reconstruction options confound quantitative feature metrics. Extracted feature quality is also affected by the tumor segmentation methods used to define the regions over which features are calculated, making it challenging to produce consistent radiomics analysis results across multiple institutions that use different segmentation algorithms in their PET image analysis. Understanding each element contributing to these inconsistencies in quantitative image feature and metric generation is paramount for the ultimate utilization of these methods in multi-institutional trials and clinical oncology decision making.
To assess segmentation quality and consistency at the multi-institutional level, we conducted a study of seven institutional members of the National Cancer Institute Quantitative Imaging Network. For the study, members were asked to segment a common set of phantom PET scans acquired over a range of imaging conditions as well as a second set of head and neck cancer (HNC) PET scans. Segmentations were generated at each institution using its preferred approach. In addition, participants were asked to repeat each segmentation after a time interval. In total, this procedure yielded 806 phantom insert and 641 lesion segmentations. Subsequently, the volume was computed from each segmentation and compared to the corresponding reference volume by means of statistical analysis.
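The volume comparison described above can be illustrated with a minimal sketch. It assumes each segmentation is available as a binary voxel mask with known voxel spacing, and that the relative volume error is the signed difference from the reference volume expressed as a percentage of that reference. The function names, the mask representation, and the example numbers are hypothetical and are not taken from the study.

```python
import numpy as np


def segmentation_volume_ml(mask, voxel_spacing_mm):
    """Volume of a binary segmentation mask in millilitres.

    mask: array whose nonzero voxels belong to the segmented object.
    voxel_spacing_mm: voxel edge lengths in millimetres, one per axis.
    """
    voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))
    # 1 mL = 1000 mm^3
    return np.count_nonzero(mask) * voxel_volume_mm3 / 1000.0


def relative_volume_error_pct(measured_ml, reference_ml):
    """Signed relative volume error as a percentage of the reference volume.

    Assumed definition: 100 * (measured - reference) / reference.
    """
    return 100.0 * (measured_ml - reference_ml) / reference_ml


# Hypothetical example: a 4 x 4 x 4 voxel cube on a 2 mm isotropic grid.
mask = np.zeros((10, 10, 10), dtype=bool)
mask[3:7, 3:7, 3:7] = True

measured_ml = segmentation_volume_ml(mask, (2.0, 2.0, 2.0))
error_pct = relative_volume_error_pct(measured_ml, reference_ml=0.4)
print(f"volume = {measured_ml:.3f} mL, relative error = {error_pct:.1f}%")
```

Under this definition, a positive value indicates that the segmentation overestimates the reference volume and a negative value indicates that it underestimates it.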
On the two test sets (phantom and HNC PET scans), the performance of the seven segmentation approaches was as follows. On the phantom test set, the mean relative volume errors ranged from 29.9% to 87.8% of the ground truth reference volumes, and the repeat difference for each institution ranged from -36.4% to 39.9%. On the HNC test set, the mean relative volume error ranged from -50.5% to 701.5%, and the repeat difference for each institution ranged from -37.7% to 31.5%. In addition, performance measures per phantom insert/lesion size category are given in the paper. On phantom data, regression analysis resulted in coefficient of variation (CV) components of 42.5% for scanners, 26.8% for institutional approaches, 21.1% for repeated segmentations, 14.3% for relative contrasts, 5.3% for count statistics (acquisition times), and 0.0% for repeated scans. Analysis showed that the CV components for approaches and repeated segmentations were significantly larger on the HNC test set, increasing by 112.7% and 102.4%, respectively.
Analysis results underline the importance of PET scanner reconstruction harmonization and imaging protocol standardization for quantification of lesion volumes. In addition, to enable a distributed multi-site analysis of FDG PET images, harmonization of analysis approaches and operator training, in combination with highly automated segmentation methods, appear advisable. Future work will focus on quantifying the impact of segmentation variation on radiomics system performance.