Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.
Department of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.
Med Phys. 2021 May;48(5):2468-2481. doi: 10.1002/mp.14782. Epub 2021 Mar 16.
To develop a two-stage three-dimensional (3D) convolutional neural network (CNN) for fully automated volumetric segmentation of the pancreas on computed tomography (CT), and to evaluate its performance against intra-reader and inter-reader reliability on full-dose and reduced-dose CT scans from a public dataset.
A dataset of 1994 abdominal CT scans (portal venous phase, slice thickness ≤ 3.75 mm, multiple CT vendors) was curated by two radiologists (R1 and R2) to exclude cases with pancreatic pathology, suboptimal image quality, or image artifacts (n = 77). The remaining 1917 CTs were divided equally between R1 and R2 for volumetric pancreas segmentation [ground truth (GT)]. This internal dataset was randomly divided into training (n = 1380), validation (n = 248), and test (n = 289) sets for the development of a two-stage 3D CNN model, based on a modified U-net architecture, for automated volumetric pancreas segmentation. The model's segmentation performance and the differences between model-predicted and GT pancreatic volumes were evaluated on the test set. Subsequently, an external dataset from The Cancer Imaging Archive (TCIA), containing CT scans acquired at standard radiation dose and the same scans reconstructed at a simulated 25% radiation dose, was curated (n = 41). R1 and R2 independently performed volumetric pancreas segmentation on this TCIA dataset, first on the full-dose and then on the reduced-dose CT images. Intra-reader and inter-reader reliability, the model's segmentation performance, and the reliability of model-predicted pancreatic volumes at full vs reduced dose were measured. Finally, the model's performance was tested on the benchmark National Institutes of Health (NIH) Pancreas-CT (PCT) dataset.
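As an illustration of the two quantities compared in this evaluation (segmentation overlap and pancreatic volume), the following is a minimal sketch, not the authors' implementation. It assumes binary 3D masks stored as NumPy arrays and a known voxel spacing in millimetres; the function names `dice_coefficient` and `volume_cc` are hypothetical.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (a convention,
        # not something stated in the abstract).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def volume_cc(mask, spacing_mm):
    """Volume of a binary mask in cubic centimetres.

    spacing_mm is the voxel size along each axis in mm; 1 cc = 1000 mm^3.
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.asarray(mask, dtype=bool).sum()) * voxel_mm3 / 1000.0


if __name__ == "__main__":
    # Toy masks: two overlapping slabs in a 4x4x4 volume.
    pred = np.zeros((4, 4, 4), dtype=bool)
    truth = np.zeros((4, 4, 4), dtype=bool)
    pred[:2] = True
    truth[1:3] = True
    print("DSC:", dice_coefficient(pred, truth))
    print("Volume (cc):", volume_cc(pred, (1.0, 1.0, 2.5)))
```

In practice the voxel spacing would come from the CT header, and the same functions could be applied to reader-vs-reader or model-vs-reader mask pairs.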
The 3D CNN had a mean (SD) Dice similarity coefficient (DSC) of 0.91 (0.03) and an average Hausdorff distance of 0.15 (0.09) mm on the test set. The model's performance was equivalent between males and females (P = 0.08) and across different CT slice thicknesses (P > 0.05) based on noninferiority statistical testing. There was no significant difference between model-predicted and GT pancreatic volumes [mean predicted volume 99 cc (31 cc); GT volume 101 cc (33 cc), P = 0.33]. The mean pancreatic volume difference was -2.7 cc (percent difference: -2.4% of GT volume), with excellent correlation between model-predicted and GT volumes [concordance correlation coefficient (CCC) = 0.97]. In the external TCIA dataset, the model had higher reliability than R1 and R2 on full vs reduced dose CT scans [model mean (SD) DSC: 0.96 (0.02), CCC = 0.995 vs R1 DSC: 0.83 (0.07), CCC = 0.89, and R2 DSC: 0.87 (0.04), CCC = 0.97]. For R1 vs R2 (inter-reader reliability), DSC was 0.85 (0.07) with CCC = 0.90 at full dose and 0.83 (0.07) with CCC = 0.96 at reduced dose. There was good reliability between the model and R1 at both full and reduced dose CT [full dose: DSC: 0.81 (0.07), CCC = 0.83; reduced dose: DSC: 0.81 (0.08), CCC = 0.87]. Likewise, there was good reliability between the model and R2 at both full and reduced dose CT [full dose: DSC: 0.84 (0.05), CCC = 0.89; reduced dose: DSC: 0.83 (0.06), CCC = 0.89]. There was no significant difference between model-predicted and GT pancreatic volumes in the TCIA dataset [mean predicted volume 96 cc (33 cc); GT pancreatic volume 89 cc (30 cc), P = 0.31]. The model had a mean (SD) DSC of 0.89 (0.04) (minimum-maximum DSC: 0.79-0.96) on the NIH-PCT dataset.
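Volume agreement above is reported as a concordance correlation coefficient (CCC). As a hedged illustration only, the sketch below computes Lin's CCC for two paired series of volumes, using population (ddof = 0) moments; the abstract does not state which estimator or software the authors used, and the function name `concordance_ccc` and the toy values are hypothetical.

```python
import numpy as np


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    with population (ddof=0) variance and covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)


if __name__ == "__main__":
    # Toy paired volumes in cc (illustrative values, not study data).
    model_cc = [95.0, 102.0, 88.0, 120.0]
    reader_cc = [97.0, 100.0, 90.0, 118.0]
    print("CCC:", concordance_ccc(model_cc, reader_cc))
```

Unlike the Pearson correlation, CCC penalizes a systematic offset between the two series: a constant shift in one reader's volumes lowers CCC even when the two series remain perfectly linearly related.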
A 3D CNN developed on the largest dataset of CTs is accurate for fully automated volumetric pancreas segmentation and generalizes across a wide range of CT slice thicknesses, radiation doses, and patient gender. This 3D CNN offers a scalable tool for deriving biomarkers from pancreas morphometrics and radiomics for pancreatic diseases, including early detection of pancreatic cancer.