Department of Radiology, Mayo Clinic, Rochester, Minnesota.
Division of Medical Imaging Technology Services, Mayo Clinic, Rochester, Minnesota.
Gastroenterology. 2023 Dec;165(6):1533-1546.e4. doi: 10.1053/j.gastro.2023.08.034. Epub 2023 Aug 30.
BACKGROUND & AIMS: The aims of our case-control study were (1) to develop an automated 3-dimensional (3D) Convolutional Neural Network (CNN) for detection of pancreatic ductal adenocarcinoma (PDA) on diagnostic computed tomography scans (CTs), (2) to evaluate its generalizability on multi-institutional public data sets, (3) to assess its utility as a potential screening tool using a simulated cohort with high pretest probability, and (4) to assess its ability to detect visually occult preinvasive cancer on prediagnostic CTs.
METHODS: A 3D-CNN classification system was trained using algorithmically generated bounding boxes and pancreatic masks on a curated data set of 696 portal phase diagnostic CTs with PDA and 1080 control images with a nonneoplastic pancreas. The model was evaluated on (1) an intramural hold-out test subset (409 CTs with PDA, 829 controls); (2) a simulated cohort with a case-control distribution that matched the risk of PDA in glycemically defined new-onset diabetes, and Enriching New-Onset Diabetes for Pancreatic Cancer score ≥3; (3) multi-institutional public data sets (194 CTs with PDA, 80 controls); and (4) a cohort of 100 prediagnostic CTs (i.e., CTs incidentally acquired 3-36 months before clinical diagnosis of PDA) without a focal mass, and 134 controls.
RESULTS: Of the CTs in the intramural test subset, 798 (64%) were from other hospitals. The model correctly classified 360 CTs (88%) with PDA and 783 control CTs (94%), with a mean accuracy of 0.92 (95% CI, 0.91-0.94), area under the receiver operating characteristic (AUROC) curve of 0.97 (95% CI, 0.96-0.98), sensitivity of 0.88 (95% CI, 0.85-0.91), and specificity of 0.95 (95% CI, 0.93-0.96). Activation areas on heat maps overlapped with the tumor in 350 of 360 CTs (97%). Performance was high across tumor stages (sensitivity of 0.80, 0.87, 0.95, and 1.0 on T1 through T4 stages, respectively); comparable for hypodense vs isodense tumors (sensitivity: 0.90 vs 0.82) and across age, sex, CT slice thickness, and vendor (all P > .05); and generalizable on both the simulated cohort (accuracy, 0.95 [95% CI, 0.94-0.95]; AUROC curve, 0.97 [95% CI, 0.94-0.99]) and public data sets (accuracy, 0.86 [95% CI, 0.82-0.90]; AUROC curve, 0.90 [95% CI, 0.86-0.95]). Despite being exclusively trained on diagnostic CTs with larger tumors, the model could detect occult PDA on prediagnostic CTs (accuracy, 0.84 [95% CI, 0.79-0.88]; AUROC curve, 0.91 [95% CI, 0.86-0.94]; sensitivity, 0.75 [95% CI, 0.67-0.84]; and specificity, 0.90 [95% CI, 0.85-0.95]) at a median 475 days (range, 93-1082 days) before clinical diagnosis.
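As a rough check on how the headline figures for the intramural test subset relate to the reported counts, the following minimal sketch recomputes sensitivity, specificity, and accuracy from the pooled counts (360 of 409 PDA CTs, 783 of 829 controls). The abstract does not state how its confidence intervals were derived, so the Wilson score interval used here is an assumption, and the function name `wilson_interval` is illustrative only. Because the abstract reports mean values, pooled point estimates can differ slightly from them (for example, pooled specificity is about 0.94 against a reported mean of 0.95).

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%).

    Assumption: the abstract does not say which interval method was used.
    """
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts reported for the intramural hold-out test subset.
tp, n_pda = 360, 409    # PDA CTs correctly classified / all PDA CTs
tn, n_ctrl = 783, 829   # control CTs correctly classified / all controls

sensitivity = tp / n_pda
specificity = tn / n_ctrl
accuracy = (tp + tn) / (n_pda + n_ctrl)
sens_lo, sens_hi = wilson_interval(tp, n_pda)

print(f"sensitivity = {sensitivity:.3f} (Wilson 95% CI {sens_lo:.3f}-{sens_hi:.3f})")
print(f"specificity = {specificity:.3f}")
print(f"accuracy    = {accuracy:.3f}")
```

The same helper can be applied to the specificity and accuracy counts, or to the 350 of 360 heat map overlap figure, by passing the corresponding numerator and denominator.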
CONCLUSIONS: This automated artificial intelligence model trained on a large and diverse data set shows high accuracy and generalizable performance for detection of PDA on diagnostic CTs as well as for visually occult PDA on prediagnostic CTs. Prospective validation with blood-based biomarkers is warranted to assess the potential for early detection of sporadic PDA in high-risk individuals.
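The emphasis on high-risk individuals and high pretest probability can be illustrated with Bayes' rule: at fixed sensitivity and specificity, the positive predictive value (PPV) depends strongly on how common the disease is in the screened group. The sketch below uses the sensitivity (0.88) and specificity (0.95) reported for the intramural test subset. The two prevalence values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Reported for the intramural hold-out test subset.
SENSITIVITY = 0.88
SPECIFICITY = 0.95

# Hypothetical prevalences, for illustration only (not from the study).
enriched_prevalence = 0.03     # an enriched, high-risk group
unselected_prevalence = 0.001  # an unselected population

ppv_enriched = positive_predictive_value(SENSITIVITY, SPECIFICITY, enriched_prevalence)
ppv_unselected = positive_predictive_value(SENSITIVITY, SPECIFICITY, unselected_prevalence)

print(f"PPV at {enriched_prevalence:.1%} prevalence:   {ppv_enriched:.3f}")
print(f"PPV at {unselected_prevalence:.1%} prevalence: {ppv_unselected:.3f}")
```

Under these assumed prevalences, a positive result is far more likely to be a true positive in the enriched group than in the unselected one, which is the usual argument for restricting screening to cohorts with elevated pretest probability.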