Zhang Xian-Ya, Zhang Di, Wang Zhi-Yuan, Chen Jun, Ren Jia-Yu, Ma Ting, Lin Jian-Jun, Dietrich Christoph F, Cui Xin-Wu
Department of Medical Ultrasound, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
Department of Medical Ultrasound, The First Affiliated Hospital of Anhui Medical University, Hefei, China.
Med Phys. 2025 Jan;52(1):257-273. doi: 10.1002/mp.17498. Epub 2024 Oct 30.
Accurate preoperative prediction of cervical lymph node metastasis (LNM) for papillary thyroid carcinoma (PTC) patients is essential for disease staging and individualized treatment planning, which can improve prognosis and facilitate better management.
To establish a fully automated deep learning-enabled model (FADLM) for automated tumor segmentation and cervical LNM prediction in PTC using ultrasound (US) video keyframes.
This two-center retrospective study enrolled 518 PTC patients. Patients from Hospital 1 were randomly divided into the training (n = 340) and internal test (n = 83) cohorts, and patients from Hospital 2 formed the external test cohort (n = 95). The FADLM integrated a mask region-based convolutional neural network (Mask R-CNN) for automatic segmentation of the primary thyroid tumor and a ResNet34 with a Bayes strategy for cervical LNM diagnosis. For comparison, three further models were developed: a radiomics model (RM) using the same automated segmentation method, a traditional radiomics model (TRM) using manual segmentation, and a clinical-semantic model (CSM). Segmentation performance was evaluated with the Dice similarity coefficient (DSC). The predictive performance of the models was validated in terms of discrimination and clinical utility using the area under the receiver operating characteristic curve (AUC), heatmap analysis, and decision curve analysis (DCA). Predictive performance was compared among models with the DeLong test. The accuracy, sensitivity, and specificity of two radiologists were compared with those of FADLM, and the diagnostic gain with FADLM assistance was analyzed, using McNemar's χ² test. A p-value less than 0.05 was considered statistically significant. The Benjamini-Hochberg procedure was applied to correct for multiple comparisons and control the Type I error.
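The segmentation and discrimination metrics named above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Masks are assumed to be equal-shape nested lists of 0/1 pixels. The AUC is computed as the Mann-Whitney statistic, with ties counted as 0.5. The DeLong comparison assumes both models scored the same patients, with at least two LNM-positive and two LNM-negative cases. All function names (`dice`, `auc_components`, `delong_test`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dice(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    inter = total = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += int(bool(p) and bool(t))
            total += int(bool(p)) + int(bool(t))
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * inter / total


def _psi(x: float, y: float) -> float:
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for a tie.
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def auc_components(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, List[float], List[float]]:
    """Return the AUC plus DeLong's per-case structural components V10 and V01."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]  # one per LNM-positive case
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]  # one per LNM-negative case
    return sum(v10) / len(pos), v10, v01


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    # Sample covariance with the n - 1 denominator.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two correlated AUCs computed on the same cases."""
    auc_a, v10_a, v01_a = auc_components(scores_a, labels)
    auc_b, v10_b, v01_b = auc_components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    # Var(AUC_a - AUC_b) from the covariances of the structural components.
    var_diff = (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m \
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    if var_diff <= 0:
        # Degenerate case: zero estimated variance of the AUC difference.
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return auc_a, auc_b, p
```

Computing the per-case components V10 and V01 once gives both the AUC and its DeLong variance from the same pass. The O(mn) pairwise loop is adequate for cohorts of a few hundred patients, such as those described here.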
The FADLM yielded promising segmentation results in the training (DSC: 0.88 ± 0.23), internal test (DSC: 0.88 ± 0.23), and external test cohorts (DSC: 0.85 ± 0.24). The AUCs of FADLM for cervical LNM prediction in these three cohorts were 0.78 (95% CI: 0.73, 0.83), 0.83 (95% CI: 0.74, 0.92), and 0.83 (95% CI: 0.75, 0.92), respectively. Across all three cohorts, FADLM significantly outperformed the RM (AUCs: 0.78 vs. 0.72; 0.83 vs. 0.65; 0.83 vs. 0.68; all adjusted p-values < 0.05) and the CSM (AUCs: 0.78 vs. 0.71; 0.83 vs. 0.62; 0.83 vs. 0.68; all adjusted p-values < 0.05). The RM performed similarly to the TRM (AUCs: 0.61 vs. 0.63, adjusted p-value = 0.60) while significantly reducing segmentation time (3.3 ± 3.8 vs. 14.1 ± 4.2 s, p-value < 0.001). In the external test cohort, FADLM assistance improved the accuracies of the junior and senior radiologists by 18% and 15%, respectively, and their sensitivities by 25% and 21% (all adjusted p-values < 0.05).
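The radiologist comparisons and the adjusted p-values reported above rest on two standard procedures. They are sketched here in pure Python as an illustration, not as the authors' code. `mcnemar_chi2` assumes each reader's per-patient result is recorded as correct (True) or incorrect (False) for the same patients. Restricting the input to LNM-positive patients would compare sensitivities instead of accuracies. The function uses the Edwards continuity correction by default, although the abstract does not state whether the study applied it. `benjamini_hochberg` returns step-up adjusted p-values. Both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mcnemar_chi2(correct_a: Sequence[bool], correct_b: Sequence[bool],
                 continuity: bool = True) -> Tuple[float, float]:
    """McNemar's chi-squared test for paired correct/incorrect outcomes."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # only A correct
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # only B correct
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: nothing to test
    diff = max(abs(b - c) - 1, 0) if continuity else abs(b - c)
    chi2 = diff ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a study like this one, the raw p-values from all model and reader comparisons would be pooled, passed to `benjamini_hochberg` together, and the adjusted values compared with the 0.05 threshold.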
The FADLM, with its elaborately designed automated strategy based on US video keyframes, holds good potential for efficient and consistent prediction of cervical LNM in PTC. It outperformed the RM, the CSM, and the radiologists, showing promising efficacy.