Non-contrast computed tomography radiomics model to predict benign and malignant thyroid nodules with lobe segmentation: A dual-center study.

Authors

Wang Hao, Wang Xuan, Du Yu-Sheng, Wang You, Bai Zhuo-Jie, Wu Di, Tang Wu-Liang, Zeng Han-Ling, Tao Jing, He Jian

Affiliations

Department of Radiology, The Fourth Affiliated Hospital of Nanjing Medical University, Nanjing 210031, Jiangsu Province, China.

Department of Radiology, Zhongda Hospital Southeast University (Jiangbei), Nanjing 210048, Jiangsu Province, China.

Publication information

World J Radiol. 2025 Jun 28;17(6):106682. doi: 10.4329/wjr.v17.i6.106682.

DOI:10.4329/wjr.v17.i6.106682

PMID:40606044

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12210200/

Abstract

BACKGROUND

Accurate preoperative differentiation of benign and malignant thyroid nodules is critical for optimal patient management. However, conventional imaging modalities present inherent diagnostic limitations.

AIM

To develop a non-contrast computed tomography-based machine learning model integrating radiomics and clinical features for preoperative thyroid nodule classification.

METHODS

This dual-center retrospective study enrolled 272 patients with thyroid nodules (376 thyroid lobes) from center A (May 2021-April 2024), using histopathological findings as the reference standard. The dataset was stratified into a training cohort (264 lobes) and an internal validation cohort (112 lobes). Additional prospective temporal (97 lobes, May-August 2024, center A) and external (81 lobes, center B) test cohorts were incorporated to assess generalizability. Thyroid lobes were segmented along the isthmus midline, with segmentation reliability confirmed by an intraclass correlation coefficient (≥ 0.80). Radiomics feature selection was performed using Pearson correlation analysis followed by least absolute shrinkage and selection operator regression with 10-fold cross-validation. Seven machine learning algorithms were systematically evaluated, with model performance quantified through the area under the receiver operating characteristic curve (AUC), Brier score, and decision curve analysis, and with the DeLong test used for comparison with radiologists' interpretations. Model interpretability was elucidated using SHapley Additive exPlanations (SHAP).
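The Pearson correlation step described above is commonly used as a redundancy filter before sparse regression. The following is a minimal sketch of such a filter in plain Python, not the authors' implementation. The cutoff of 0.9, the function names, and the feature names are assumptions made for illustration; the abstract does not report them. In practice, the retained features would then be passed to a LASSO fit with 10-fold cross-validation (for example, scikit-learn's `LassoCV` with `cv=10`), which is not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedy redundancy filter.

    features: dict mapping feature name -> list of values (one per lobe).
    A feature is kept only if its |r| with every already-kept feature is
    below `threshold`. The 0.9 default is an assumed cutoff, not a value
    taken from the study.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical feature names; these are not study data.
toy = {
    "glcm_contrast": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glcm_contrast_scaled": [2.1, 4.0, 6.2, 8.1, 9.9],  # near-duplicate
    "firstorder_mean": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(toy)
print(selected)
```

Because the filter is greedy, the result depends on feature order: the first feature of a highly correlated pair is the one retained.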

RESULTS

The extreme gradient boosting model demonstrated robust diagnostic performance across all datasets, achieving AUCs of 0.899 [95% confidence interval (CI): 0.845-0.932] in the training cohort, 0.803 (95%CI: 0.715-0.890) in internal validation, 0.855 (95%CI: 0.775-0.935) in temporal testing, and 0.802 (95%CI: 0.664-0.939) in external testing. These results were significantly superior to radiologists' assessments (AUCs: 0.596, 0.529, 0.558, and 0.538, respectively; P < 0.001 by DeLong test). SHAP analysis identified radiomic score, age, tumor size stratification, calcification status, and cystic components as key predictive features. The model exhibited excellent calibration (Brier scores: 0.125-0.144) and provided significant clinical net benefit at decision thresholds exceeding 20%, as evidenced by decision curve analysis.
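For readers less familiar with the three metrics reported above, the sketch below shows how AUC, Brier score, and decision-curve net benefit are conventionally defined. It uses standard textbook formulas and plain Python; the function names and the toy labels and probabilities are illustrative assumptions and do not reproduce the study's data or code.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen malignant case
    receives a higher score than a randomly chosen benign case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome.
    Lower is better; 0 is perfect."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def net_benefit(labels, probs, threshold):
    """Net benefit at one decision threshold, as used in decision curve
    analysis: TP/N - (FP/N) * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy example (1 = malignant, 0 = benign); values are invented.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.2]

print(auc(labels, probs))
print(brier_score(labels, probs))
print(net_benefit(labels, probs, 0.25))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.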

CONCLUSION

The non-contrast computed tomography-based radiomics-clinical fusion model enables robust preoperative thyroid nodule classification, with SHAP-driven interpretability enhancing its clinical applicability for personalized decision-making.
