Predicting central lymph node metastasis in papillary thyroid microcarcinoma: a breakthrough with interpretable machine learning.

Authors

Zhou Weijun, Li Lijuan, Hao Xiaowen, Wu Lanying, Liu Lifu, Zheng Binyu, Xia Yangzheng, Liu Yong

Affiliation

Department of Ultrasound, Beijing Shijitan Hospital, Capital Medical University, Beijing, China.

Publication

Front Endocrinol (Lausanne). 2025 May 12;16:1537386. doi: 10.3389/fendo.2025.1537386. eCollection 2025.

PMID:40421246

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12104047/

Abstract

OBJECTIVE

To develop and validate an interpretable machine learning (ML) model for the preoperative prediction of central lymph node metastasis (CLNM) in papillary thyroid microcarcinoma (PTMC).

METHODS

We retrospectively analyzed 710 PTMC patients who underwent thyroidectomy between December 2016 and December 2023. Feature selection combined least absolute shrinkage and selection operator (LASSO) regression, the support vector machine-recursive feature elimination (SVM-RFE) algorithm, and multivariate logistic regression. Eight ML algorithms were developed to predict CLNM: decision tree, random forest (RF), K-nearest neighbors, support vector machine, Extreme Gradient Boosting, naive Bayes, logistic regression, and Light Gradient Boosting Machine. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), decision curve analysis (DCA), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score. SHapley Additive exPlanations (SHAP) were then used to interpret the predictions of the optimal ML model.
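Two of the evaluation tools named above, AUC and the net benefit behind a decision curve, can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code: AUC is computed with the rank (Mann-Whitney) estimator, and net benefit uses the standard decision-curve formula NB = TP/N - FP/N * pt/(1 - pt) at threshold probability pt. The function names and the eight-patient toy data are hypothetical.

```python
"""Hypothetical, dependency-free sketches of two evaluation metrics:
rank-based AUC and decision-curve net benefit. Toy data only."""
from typing import Sequence


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count 1/2): Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], pt: float) -> float:
    """Net benefit of the model at threshold pt, treating score >= pt as
    'predict CLNM': NB = TP/N - FP/N * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference 'treat everyone' strategy on a decision curve."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = CLNM present, 0 = absent.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
    print(auc_rank(y, p))
    print(net_benefit(y, p, 0.5), net_benefit_treat_all(y, 0.5))
```

For the toy data the AUC is 14/15, and at pt = 0.5 the model's net benefit (0.125) exceeds that of treating everyone (-0.25). A full DCA repeats this comparison across a range of thresholds and plots the curves.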

RESULTS

Overall, 32.95% of patients (234/710) had CLNM. Tumor diameter, multifocality, lymph nodes identified on ultrasound (US-LN), and extrathyroidal extension (ETE) were independent predictors of CLNM. The RF model performed best in the validation set, with an AUC of 0.893 (95% CI: 0.846-0.940), accuracy of 0.832, sensitivity of 0.764, specificity of 0.866, PPV of 0.743, NPV of 0.879, and F1 score of 0.753. DCA further showed that the RF model provided a superior net clinical benefit.
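Sensitivity, specificity, PPV, NPV, accuracy and F1 all derive from the same 2x2 confusion matrix at a fixed probability cut-off. The sketch below illustrates those definitions; the `Confusion` class and its counts are hypothetical, since the abstract reports neither the validation-set counts nor the cut-off. Because F1 is the harmonic mean of PPV and sensitivity, it can be checked from the abstract alone: 2 * 0.743 * 0.764 / (0.743 + 0.764) rounds to 0.753, matching the reported value.

```python
"""Hypothetical sketch of confusion-matrix metrics for a binary CLNM classifier."""
from dataclasses import dataclass


def f1_from(ppv: float, sensitivity: float) -> float:
    """F1 score: harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 confusion-matrix counts (not the study's data)."""
    tp: int  # CLNM present, predicted present
    fp: int  # CLNM absent, predicted present
    tn: int  # CLNM absent, predicted absent
    fn: int  # CLNM present, predicted absent

    @property
    def sensitivity(self) -> float:  # true-positive rate (recall)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # true-negative rate
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:  # precision
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def f1(self) -> float:
        return f1_from(self.ppv, self.sensitivity)


if __name__ == "__main__":
    # Consistency check using the PPV and sensitivity reported in the abstract.
    print(round(f1_from(0.743, 0.764), 3))  # 0.753
    c = Confusion(tp=8, fp=2, tn=18, fn=2)  # hypothetical counts
    print(c.sensitivity, c.specificity, c.ppv, c.npv, c.accuracy, c.f1)
```

The other metrics cannot be verified the same way, because they require the underlying counts rather than other summary statistics.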

CONCLUSION

Our model preoperatively predicted the risk of CLNM in PTMC patients with high accuracy.
