
Development and validation of an explainable machine learning model to predict Delphian lymph node metastasis in papillary thyroid cancer: a large cohort study.

Authors

Cui Jie, Liu Genglong, Yue Kai, Wu Yansheng, Duan Yuansheng, Wei Minghui, Wang Xudong

Affiliations

Department of Maxillofacial and Otorhinolaryngological Oncology, Tianjin Medical University Cancer Institute and Hospital, Key Laboratory of Basic and Translational Medicine on Head & Neck Cancer, Tianjin, Key Laboratory of Cancer Prevention and Therapy, Tianjin Cancer Institute, National Clinical Research Center of Cancer, Tianjin, 300060, PR China.

School of Medicine, Southern Medical University, Foshan, 528305, Guangdong Province, PR China.

Publication information

J Cancer. 2025 Mar 3;16(6):2041-2061. doi: 10.7150/jca.110141. eCollection 2025.

Abstract

The incidence of papillary thyroid cancer (PTC) has risen substantially. PTC tends to develop lymph node metastasis (LNM) at an early stage, which increases the risk of postoperative recurrence and decreases survival. No machine learning (ML) model is currently available to predict Delphian LNM (DLNM) in PTC. This study comprehensively assesses the value of standard clinical indicators for DLNM prediction and constructs a dependable, widely applicable ensemble ML framework to support surgical planning and therapeutic decision-making.

The study included 1993 consecutive PTC patients who underwent curative surgery from 2020 to 2023. Based on the time to surgery, the cohort was divided into a training cohort (n = 1395) and a validation cohort (n = 598). The Boruta algorithm was applied to select feature variables, followed by the development of an ML framework combining 12 ML techniques across 113 permutations to create a unified prediction model (the DLNM index). ROC analysis, calibration curves, bootstrapping, 10-fold cross-validation, restricted cubic spline (RCS) regression, multivariable logistic regression, and subgroup analysis were used to evaluate the predictive accuracy and discriminative ability of the DLNM index. Model interpretation and feature impact visualisation were performed with the Shapley Additive Explanations (SHAP) methodology.

The 14 features selected by the Boruta algorithm were integrated into the 12 ML approaches, yielding 113 permutations, from which the best-performing algorithm was identified to establish a consensus ML-derived diagnostic model (the DLNM index). The DLNM index showed good diagnostic value, with a mean AUC of 0.763 across the two cohorts, showed good discriminative ability, and was an independent risk factor (P < 0.001). It outperformed the published model in predictive performance and yielded a larger net benefit (P < 0.05). Bootstrapping, 10-fold cross-validation, and subgroup analysis showed that the DLNM index was generally robust and generalisable. SHAP ranked the most important features (tumour size, right 4 region LN, FT4, TG, and T3) and visualised global and individual risk predictions. RCS regression suggested a nonlinear relationship between DLNM risk and the DLNM index, TG, tumour size, and FT3.

An optimised explainable model (the DLNM index), comprising 12 clinical features and based on multiple ML algorithms, was constructed and validated. It provides an economical, readily available, and precise diagnostic tool for DLNM in PTC, with potential implications for clinical practice. The SHAP explanation and RCS regression quantified and visualised tumour size and FT4 as the most important variables that increase DLNM risk.
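The abstract gives the cohort sizes and the evaluation metrics but not the implementation. As a rough illustration only, the sketch below shows one way a surgery-date-ordered split and a bootstrapped AUC could be computed. It assumes that "time to surgery" means a chronological split by surgery date and that the 1395/598 division corresponds to a 70/30 cut; the field name `surgery_date` and all function names are hypothetical, and nothing here reproduces the authors' code or data.

```python
import random


def temporal_split(records, train_fraction=0.7):
    """Order records by surgery date; earliest go to training, latest to validation.

    Assumption: each record is a dict with a sortable "surgery_date" value.
    """
    ordered = sorted(records, key=lambda r: r["surgery_date"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def auc(labels, scores):
    """AUC from the rank-sum (Mann-Whitney U) statistic; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_pos = 0.0
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            if labels[order[k]] == 1:
                rank_sum_pos += avg_rank
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under these assumptions, applying `temporal_split` to 1993 date-stamped records yields 1395 training and 598 validation records, matching the cohort sizes reported above. Whether the study used exactly this cut rule is not stated in the abstract.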

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6822/11905415/696f22ecb31d/jcav16p2041g001.jpg
