Hanani Ahmad A, Donmez Turker Berk, Kutlu Mustafa, Mansour Mohammed
Biomedical and Clinical Basic Skills Department, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine.
Biomedical Engineering Department, Sakarya University of Applied Sciences, Sakarya, Turkey.
Medicine (Baltimore). 2025 May 30;104(22):e42667. doi: 10.1097/MD.0000000000042667.
Recurrence prediction in well-differentiated thyroid cancer remains a clinical challenge, necessitating more accurate and interpretable predictive models. This study investigates the use of a supervised CatBoost classifier to predict recurrence in well-differentiated thyroid cancer patients, comparing its performance against other ensemble models and employing Shapley Additive Explanations (SHAP) to enhance interpretability. A dataset comprising 383 patients with diverse demographic, clinical, and pathological variables was utilized. Data preprocessing steps included handling values and encoding categorical features. The dataset was split into training and testing sets using a 70:30 ratio. Model performance was evaluated using accuracy and area under the receiver operating characteristic curve. A comparative analysis was conducted with other ensemble methods, such as Extra Trees, LightGBM, and XGBoost. SHAP analysis was employed to determine feature importance and assess model interpretability at both the global and local levels. The supervised CatBoost classifier demonstrated superior performance, achieving an accuracy of 97% and an area under the receiver operating characteristic curve of 0.99, outperforming competing models. SHAP analysis revealed that treatment response (SHAP value: 2.077), risk stratification (SHAP value: 0.859), and lymph node involvement (N) (SHAP value: 0.596) were the most influential predictors of recurrence. Local SHAP analyses provided insight into individual predictions, highlighting that misclassification often resulted from overemphasizing a single factor while overlooking other clinically relevant indicators. The supervised CatBoost classifier demonstrated high predictive performance and enhanced interpretability through SHAP analysis. These findings underscore the importance of incorporating multiple predictive factors to improve recurrence risk assessment. While the model shows promise in personalizing thyroid cancer management, further validation on larger, more diverse datasets is warranted to ensure robustness.
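The distinction the abstract draws between local and global SHAP analysis can be illustrated with a small sketch. This is not the study's pipeline: it computes exact Shapley values in plain Python for a hypothetical three-feature risk score, whereas the study applied SHAP to a trained CatBoost model. The feature names, coefficients, patient rows, and the all-zero baseline are invented for illustration and are not taken from the paper. A local explanation attributes one patient's score to each feature; a global importance is taken here as the mean absolute attribution across patients, which is one common convention and an assumption about how the reported SHAP values were aggregated.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features (illustrative only, not the study's encoding).
FEATURES = ["poor_treatment_response", "high_risk_category", "node_positive"]


def toy_risk_score(x):
    """Made-up log-odds style score with one interaction term.

    The coefficients are arbitrary and do NOT come from the paper.
    """
    response, risk, node = x
    return 2.0 * response + 0.8 * risk + 0.5 * node + 0.4 * risk * node


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Invented patient rows and an all-zero reference patient.
patients = [[1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
baseline = [0, 0, 0]

# Local view: per-feature attribution for each individual patient.
local = [shapley_values(toy_risk_score, p, baseline) for p in patients]

# Global view: mean absolute attribution per feature across patients.
global_importance = [
    sum(abs(row[i]) for row in local) / len(local) for i in range(len(FEATURES))
]

for name, value in sorted(zip(FEATURES, global_importance), key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")
```

For each patient, the attributions sum to the difference between that patient's score and the baseline score, which is the additivity property that lets a single prediction be decomposed feature by feature. Inspecting one row of `local` shows when a prediction is dominated by a single feature, the kind of pattern the abstract associates with misclassified cases.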