Development and validation of machine learning models for predicting lung metastasis risk in differentiated thyroid cancer based on two databases.

Author information

Shen Haolin, Yang Caiyun, Wang Yuegui, Liao Jianmei, Zuo Xianbo, Zhang Bo, Yang Xiao

Affiliations

Department of Ultrasound, Zhangzhou Municipal Hospital Affiliated to Fujian Medical University, Zhangzhou, China.

Department of Dermatology, China-Japan Friendship Hospital, Beijing, China.

Publication information

Gland Surg. 2024 Nov 30;13(11):2174-2188. doi: 10.21037/gs-24-481. Epub 2024 Nov 26.

DOI:10.21037/gs-24-481

PMID:39678420

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11635582/

Abstract

BACKGROUND

Differentiated thyroid cancer (DTC) progresses slowly, but patients with lung metastasis (LM) have a poor prognosis. The aim of this study was to develop and evaluate the predictive ability of machine learning (ML) models in estimating the risk of LM in patients with DTC and to identify the independent risk factors specific to different age and gender subgroups.

METHODS

The demographic and clinicopathological data of patients with DTC were obtained from two databases: first, the National Institutes of Health Surveillance, Epidemiology, and End Results (SEER) database [2010-2015], which provides extensive epidemiological and clinical information on cancer patients; second, the database of Zhangzhou Municipal Hospital Affiliated to Fujian Medical University [2014-2017], which focuses more on patients' specific clinicopathological characteristics and treatment outcomes. Variables common to both databases were extracted. The data were then split into training, testing, and validation sets. The training set was used to build and train the ML models, while the testing and validation sets were used to assess their performance. Five ML models were established: logistic regression (LR), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost), and gradient boosting machine (GBM). Model performance was evaluated using accuracy, precision, recall, F1 score, Brier score, area under the receiver operating characteristic (ROC) curve (AUROC), area under the precision-recall (PR) curve (PR-AUC), calibration curves, and decision curve analysis (DCA). Feature importance was ranked and visualized for the top-performing models.
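As an illustration only, several of the evaluation metrics listed above can be computed from a vector of true labels and predicted probabilities as in the following minimal sketch. It is not the authors' code: the function names, the 0.5 classification threshold, and the toy data are assumptions made for this example, and the net benefit function uses the standard DCA formula rather than anything stated in the abstract.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 at a fixed (assumed) threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Chance that a random positive outranks a random negative (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit used in DCA: TP/n - FP/n * (pt / (1 - pt))."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy data (hypothetical, not from either database): 1 = LM, 0 = no LM.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

if __name__ == "__main__":
    print(threshold_metrics(y_true, y_prob))
    print("Brier:", brier_score(y_true, y_prob))
    print("AUROC:", auroc(y_true, y_prob))
    print("Net benefit at 0.2:", net_benefit(y_true, y_prob, 0.2))
```

A lower Brier score indicates better-calibrated probabilities, whereas accuracy, precision, recall, and F1 depend on the chosen threshold. A decision curve is obtained by plotting the net benefit over a range of threshold probabilities.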

RESULTS

The analysis identified age, gender, tumor size, T stage, N stage, and histological type as significant independent risk factors for LM. The effects of gender, T stage, and histological type on the risk of LM varied across the different age subgroups. Tumor size was an independent risk factor for LM in the female population but not in the male population. GBM achieved an AUROC of 0.982, a Brier score of 0.047, an accuracy of 0.818, and an F1 score of 0.818 in the validation set, outperforming the other models.

CONCLUSIONS

The GBM model emerged as an effective tool for identifying high-risk LM populations in DTC, with the potential to guide clinical practice and facilitate the development of individualized treatment plans. Further research to validate these findings across more diverse patient populations and clinical settings is recommended.
