Personalized three-year survival prediction and prognosis forecast by interpretable machine learning for pancreatic cancer patients: a population-based study and an external validation.

Authors

Teng Buwei, Zhang Xiaofeng, Ge Mingshu, Miao Miao, Li Wei, Ma Jun

Affiliations

Department of Hepatobiliary Surgery, The Affiliated Lianyungang Hospital of Xuzhou Medical University/The First People's Hospital of Lianyungang, Lianyungang, China.

Department of Imaging, The Affiliated Huai'an Hospital of Xuzhou Medical University and the Second People's Hospital of Huai'an, Huai'an, China.

Publication information

Front Oncol. 2024 Oct 21;14:1488118. doi: 10.3389/fonc.2024.1488118. eCollection 2024.

Abstract

PURPOSE

The overall survival of patients with pancreatic cancer is extremely low. We aimed to establish a machine learning (ML)-based model to accurately predict the three-year survival and prognosis of pancreatic cancer patients.

METHODS

We analyzed pancreatic cancer patients recorded in the Surveillance, Epidemiology, and End Results (SEER) database between 2000 and 2021. Univariate and multivariate logistic regression analyses were used to select variables. Recursive Feature Elimination (RFE) based on six ML algorithms was used for feature selection. To construct the predictive model, 13 ML algorithms were evaluated by area under the curve (AUC), area under the precision-recall curve (PRAUC), accuracy, sensitivity, specificity, precision, cross-entropy, Brier score, balanced accuracy (bacc), and F-beta score (fbeta). The optimal ML model was used to predict three-year survival, and its predictions were explained with the SHapley Additive exPlanations (SHAP) framework. In addition, 101 ML algorithm combinations were developed, and the combination with the highest C-index was selected to predict the prognosis of pancreatic cancer patients.
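Several of the evaluation metrics listed above can be computed directly from true labels and predicted probabilities. The following is a minimal sketch in plain Python, not the authors' pipeline: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the default `beta=1.0` are assumptions made for illustration, and PRAUC is omitted for brevity.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5, beta=1.0):
    """Compute threshold-based and probabilistic metrics for a binary classifier.

    The threshold and beta defaults are illustrative assumptions,
    not values reported in the abstract.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall / true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # positive predictive value
    accuracy = (tp + tn) / len(y_true)
    balanced_accuracy = (sensitivity + specificity) / 2

    # F-beta weights recall beta times as heavily as precision.
    b2 = beta ** 2
    denom = b2 * precision + sensitivity
    fbeta = (1 + b2) * precision * sensitivity / denom if denom else 0.0

    # Brier score: mean squared error of the predicted probabilities.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    # Cross-entropy (log loss), with clipping to avoid log(0).
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    cross_entropy = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / len(y_true)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "fbeta": fbeta,
        "brier": brier,
        "cross_entropy": cross_entropy,
    }


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a toy example; for a cohort the size of the SEER sample, a rank-based implementation from an established library would be the more practical choice.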

RESULTS

A total of 20,064 pancreatic cancer patients from the SEER database were consecutively enrolled. We used eight clinical variables to establish the prediction model for three-year survival. The CatBoost model was selected as the best prediction model, with AUCs of 0.932 [0.924, 0.939], 0.899 [0.873, 0.934], and 0.826 [0.735, 0.919] in the training, internal test, and external test sets, respectively, and with an accuracy of 0.839 [0.831, 0.847], a sensitivity of 0.872 [0.858, 0.887], a specificity of 0.803 [0.784, 0.825], and a precision of 0.832 [0.821, 0.853]. According to the SHAP results, surgery type had the greatest effect on three-year survival. For prognosis prediction, the "RSF+GBM" combination was the best prognostic model, with C-indices of 0.774, 0.722, and 0.674 in the training, internal test, and external test sets, respectively.
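The C-index used to rank the prognostic models measures how often a model assigns the higher risk to the patient who dies sooner. The sketch below shows one common definition (Harrell's C-index) in plain Python; the function name `concordance_index` and its argument layout are assumptions for illustration, and the abstract does not state which C-index variant or software the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:        observed follow-up time per patient
    events:       1 if death was observed, 0 if the patient was censored
    risk_scores:  model output, where a higher score means higher predicted risk

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when patient i also has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the drop from 0.774 in training to 0.674 in the external test set indicates weaker, though still better than random, discrimination on external data.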

CONCLUSIONS

Our ML models demonstrate excellent accuracy and reliability, offering more precise, personalized prognostic prediction for pancreatic cancer patients.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d214/11532159/f09ce8959ae9/fonc-14-1488118-g001.jpg
