Integration of IDPC Clustering Analysis and Interpretable Machine Learning for Survival Risk Prediction of Patients with ESCC.

Affiliations

Henan Key Lab of Information-Based Electrical Appliances, Zhengzhou University of Light Industry, Zhengzhou, 450002, China.

State Key Laboratory of Esophageal Cancer Prevention and Treatment and Henan Key Laboratory for Esophageal Cancer Research of The First Affiliated Hospital, Zhengzhou University, Zhengzhou, 450052, China.

Publication information

Interdiscip Sci. 2023 Sep;15(3):480-498. doi: 10.1007/s12539-023-00569-9. Epub 2023 May 30.

Abstract

Accurate prediction of survival risk is central to understanding and forecasting the prognosis of patients with esophageal squamous cell carcinoma (ESCC). Existing methods suffer from insufficient fitting ability and poor interpretability. To address these problems, this work proposes an interpretable survival risk prediction method for ESCC patients based on extreme gradient boosting improved by the whale optimization algorithm (WOA-XGBoost) and Shapley additive explanations (SHAP). Because the data set is imbalanced, adaptive synthetic sampling (ADASYN) is first used to generate additional samples with high survival risk. Then, an improved clustering by fast search and find of density peaks (IDPC) algorithm, based on cosine distance and K nearest neighbors, is used to cluster the patients. Next, a prediction model is built for each cluster with WOA-XGBoost, and each model is visualized with SHAP to uncover the factors hidden in the structured model and improve the interpretability of the black-box model. Finally, the effectiveness of the proposed scheme is demonstrated on data collected from the First Affiliated Hospital of Zhengzhou University. The proposed method achieves superior performance, with an area under the receiver operating characteristic curve (AUROC) of 0.918 and an accuracy of 0.881.
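The clustering step can be illustrated with a minimal sketch of density-peaks clustering that uses cosine distance and a K-nearest-neighbor local density. The abstract does not give the IDPC formulas, so the density definition below (an exponential of the mean squared distance to the K nearest neighbors) is an assumption, and the names `cosine_distance` and `density_peaks` are hypothetical. This is not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def density_peaks(points, k=3, n_clusters=2):
    """Cluster points by density peaks; returns (labels, center_indices)."""
    n = len(points)
    dist = [[cosine_distance(points[i], points[j]) for j in range(n)]
            for i in range(n)]

    # Local density from the k nearest neighbors (assumed form, see lead-in).
    rho = []
    for i in range(n):
        knn = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in knn) / len(knn)))

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta[i]: distance to the nearest point of higher density.
    # parent[i]: index of that point (used later for label propagation).
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # By convention the densest point gets its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers are the points with the largest rho * delta.
    # Sorting `order` (stable) keeps the densest point first on ties.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits the label of its higher-density parent,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    # Toy data (made up): two groups of vectors pointing in different directions.
    toy = [(1.0, 0.1), (2.0, 0.25), (3.0, 0.2), (1.5, 0.1),
           (0.1, 1.0), (0.2, 2.0), (0.3, 2.5), (0.1, 1.5)]
    toy_labels, toy_centers = density_peaks(toy, k=2, n_clusters=2)
    print(toy_labels, toy_centers)
```

Because cosine distance depends only on direction, the toy vectors are separated by angle rather than by magnitude. In a fuller pipeline, each resulting cluster would then be passed to its own classifier; the resampling, boosting, and explanation steps are outside the scope of this sketch.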
