Department of Computer Engineering, Urmia University, Urmia, Iran.
SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark.
Sci Rep. 2024 Jan 29;14(1):2371. doi: 10.1038/s41598-024-53006-2.
In this study, we used data from the Surveillance, Epidemiology, and End Results (SEER) database to predict survival outcomes for glioblastoma patients. To assess dataset skewness and identify important features, we applied Pearson's second coefficient of skewness and the Ordinary Least Squares method, respectively. Using two sampling strategies, holdout and five-fold cross-validation, we developed five machine learning (ML) models alongside a feed-forward deep neural network (DNN) for multiclass classification and regression prediction of glioblastoma patient survival. After balancing, the classification and regression datasets contained 46,340 and 28,573 samples, respectively. Shapley additive explanations (SHAP) were then used to explain the decision-making process of the best model. In both the classification and regression tasks, and under both holdout and cross-validation sampling, the DNN consistently outperformed the ML models. Notably, its accuracy was 90.25% and 90.22% for holdout and five-fold cross-validation, respectively, while the corresponding R values were 0.6565 and 0.6622. SHAP analysis identified age at diagnosis as the most influential feature in the DNN's survival predictions. These findings suggest that the DNN holds promise as a practical auxiliary tool for clinicians, supporting decisions about the treatment and care trajectories of glioblastoma patients.
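As an illustration of the skewness check named in the abstract, Pearson's second coefficient of skewness is conventionally defined as three times the difference between the mean and the median, divided by the standard deviation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the use of the sample standard deviation, and the survival values are all hypothetical and are not taken from SEER.

```python
import statistics


def pearson_second_skewness(values):
    """Return 3 * (mean - median) / standard deviation for a sequence of numbers."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Assumption: the sample standard deviation is used; the abstract does not say which.
    std = statistics.stdev(values)
    if std == 0:
        raise ValueError("standard deviation is zero; skewness is undefined")
    return 3 * (mean - median) / std


# Hypothetical survival times in months (illustrative values only, not SEER data).
survival_months = [2, 3, 4, 5, 6, 8, 10, 14, 20, 48]
skew = pearson_second_skewness(survival_months)
print(f"Pearson's second skewness coefficient: {skew:.3f}")
```

A positive value indicates a right-skewed distribution (a long tail of longer survival times), while a value near zero indicates an approximately symmetric one.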