Tian Xu, Li Haoyang, Li Feili, Jiménez-Herrera María F, Ren Yi, Shang Hongcai
Division of Science & Technology and Foreign Affairs, Chongqing Traditional Chinese Medicine Hospital, Chongqing, 400020, China.
School of Data Science, The Chinese University of Hong Kong, Shenzhen, 518172, China.
Support Care Cancer. 2024 Dec 30;33(1):63. doi: 10.1007/s00520-024-09127-5.
Early and accurate identification of the risk of psychological distress allows for timely intervention and improved prognosis. Current methods for predicting psychological distress among lung cancer patients using readily available data are limited. This study aimed to develop a robust machine learning (ML) model for determining the risk of psychological distress among lung cancer patients.
A cross-sectional study was designed to collect data from 342 lung cancer patients. The Least Absolute Shrinkage and Selection Operator (LASSO) was used for feature selection. Models were trained and validated using the bootstrap resampling method. Fivefold cross-validation was used to evaluate the models and tune their hyperparameters. Feature importance was assessed using the SHapley Additive exPlanations (SHAP) method.
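As a rough illustration of how LASSO performs feature selection, the pure-Python sketch below fits an L1-penalized linear regression by cyclic coordinate descent with soft-thresholding; features whose coefficient is driven exactly to zero are dropped. This is a generic textbook sketch, not the authors' pipeline: the abstract does not state the software, the penalty value, or whether a linear or logistic LASSO was used. The function names `soft_threshold` and `lasso_cd`, the penalty `lam`, and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (n samples x p features) and y a list of n targets.
    Columns and targets are assumed to be centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Precompute (1/n) * sum_i x_ij^2 for each feature j.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlate feature j with the partial residual that excludes feature j.
            rho = 0.0
            for row, target in zip(X, y):
                pred_without_j = sum(row[k] * b[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
            rho /= n
            # Soft-thresholding sets weak features exactly to zero.
            b[j] = soft_threshold(rho, lam) / col_sq[j] if col_sq[j] > 0 else 0.0
    return b


# Hypothetical, centered toy data: y depends on feature 0 only; feature 1 is noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
coef = lasso_cd(X, y, lam=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)  # -> [1.75, 0.0] [0]
```

On this toy data the L1 penalty both removes the noise feature and shrinks the retained coefficient from its least-squares value of 2 to 1.75. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.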
The model identified seven independent risk factors for psychological distress: residence (β = 0.141), diagnosis duration (β = 0.055), TNM stage (β = 0.098), pain severity (β = 0.067), perceived stigma (β = 0.052), illness perception (β = 0.100), and coping style (β = 0.097). Of the eight ML algorithms evaluated, the extreme gradient boosting (XGBoost) algorithm performed best, with area under the receiver operating characteristic curve (AUROC) values of 0.988, 0.945, and 0.922 for the training, validation, and test sets, respectively. SHAP analysis further explained the model's output, revealing the importance and contribution of each risk factor to the overall distress risk. A web-based tool built on this model was developed to facilitate clinical use.
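An AUROC value can be read as the probability that the model assigns a higher risk score to a randomly chosen patient with psychological distress than to a randomly chosen patient without it. The pure-Python sketch below computes AUROC directly from that pairwise (Mann-Whitney) definition, counting ties as one half. The function name `auroc` and the toy labels and scores are hypothetical and are not the study's data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise definition.

    labels: 1 for a positive case (e.g. distress), 0 for a negative case.
    scores: predicted risk; a higher score means more likely positive.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    # Compare every positive case with every negative case.
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# 8 of the 9 positive-negative pairs are ranked correctly, so AUROC = 8/9.
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few hundred patients; rank-based formulations give the same value in O(n log n) time for larger datasets.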
The XGBoost classifier demonstrated excellent performance, and the web-based risk calculator can serve as an easy-to-use tool to help health practitioners formulate early prevention and intervention strategies in clinical practice.