Ensemble machine learning models for lung cancer incidence risk prediction in the elderly: a retrospective longitudinal study.

Author information

Chen Songjing, Wu Sizhu

Affiliation

Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.

Publication information

BMC Cancer. 2025 Jan 22;25(1):126. doi: 10.1186/s12885-025-13562-w.

Abstract

BACKGROUND

Identifying high-risk factors and predicting lung cancer incidence risk are essential to lung cancer prevention and intervention in the elderly. We aim to develop a lung cancer incidence risk prediction model for the elderly to facilitate early intervention and prevention.

METHODS

We stratified the population into six subgroups according to age and gender. For each subgroup, random forest, extreme gradient boosting, deep neural network, support vector machine, multiple logistic regression and deep Q network (DQN) models were developed and validated. Models were trained and tested using samples from 2000 to 2015 and underwent independent external validation using samples from 2016 to 2019. The most suitable model for lung cancer risk prediction and high-risk factor identification was chosen based on internal validation and independent external validation.
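The stratification and temporal split described above can be illustrated with a minimal sketch. The abstract does not give the age cut-offs or the record layout, so the `AGE_BANDS` values and the `age`, `sex` and `year` field names below are hypothetical placeholders, not the authors' definitions.

```python
from collections import defaultdict

# Hypothetical age bands: the abstract only says "six subgroups according to
# age and gender", so these cut-offs are placeholders for illustration.
AGE_BANDS = [(60, 64), (65, 74), (75, 200)]


def subgroup(age, sex):
    """Return a subgroup label such as 'M_65-74', or None if out of range."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{sex}_{low}-{high}"
    return None


def temporal_split(records):
    """Split records per subgroup into a development set (2000-2015) and an
    external validation set (2016-2019), following the years in the abstract."""
    development = defaultdict(list)
    external = defaultdict(list)
    for record in records:
        key = subgroup(record["age"], record["sex"])
        if key is None:
            continue  # outside the assumed elderly age range
        if 2000 <= record["year"] <= 2015:
            development[key].append(record)
        elif 2016 <= record["year"] <= 2019:
            external[key].append(record)
    return development, external


if __name__ == "__main__":
    sample = [
        {"age": 67, "sex": "M", "year": 2010},
        {"age": 67, "sex": "M", "year": 2017},
        {"age": 80, "sex": "F", "year": 2003},
    ]
    dev, ext = temporal_split(sample)
    print({k: len(v) for k, v in dev.items()})
    print({k: len(v) for k, v in ext.items()})
```

Splitting by calendar year rather than at random keeps the external validation samples strictly later than the development samples, which is what makes the 2016 to 2019 check independent of training.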

RESULTS

The DQN model achieved the best prediction performance across the stratified subgroups, with AUROC ranging from 0.937 to 0.953, recall ranging from 0.932 to 0.943, F-score ranging from 0.929 to 0.946, precision ranging from 0.926 to 0.952, F-score ranging from 0.933 to 0.963 and RMSE ranging from 0.21 to 0.27. SHAP values were supplied for model interpretability. High-risk factors for lung cancer incidence were identified in the elderly. Among the stratified elderly groups, men ≥ 65 carrying the C > A/G > T mutation had the largest decrease in lung cancer incidence five years after quitting smoking (39.5%), which was 1.83 times that of women ≥ 65 not carrying the C > A/G > T mutation.
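For readers unfamiliar with the reported metrics, the following sketch shows how AUROC, precision, recall, F-score and RMSE are commonly computed from true labels and predicted probabilities. It is an illustration only: the 0.5 decision threshold, the use of F1 as the F-score, and computing RMSE between labels and predicted probabilities are assumptions, since the abstract does not state how the authors calculated them.

```python
import math


def auroc(y_true, y_prob):
    """AUROC as the probability that a random positive case is scored above a
    random negative case (ties count as one half)."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall, F1 and RMSE; the threshold is an assumed default."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Assumption: RMSE is taken between the 0/1 labels and the probabilities.
    rmse = math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    )
    return {"precision": precision, "recall": recall, "f1": f1, "rmse": rmse}


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.2]
    print(auroc(labels, scores))
    print(classification_metrics(labels, scores))
```

In a per-subgroup evaluation, these functions would be applied separately to each of the six subgroups, giving the kind of ranges quoted above.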

CONCLUSIONS

The DQN model may be suitable for identifying high-risk factors and predicting lung cancer risk with high performance. The proposed intervention and diagnosis pathways could be used for early screening and intervention before lung cancer occurs, which could help oncologists develop targeted intervention strategies for the stratified elderly to reduce lung cancer incidence and improve therapeutic effect. The proposed method could also be used to predict the risk of other chronic diseases, supporting intervention and reducing incidence.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a61d/11755819/80df46f1c6ac/12885_2025_13562_Fig1_HTML.jpg