An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction.

Affiliations

Robert B. Willumstad School of Business, Adelphi University, Garden City, NY 11530, USA.

McDonough School of Business, Georgetown University, Washington, DC 20057, USA.

Publication information

Sensors (Basel). 2022 Sep 8;22(18):6783. doi: 10.3390/s22186783.

DOI: 10.3390/s22186783

PMID: 36146145

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9503480/

Abstract

Although lung cancer survival status and survival length have primarily been predicted separately, a scheme that combines both tasks in a way that is interpretable for physicians remains elusive. We propose a two-phase data analytic framework that classifies survival status at the 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time points (phase I) and predicts the number of survival months within 3 years (phase II), using recent Surveillance, Epidemiology, and End Results (SEER) data from 2010 to 2017. In this study, we employ three analytical models (generalized linear model (GLM), extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe-level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and one-hot encoding. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phases I and II by exploiting GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach that yields performant, computationally efficient, and interpretable methods in comparison with black-box models, (b) visualize the top factors affecting survival odds by using the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach.
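The abstract names several techniques without showing how they operate. The following is a minimal Python sketch, under stated assumptions, of three of them: turning survival months into phase I labels at the six horizons, plain SMOTE interpolation, and reading a logistic GLM coefficient as a change in odds. The function names (`survival_labels`, `smote`, `odds_change_percent`), the censoring rule (a patient censored alive before a horizon gets no label for that horizon), and the default `k=5` are illustrative assumptions, not the authors' code; the paper may handle censoring and define the odds-ratio change differently.

```python
import math
import random

# Phase I horizons from the abstract (0.5 to 3 years), expressed in months.
HORIZONS_MONTHS = (6, 12, 18, 24, 30, 36)


def survival_labels(survival_months, died, horizons=HORIZONS_MONTHS):
    """Binary survival status per horizon (hypothetical labeling rule).

    1 = alive at the horizon, 0 = died before it, None = censored alive
    before the horizon, so the status there is unknown and the record
    would be left out of that horizon's classifier.
    """
    labels = {}
    for h in horizons:
        if survival_months >= h:
            labels[h] = 1
        elif died:
            labels[h] = 0
        else:
            labels[h] = None
    return labels


def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours by Euclidean distance (brute force).
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def odds_change_percent(coef, delta=1.0):
    """Percent change in the odds implied by a logistic-GLM coefficient
    when the feature increases by `delta` (for a one-hot dummy, delta=1
    means switching from the reference category to that category)."""
    return (math.exp(coef * delta) - 1.0) * 100.0
```

For instance, `survival_labels(20, died=True)` marks a patient as alive at 6, 12 and 18 months and deceased at 24, 30 and 36 months, and `odds_change_percent(0.693)` is roughly +100%, i.e. the odds about double. The brute-force neighbour search keeps the sketch dependency-free; for real SEER-sized data, a library implementation such as the `SMOTE`, `BorderlineSMOTE` and `ADASYN` classes in imbalanced-learn is the more practical choice.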

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8ad/9503480/7c6ed8fce214/sensors-22-06783-g001.jpg

Similar articles

A two-stage modeling approach for breast cancer survivability prediction.

Int J Med Inform. 2021 May;149:104438. doi: 10.1016/j.ijmedinf.2021.104438. Epub 2021 Mar 11.

SMOTE-CD: SMOTE for compositional data.

PLoS One. 2023 Jun 29;18(6):e0287705. doi: 10.1371/journal.pone.0287705. eCollection 2023.

An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data.

BMC Med Inform Decis Mak. 2013 Nov 9;13:124. doi: 10.1186/1472-6947-13-124.

Improvement of P300-Based Brain-Computer Interfaces for Home Appliances Control by Data Balancing Techniques.

Sensors (Basel). 2020 Sep 29;20(19):5576. doi: 10.3390/s20195576.

Lung cancer survival period prediction and understanding: Deep learning approaches.

Int J Med Inform. 2021 Apr;148:104371. doi: 10.1016/j.ijmedinf.2020.104371. Epub 2020 Dec 29.

Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method.

Med Phys. 2016 Jun;43(6):2694-2703. doi: 10.1118/1.4948499.

Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier.

J Mol Graph Model. 2021 Sep;107:107962. doi: 10.1016/j.jmgm.2021.107962. Epub 2021 Jun 15.

A comprehensive data level analysis for cancer diagnosis on imbalanced data.

J Biomed Inform. 2019 Feb;90:103089. doi: 10.1016/j.jbi.2018.12.003. Epub 2019 Jan 3.

Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.
