An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction.

Affiliations

Robert B. Willumstad School of Business, Adelphi University, Garden City, NY 11530, USA.

McDonough School of Business, Georgetown University, Washington, DC 20057, USA.

Publication information

Sensors (Basel). 2022 Sep 8;22(18):6783. doi: 10.3390/s22186783.

Abstract

Although lung cancer survival status and survival length have mostly been predicted separately, a scheme that combines both in a way physicians can interpret remains elusive. We propose a two-phase data analytic framework that classifies survival status at 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time points (phase I) and predicts the number of survival months within 3 years (phase II), using recent Surveillance, Epidemiology, and End Results (SEER) data from 2010 to 2017. In this study, we employ three analytical models (generalized linear model (GLM), extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe-level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and one-hot encoding. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phases I and II using the GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach that yields performant, computationally efficient, and interpretable methods and compare them with black-box models, (b) visualize the top factors affecting survival odds using the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach.
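To illustrate how GLM coefficients can be read as changes in odds, the following is a minimal sketch, not the authors' implementation. It fits a logistic regression (a binomial GLM) on synthetic data and exponentiates the coefficients to obtain odds ratios. The feature names (`stage`, `age`), the simulated coefficients, and the helper `fit_logistic` are all hypothetical and chosen only for illustration; no SEER data is used.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression (binomial GLM) with Newton's method."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))     # predicted survival probability
        w = p * (1.0 - p)                        # per-sample variance weights
        grad = X1.T @ (y - p)                    # gradient of the log-likelihood
        hess = X1.T @ (X1 * w[:, None]) + 1e-8 * np.eye(X1.shape[1])
        beta += np.linalg.solve(hess, grad)      # Newton update
    return beta

# Synthetic, hypothetical cohort: a 3-level categorical "stage" and a scaled "age".
rng = np.random.default_rng(0)
n = 5000
stage = rng.integers(0, 3, size=n)
age = rng.normal(0.0, 1.0, size=n)

# One-hot encode "stage", dropping level 0 as the reference category.
stage_onehot = np.eye(3)[stage][:, 1:]
X = np.column_stack([stage_onehot, age])

# Assumed "true" coefficients used only to simulate labels:
# intercept, stage=1, stage=2, age.
true_beta = np.array([0.5, -0.7, -1.5, -0.4])
logits = true_beta[0] + X @ true_beta[1:]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

beta_hat = fit_logistic(X, y)

# exp(coefficient) is the multiplicative change in survival odds for a
# one-unit increase in that feature, holding the other features fixed.
odds_ratios = np.exp(beta_hat[1:])
for name, ratio in zip(["stage=1", "stage=2", "age"], odds_ratios):
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio below 1 indicates that the feature lowers the odds of survival relative to the reference level, and a ratio above 1 indicates that it raises them. Ranking features by how far their odds ratios sit from 1 gives a simple view of the most influential factors.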

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8ad/9503480/7c6ed8fce214/sensors-22-06783-g001.jpg
