Bartholomai James A, Frieboes Hermann B
Dept. of Bioengineering, University of Louisville, Louisville, KY.
Proc IEEE Int Symp Signal Proc Inf Tech. 2018 Dec;2018:632-637. doi: 10.1109/ISSPIT.2018.8642753. Epub 2019 Feb 18.
A regression model is developed to predict survival time in months for lung cancer patients. Previous work showed that predictive models are accurate for short survival times of less than 6 months, but that accuracy declines for longer survival times. This study combines regression models with a classification model to predict survival time. A set of de-identified lung cancer patient data was obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The models use a subset of factors selected by ANOVA. Model accuracy is measured by a confusion matrix for classification and by Root Mean Square Error (RMSE) for regression. Random Forests (RF) are used for classification, while general linear regression, Gradient Boosted Machines (GBM), and RF are used for regression. The regression results show that RF performed best for survival times ≤6 and >24 months (RMSE 10.52 and 20.51, respectively), while GBM performed best for 7-24 months (RMSE 15.65). Comparison plots of the results further indicate that the regression models perform better for shorter survival times than the RMSE values alone suggest.
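The two evaluation measures named in the abstract can be illustrated with a minimal sketch: a confusion matrix over the three survival-time ranges (≤6, 7-24, >24 months) and an RMSE computed separately for each range. The function names, the toy survival times, and the predicted labels below are all hypothetical and chosen for illustration; they are not SEER data and not the authors' code or results.

```python
import math
from collections import Counter

# Survival-time ranges taken from the abstract: <=6, 7-24, and >24 months.
BINS = ("<=6", "7-24", ">24")


def survival_bin(months):
    """Map a survival time in months to one of the three ranges."""
    if months <= 6:
        return "<=6"
    if months <= 24:
        return "7-24"
    return ">24"


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; rows are true bins, columns predicted bins."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in BINS] for t in BINS]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_bin(actual, predicted):
    """Group samples by the bin of their actual survival time, then score each group."""
    grouped = {}
    for a, p in zip(actual, predicted):
        grouped.setdefault(survival_bin(a), []).append((a, p))
    return {b: rmse(*zip(*pairs)) for b, pairs in grouped.items()}


# Hypothetical toy values (months), for illustration only.
actual_months = [3, 5, 12, 20, 30, 48]
predicted_months = [4, 9, 10, 26, 22, 40]

# Hypothetical classifier output: one predicted range per patient.
true_bins = [survival_bin(m) for m in actual_months]
predicted_bins = ["<=6", "7-24", "7-24", ">24", "7-24", ">24"]

matrix = confusion_matrix(true_bins, predicted_bins)
scores = rmse_by_bin(actual_months, predicted_months)

for label, row in zip(BINS, matrix):
    print(label, row)
for label in BINS:
    print(label, round(scores[label], 2))
```

Scoring each range separately, rather than reporting a single pooled RMSE, makes it possible to see where the regression error concentrates, which is the kind of per-range comparison the abstract reports.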