School of Science and Engineering, Lanzhou University of Finance and Economics, Lanzhou, Gansu, China.
Departments of Medicine.
Med Care. 2020 May;58(5):461-467. doi: 10.1097/MLR.0000000000001288.
Prognostic modeling in health care has been predominantly statistical, despite a rapid growth of literature on machine-learning approaches in biological data analysis. We aim to assess the relative importance of variables in predicting overall survival among patients with non-small cell lung cancer using a Variable Importance (VIMP) approach in a machine-learning Random Survival Forest (RSF) model for posttreatment planning and follow-up.
A total of 935 non-small cell lung cancer patients were randomly and equally divided into training and testing cohorts for the RSF model. The prognostic variables included age, sex, race, the TNM Classification of Malignant Tumors (TNM) stage, smoking history, Eastern Cooperative Oncology Group performance status, histologic type, treatment category, maximum standard uptake value of whole-body tumor (SUVmaxWB), whole-body metabolic tumor volume (MTVwb), and Charlson Comorbidity Index. The VIMP was calculated using a permutation method in the RSF model. We further compared the VIMP of the RSF model to that of the standard Cox survival model. We also examined how the order of VIMP changed with the differing functional forms of the variables.
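The permutation idea behind VIMP can be illustrated with a minimal sketch. It is not the authors' implementation and does not fit a Random Survival Forest: it assumes prediction accuracy is measured with Harrell's concordance index, and it uses a hypothetical hand-written risk scorer (`toy_risk`) as a stand-in for a fitted model. The data are synthetic, the variable names (`stage`, `mtv`, `noise`) are illustrative only, and none of the numbers relate to the study cohort.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the patient with the
    shorter observed survival also has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_vimp(predict_risk, rows, times, events, variable,
                     n_repeats=20, seed=0):
    """Mean drop in C-index after randomly shuffling one variable across
    patients while leaving every other variable untouched."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events,
                                 [predict_risk(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[variable] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{variable: v}) for r, v in zip(rows, shuffled)]
        c = concordance_index(times, events,
                              [predict_risk(r) for r in permuted])
        drops.append(baseline - c)
    return sum(drops) / len(drops)


def toy_risk(row):
    # Hypothetical stand-in for a fitted model: it uses stage and MTV
    # and ignores the noise variable entirely.
    return row["stage"] + row["mtv"] / 50.0


# Synthetic cohort (not study data): survival shortens as risk grows.
rng = random.Random(42)
rows = [{"stage": rng.randint(1, 4),
         "mtv": rng.uniform(0.0, 100.0),
         "noise": rng.random()} for _ in range(60)]
times = [60.0 / toy_risk(r) for r in rows]
events = [rng.random() < 0.8 for _ in rows]  # roughly 20% censored

vimp = {name: permutation_vimp(toy_risk, rows, times, events, name)
        for name in ("stage", "mtv", "noise")}
for name, value in sorted(vimp.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A variable that the model never uses (here `noise`) gets an importance of exactly zero, because shuffling it leaves every prediction unchanged. In an actual RSF analysis the same quantity would typically be computed on held-out or out-of-bag data by a survival-forest library; the abstract does not state which software was used.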
In both the RSF and the standard Cox models, the most important variables are treatment category, TNM stage, and MTVwb. The order of VIMP is more robust in the RSF model than in the Cox model with respect to the differing functional forms of the variables.
The RSF VIMP approach can be applied alongside the Cox model to further advance the understanding of the roles of prognostic factors and to improve prognostic precision and care efficiency.