Division of Pulmonology, Department of Internal Medicine, The Armed Forces Goyang Hospital, Goyang, Republic of Korea.
Department of Statistics, Pusan National University, Busan, Republic of Korea.
Thorac Cancer. 2022 Dec;13(23):3353-3361. doi: 10.1111/1759-7714.14694. Epub 2022 Oct 24.
BACKGROUND: This study evaluated the performance of several machine learning (ML) algorithms in predicting 1-year afatinib continuation and 2-year survival after afatinib initiation, and examined differences in survival outcomes between ML-classified strata.
METHODS: Data also used in the RESET study were retrospectively collected from 16 hospitals in South Korea. Stratified random sampling was used to split the data into training and test sets (70:30 ratio). Input variables comprised clinical information (age, sex, tumor stage, smoking, performance status, metastasis, type of metastasis, and dose adjustment) and pathologic information on EGFR mutations. Eight ML algorithms were trained: logistic regression, decision tree, deep neural network, random forest, support vector machine, boosting, bagging, and the naïve Bayes classifier. Model performance was assessed by sensitivity, specificity, and accuracy. The area under the receiver operating characteristic curve (AUC) was calculated and compared between ML models using DeLong's test. Kaplan-Meier (KM) curves were used to visualize the strata identified by the ML models.
RESULTS: No significant differences in the input variables were observed between the training and test datasets. The best-performing models were the support vector machine for predicting 1-year afatinib continuation (AUC 0.626) and the decision tree for predicting 2-year survival after afatinib initiation (AUC 0.644), although performance was comparable across the ML models and none showed meaningful predictive ability. KM analysis and the log-rank test revealed significant differences between the strata identified by the ML model (p < 0.001) in both time-on-treatment (TOT) and overall survival (OS).
CONCLUSION: The ML models in our study showed no discernible role in predicting afatinib-related outcomes, although the strata they identified differed in TOT and OS in the KM analysis. This suggests both the potential of ML for predicting survival outcomes and the limitations of electronic medical record-based variables as inputs to ML algorithms. Careful consideration of variable inclusion is likely to improve general model performance.
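The evaluation steps named in the Methods (a stratified 70:30 split, then sensitivity, specificity, accuracy, and AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_split`, `confusion_metrics`, and `auc` are hypothetical, the toy labels and scores are made up for illustration, and only the Python standard library is used. The AUC here is computed as the proportion of positive-negative pairs ranked correctly, which equals the area under the ROC curve; DeLong's test for comparing AUCs is not implemented.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices so each class contributes ~test_fraction to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random sampling within each stratum
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumes both classes are present, so neither denominator is zero.
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy outcome labels (illustrative only, not study data).
    labels = [1] * 60 + [0] * 40
    train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
    print(len(train_idx), len(test_idx))  # 70 30

    # Toy predictions and scores for a handful of test cases.
    y_true = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(confusion_metrics(y_true, y_pred))
    print(auc(y_true, scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which gives context for the reported values of 0.626 and 0.644.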