Feng Youdan, Song Fan, Zhang Peng, Fan Guangda, Zhang Tianyi, Zhao Xiangyu, Ma Chenbin, Sun Yangyang, Song Xiao, Pu Huangsheng, Liu Fei, Zhang Guanglei
Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China.
School of Medical Imaging, Shanxi Medical University, Taiyuan, China.
Front Pharmacol. 2022 Jun 27;13:897597. doi: 10.3389/fphar.2022.897597. eCollection 2022.
We aimed to determine whether ensemble learning can improve the performance of models that predict epidermal growth factor receptor (EGFR) mutation status. We retrospectively collected data from 168 patients with non-small cell lung cancer (NSCLC) who underwent both computed tomography (CT) examination and EGFR testing. Using radiomics features extracted from the CT images, an ensemble model was built from four individual classifiers: logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). The synthetic minority oversampling technique (SMOTE) was also used to reduce the influence of data imbalance. Model performance was evaluated using the area under the curve (AUC). With the 26 radiomics features retained after feature selection, SVM performed best among the four individual classifiers (AUCs of 0.8634 and 0.7885 on the training and test sets, respectively). The ensemble of RF, XGBoost, and LR achieved the best overall performance (AUCs of 0.8465 and 0.8654 on the training and test sets, respectively). Ensemble learning can improve model performance in predicting the EGFR mutation status of patients with NSCLC and shows potential value in clinical practice.
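The comparison described above (individual classifiers versus combinations of them, scored by AUC) can be illustrated with a minimal sketch. The abstract does not state how the classifiers were combined or how the best subset was chosen, so soft voting (averaging predicted probabilities) and an exhaustive subset search are assumptions here, and the function names `auc`, `soft_vote`, and `best_ensemble` are hypothetical. The probabilities are made-up toy values, not study data.

```python
from itertools import combinations


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists):
    """Assumed combination rule: average each sample's probability across models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def best_ensemble(labels, probs_by_model, min_size=2):
    """Try every subset of models (assumed search strategy) and keep the top AUC.

    Returns (tuple_of_model_names, auc). On ties, the first subset found wins.
    """
    best = None
    names = sorted(probs_by_model)
    for k in range(min_size, len(names) + 1):
        for subset in combinations(names, k):
            score = auc(labels, soft_vote([probs_by_model[m] for m in subset]))
            if best is None or score > best[1]:
                best = (subset, score)
    return best


# Toy, made-up predicted probabilities of "EGFR mutant" for six patients.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_probs = {
    "LR": [0.7, 0.4, 0.6, 0.5, 0.8, 0.3],
    "SVM": [0.6, 0.5, 0.7, 0.2, 0.4, 0.3],
    "RF": [0.9, 0.3, 0.5, 0.6, 0.7, 0.2],
    "XGBoost": [0.8, 0.2, 0.4, 0.3, 0.9, 0.5],
}

if __name__ == "__main__":
    for name, probs in toy_probs.items():
        print(name, round(auc(toy_labels, probs), 4))
    print("best ensemble:", best_ensemble(toy_labels, toy_probs))
```

In a real workflow the subset would normally be chosen on training or validation data and then scored once on the held-out test set, since picking the subset by its test AUC tends to give an optimistic estimate.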