Hao Peng, Deng Bo-Yu, Huang Chan-Tao, Xu Jun, Zhou Fang, Liu Zhe-Xing, Zhou Wu, Xu Yi-Kai
Nanfang Hospital, Southern Medical University, Guangzhou, China.
School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China.
Front Oncol. 2022 Oct 20;12:994285. doi: 10.3389/fonc.2022.994285. eCollection 2022.
To develop an appropriate machine learning model for predicting anaplastic lymphoma kinase (ALK) rearrangement status in non-small cell lung cancer (NSCLC) patients using computed tomography (CT) images and clinical features.
This study included 193 patients with NSCLC (154 in the training cohort and 39 in the validation cohort), of whom 68 tested positive for ALK rearrangement and 125 tested negative. From nonenhanced CT scans, 157 radiomic features were extracted, and 8 clinical features were collected. Five machine learning (ML) models were compared to identify the best classifier for predicting ALK rearrangement status. A radiomic signature was built using the least absolute shrinkage and selection operator (LASSO) algorithm. The predictive performance of models based on radiomic features, clinical features, and their combination was assessed with receiver operating characteristic (ROC) curves.
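The LASSO step can be illustrated with a small sketch. The abstract does not specify the LASSO variant, the penalty strength, or the software used, so everything below is an assumption for illustration: a linear LASSO fitted by cyclic coordinate descent on centred toy data, with hypothetical names `lasso_cd`, `soft_threshold`, `centre`, and `lam`. Features that keep a non-zero coefficient form the signature, and their weighted sum gives a per-patient score.

```python
# Illustrative sketch only: toy data, hypothetical names, linear LASSO assumed.

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows with centred columns; y is a centred target.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy "radiomic" table: 8 patients, 3 features, built so only feature 0 tracks the label.
columns = [
    [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0],
    [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = ALK-positive in this toy example

X = [list(row) for row in zip(*(centre(c) for c in columns))]
y_centred = centre(labels)

weights = lasso_cd(X, y_centred, lam=0.05)
selected = [j for j, w in enumerate(weights) if w != 0.0]
# Signature score per patient: weighted sum of the selected features.
scores = [sum(w * x for w, x in zip(weights, row)) for row in X]
print("selected features:", selected)
```

Raising `lam` shrinks more coefficients to exactly zero, which is what lets LASSO reduce a large feature set to a compact signature. In practice the penalty is usually chosen by cross-validation on the training cohort.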
Among the five classifiers, the support vector machine (SVM) model achieved the highest area under the ROC curve (AUC), 0.914. The clinical features model had an AUC of 0.805 (95% CI 0.731-0.877) in the training cohort and 0.735 (95% CI 0.566-0.863) in the validation cohort. The CT image-based ML model had an AUC of 0.953 (95% CI 0.913-1.0) in the training cohort and 0.890 (95% CI 0.778-0.971) in the validation cohort. The ML model combining CT images and clinical features outperformed the models based on clinical information alone or CT images alone, with an AUC of 0.965 (95% CI 0.826-0.882) in the training cohort and 0.914 (95% CI 0.804-0.893) in the validation cohort.
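To make the reported metric concrete: AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The abstract does not state how the 95% confidence intervals were obtained. The sketch below assumes a percentile bootstrap, and the names `auc` and `bootstrap_ci` as well as the toy labels and scores are hypothetical.

```python
# Illustrative sketch only: toy scores, hypothetical names, percentile bootstrap assumed.
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy validation set: 1 = ALK-positive, score = model output.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.65, 0.90]

point = auc(toy_labels, toy_scores)
lower, upper = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC = {point:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

With small cohorts, such as a validation set of 39 patients, bootstrap intervals tend to be wide, so differences between models should be read alongside their intervals rather than from point estimates alone.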
Our findings show that an ML classification model built on CT images and clinical data can accurately predict ALK rearrangement status in NSCLC patients.