Zhang Hongjian, Fan Xiao, Zhang Junxia, Wei Zhiyuan, Feng Wei, Hu Yifang, Ni Jiaying, Yao Fushen, Zhou Gaoxin, Wan Cheng, Zhang Xin, Wang Junjie, Liu Yun, You Yongping, Yu Yun
Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China.
Department of Neurosurgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu, China.
Front Oncol. 2023 Aug 30;13:1143688. doi: 10.3389/fonc.2023.1143688. eCollection 2023.
In adult diffuse glioma, preoperative detection of isocitrate dehydrogenase (IDH) status helps clinicians develop surgical strategies and evaluate patient prognosis. Here, we aim to identify an optimal machine-learning model for predicting IDH genotype by combining deep-learning (DL) signatures and conventional radiomics (CR) features as model predictors.
In this study, a total of 486 patients with adult diffuse gliomas were retrospectively collected from our medical center (n=268) and a public database (TCGA, n=218). All included patients were randomly divided into training and validation sets by using nested 10-fold cross-validation. A total of 6,736 CR features were extracted per patient from four MRI modalities, namely T1WI, T1CE, T2WI, and FLAIR. The LASSO algorithm was used for CR feature selection. For each MRI modality, we applied a CNN+LSTM-based neural network to extract DL features and integrated these features into a DL signature after the fully connected layer with sigmoid activation. Eight classic machine-learning models were analyzed and compared in terms of their prediction performance and stability in IDH genotyping, with the LASSO-selected CR features and the integrated DL signatures combined as model predictors. In the validation sets, prediction performance was evaluated by using accuracy and the area under the receiver operating characteristic curve (AUC), while model stability was analyzed by using the relative standard deviation of the AUC (RSD). Subgroup analyses of DL signatures and CR features were also conducted individually to explore their independent predictive value.
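As an illustration of the nested 10-fold cross-validation mentioned above, the following is a minimal sketch of how sample indices can be partitioned into outer and inner folds. The abstract does not describe how the splits were organized, so the function names (`k_fold_indices`, `nested_cv_splits`), the use of 10 inner folds, and the seeding scheme are all assumptions made for illustration; only the cohort size (486) and the 10-fold outer loop come from the text.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Striding by k spreads the remainder evenly across the first folds.
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_val, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so tuning steps (for example feature selection) never see
    the outer validation fold. inner_k=10 is an assumption, not from the text.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_val in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for j, fold in enumerate(inner_folds)
                           if j != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_val, inner_splits


# 486 is the cohort size reported in the abstract.
splits = list(nested_cv_splits(486))
```

In a layout like this, steps such as LASSO feature selection would be fitted on the inner folds only, and each outer validation fold would be used once to score the final model.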
Logistic regression (LR) achieved favorable prediction performance (AUC: 0.920 ± 0.043, accuracy: 0.843 ± 0.044), whereas support vector machine with the linear kernel (l-SVM) displayed low prediction performance (AUC: 0.812 ± 0.052, accuracy: 0.821 ± 0.050). With regard to stability, LR also showed high robustness against data perturbation (RSD: 4.7%). Subgroup analyses showed that DL signatures outperformed CR features (DL, AUC: 0.915 ± 0.054, accuracy: 0.835 ± 0.061, RSD: 5.9%; CR, AUC: 0.830 ± 0.066, accuracy: 0.771 ± 0.051, RSD: 8.0%), while DL and DL+CR achieved similar prediction results.
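The reported RSD values can be cross-checked against the reported AUC means and standard deviations. The sketch below assumes the usual definition of relative standard deviation (standard deviation divided by the mean, expressed as a percentage); the abstract does not state the formula, and the helper name `relative_std_percent` is hypothetical.

```python
def relative_std_percent(mean_auc, std_auc):
    """Relative standard deviation in percent: 100 * std / mean (assumed definition)."""
    return 100.0 * std_auc / mean_auc


# (mean AUC, standard deviation of AUC) as reported in the abstract.
reported = {
    "LR (DL+CR)": (0.920, 0.043),
    "DL only": (0.915, 0.054),
    "CR only": (0.830, 0.066),
}

rsd = {name: round(relative_std_percent(mean, std), 1)
       for name, (mean, std) in reported.items()}

for name, value in rsd.items():
    print(f"{name}: RSD = {value}%")
```

Under this assumed definition, the computed values round to 4.7%, 5.9%, and 8.0%, which matches the RSD figures given in the results.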
In IDH genotyping, LR is a promising machine-learning classification model. Compared with CR features, DL signatures exhibit markedly superior predictive value and discriminative capability.