Shanghai Key Laboratory of Molecular Imaging, Shanghai University of Medicine and Health Sciences, Shanghai, China.
Department of Nuclear Medicine, Fudan University Shanghai Cancer Center, Shanghai, China.
Med Phys. 2024 Jul;51(7):4872-4887. doi: 10.1002/mp.16947. Epub 2024 Jan 29.
Accurate, noninvasive, and reliable assessment of epidermal growth factor receptor (EGFR) mutation status and EGFR molecular subtypes is essential for treatment selection and individualized therapy in lung adenocarcinoma (LUAD). Radiomics models based on 18F-FDG PET/CT have great potential for identifying EGFR mutation status and EGFR subtypes in patients with LUAD. Validation on multi-center data, model visualization, and model interpretation are important for the management, application, and trustworthiness of machine learning predictive models. However, few EGFR-related studies have included model visualization and interpretation or multi-center validation.
To develop explainable, optimal predictive models based on handcrafted radiomics features (HRFs) extracted from multi-center 18F-FDG PET/CT images to predict EGFR mutation status and molecular subtypes in LUAD.
Baseline 18F-FDG PET/CT images of 383 LUAD patients from three hospitals and one public data set were collected. A total of 1808 HRFs were extracted from the primary tumor regions using Pyradiomics. Predictive models were built from the cross-combination of seven feature selection methods and seven machine learning algorithms. Yellowbrick and explainable artificial intelligence techniques were used for model visualization and interpretation. Model performance was evaluated with receiver operating characteristic curves, classification reports, and confusion matrices. The clinical applicability of the optimal models was assessed by decision curve analysis.
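As an illustration of the decision curve analysis step, the following is a minimal sketch of the standard net-benefit calculation at a single threshold probability. It is not the authors' code: the function names, the toy labels, and the toy predicted probabilities are all assumptions made for this example.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on predictions at a given threshold probability.

    Standard definition: TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    # Count cases called positive (probability at or above the threshold).
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as mutation-positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = EGFR mutant, 0 = wild type (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

nb_model = net_benefit(toy_labels, toy_probs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
print(f"model: {nb_model:.3f}, treat-all: {nb_all:.3f}")
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the treat-all and treat-none (net benefit of zero) strategies.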
The STACK feature selection method combined with a light gradient boosting machine (LGBM) achieved the best performance in identifying EGFR mutation status (area under the curve [AUC] = 0.81 in the internal test cohort; AUC = 0.62 in the external test cohort). The random forest feature selection method combined with LGBM achieved the best performance in predicting EGFR mutation molecular subtypes (AUC = 0.89 in the internal test cohort; AUC = 0.61 in the external test cohort).
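For readers less familiar with the AUC values reported here, the sketch below computes AUC from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy data are assumptions for illustration and are unrelated to the study cohorts.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = EGFR mutant, 0 = wild type (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

Under this definition an AUC of 0.5 corresponds to chance-level ranking, which is a useful reference point when comparing the internal and external test cohort values.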
Explainable machine learning models combined with radiomics features extracted from multi-center, multi-scanner 18F-FDG PET/CT images show some potential for identifying EGFR mutation status and subtypes in LUAD, which may help guide the treatment of LUAD.