Lv Jieqin, Chen Xiaohui, Liu Xinran, Du Dongyang, Lv Wenbing, Lu Lijun, Wu Hubing
School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, China.
Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, China.
Front Oncol. 2022 Jan 28;12:788968. doi: 10.3389/fonc.2022.788968. eCollection 2022.
To develop and validate a PET/CT radiomics model with imbalanced-data correction for predicting lymph node metastasis (LNM) in clinical stage T1 lung adenocarcinoma (LUAD).
A total of 183 patients with pathologically confirmed LUAD (148 without metastasis, 35 with LNM) were retrospectively included. The cohort was divided into training and validation cohorts at a ratio of 7:3. A total of 487 radiomics features were extracted from the PET and CT components separately for radiomics model construction. Four clinical features and seven PET/CT radiological features were extracted for traditional model construction. To balance the distribution of the majority (non-metastasis) class and the minority (LNM) class, imbalance-adjustment strategies based on ten data re-sampling methods were adopted. Three multivariate models (denoted Traditional, Radiomics, and Combined) were constructed using multivariable logistic regression analysis; the Combined model incorporated all of the significant clinical, radiological, and radiomics features. Monte Carlo cross-validation repeated 100 times was used to assess the order in which feature selection and imbalance adjustment are applied in the machine learning pipeline. The prediction performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC) and the geometric mean score (G-mean).
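The two evaluation metrics named above can be illustrated with a minimal sketch. It assumes the standard definitions: G-mean as the square root of sensitivity times specificity, and AUC as the probability that a randomly chosen LNM case receives a higher score than a randomly chosen non-metastasis case. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions, not taken from the study.

```python
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = LNM, 0 = non-metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (hypothetical values, not study data).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

toy_auc = auc(y_true, scores)
toy_g_mean = g_mean(y_true, y_pred)
```

Because G-mean multiplies the two class-wise recalls, a model that ignores the minority class scores zero regardless of its overall accuracy, which is why it is commonly reported alongside AUC for imbalanced cohorts.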
A total of 2 clinical parameters, 2 radiological features, 3 PET radiomics features, and 5 CT radiomics features were significantly associated with LNM. The combined model with Edited Nearest Neighbors (ENN) re-sampling showed stronger prediction performance than the traditional model or the radiomics model, with an AUC of 0.94 (95% CI = 0.86-0.97) vs. 0.89 (95% CI = 0.79-0.93) and 0.92 (95% CI = 0.85-0.97), and a G-mean of 0.88 vs. 0.82 and 0.80, in the training cohort; and an AUC of 0.75 (95% CI = 0.57-0.91) vs. 0.68 (95% CI = 0.36-0.83) and 0.71 (95% CI = 0.48-0.83), and a G-mean of 0.76 vs. 0.64 and 0.51, in the validation cohort. Performing feature selection before data re-sampling gave a better result than the reverse order (AUC 0.76 ± 0.06 vs. 0.70 ± 0.07, P < 0.001).
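For readers unfamiliar with ENN, the idea can be sketched in a few lines. This is a simplified illustration, not the study's implementation: it assumes Euclidean distance, k = 3 neighbours, a majority-vote rule, and removal of majority-class samples only. The abstract does not state these settings, and the function name `enn_undersample` and the toy coordinates are hypothetical.

```python
from collections import Counter
from math import dist


def enn_undersample(X, y, majority_label=0, k=3):
    """Drop majority-class samples whose k nearest neighbours mostly belong to another class."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # minority samples are always kept
            continue
        # Indices of the k nearest other samples by Euclidean distance.
        others = (j for j in range(len(X)) if j != i)
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        votes = Counter(y[j] for j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)  # neighbourhood agrees with the sample's label
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: four majority points near the origin, three minority points near (5, 5),
# and one majority point sitting inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (5.1, 5.0)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = enn_undersample(X, y)
```

In this toy case only the majority point embedded in the minority cluster is removed, which cleans the class boundary without discarding any minority samples.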
The combined model (consisting of age, histological type, C/T ratio, MATV, and radiomics signature) integrated with ENN re-sampling showed strong lymph node metastasis prediction performance in an imbalanced cohort of clinical stage T1 LUAD. Radiomics signatures extracted from PET/CT images could provide prediction information complementary to that of the traditional model.