Yang Fengchang, Chen Wei, Wei Haifeng, Zhang Xianru, Yuan Shuanghu, Qiao Xu, Chen Yen-Wei
Department of Radiology, Shandong Cancer Hospital and Institute, Cheeloo College of Medicine, Shandong University, Jinan, China.
Department of Implantology, School and Hospital of Stomatology, Cheeloo College of Medicine, Shandong University, Jinan, China.
Front Oncol. 2021 Jan 14;10:608598. doi: 10.3389/fonc.2020.608598. eCollection 2020.
Histologic phenotype identification of Non-Small Cell Lung Cancer (NSCLC) is essential for treatment planning and prognostic prediction. Prediction models based on radiomics analysis have the potential to quantify tumor phenotypic characteristics non-invasively. However, most existing studies rely on relatively small datasets, which limits the performance and potential clinical applicability of the resulting models.
To explore the impact of different datasets on radiomics studies of NSCLC histological subtype classification, we retrospectively collected three datasets from multiple centers and performed an extensive analysis. Each of the three datasets was used separately as the training dataset to build a model, which was then validated on the remaining two datasets. A further model was developed by merging all the datasets into one large dataset, which was randomly split into a training dataset and a testing dataset. For each model, a total of 788 radiomic features were extracted from the segmented tumor volumes. Three widely used feature selection methods, minimum Redundancy Maximum Relevance (mRMR), Sequential Forward Selection (SFS), and Least Absolute Shrinkage and Selection Operator (LASSO), were then used to select the most important features. Finally, three classification methods, Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), were independently evaluated on the selected features to investigate the predictive ability of the radiomics models.
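The evaluation design described above (train on one dataset and validate on the other two, then train and test on a random split of the merged data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Synthetic data from scikit-learn's `make_classification` stands in for the radiomic features, the per-cohort shift is an invented stand-in for differences between centers, and a `Lasso` fit on the binary labels stands in for LASSO selection (mRMR and SFS are omitted). All function names, feature counts, and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_cohorts(n_cohorts=3, n_per_cohort=120, n_features=50, seed=0):
    """Synthetic stand-in for three single-center datasets (not real data)."""
    X, y = make_classification(
        n_samples=n_cohorts * n_per_cohort,
        n_features=n_features,
        n_informative=8,
        random_state=seed,
    )
    rng = np.random.default_rng(seed)
    cohorts = []
    for i in range(n_cohorts):
        rows = slice(i * n_per_cohort, (i + 1) * n_per_cohort)
        # Assumption: a constant per-cohort offset mimics center-specific effects.
        shift = rng.normal(0.0, 0.5, size=n_features)
        cohorts.append((X[rows] + shift, y[rows]))
    return cohorts


def make_classifiers(seed=0):
    """The three classifier families named in the abstract, default-ish settings."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(probability=True, random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def fit_and_score(classifier, X_train, y_train, X_test, y_test, n_selected=10):
    """Scale, keep at most n_selected Lasso-weighted features, classify, return AUC."""
    selector = SelectFromModel(Lasso(alpha=0.02), max_features=n_selected)
    model = make_pipeline(StandardScaler(), selector, classifier)
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


def cross_cohort_aucs(cohorts):
    """Train on cohort i, validate on every other cohort j."""
    results = {}
    for i, (X_train, y_train) in enumerate(cohorts):
        for j, (X_test, y_test) in enumerate(cohorts):
            if i == j:
                continue
            for name, clf in make_classifiers().items():
                results[(i, j, name)] = fit_and_score(
                    clf, X_train, y_train, X_test, y_test
                )
    return results


def merged_aucs(cohorts, test_size=0.3, seed=0):
    """Merge all cohorts, split randomly, and score each classifier."""
    X = np.vstack([X_c for X_c, _ in cohorts])
    y = np.concatenate([y_c for _, y_c in cohorts])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    return {
        name: fit_and_score(clf, X_tr, y_tr, X_te, y_te)
        for name, clf in make_classifiers().items()
    }


if __name__ == "__main__":
    cohorts = make_cohorts()
    for key, auc in cross_cohort_aucs(cohorts).items():
        print("train, test, model:", key, "AUC = %.2f" % auc)
    for name, auc in merged_aucs(cohorts).items():
        print("merged split, model:", name, "AUC = %.2f" % auc)
```

Feature selection sits inside the pipeline so that it is fit only on the training data. The AUC values printed by this sketch come from synthetic data and say nothing about the figures reported in the study.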
When using a single dataset for modeling, the results on the testing set were poor, with AUC values ranging from 0.54 to 0.64. When the merged dataset was used for modeling, the average AUC value in the testing set was 0.78, showing relatively good predictive performance.
Models based on radiomics analysis have the potential to classify NSCLC subtypes, but their generalization capabilities should be carefully considered.