Inconsistent CT NSCLC radiomics associated with feature selection methods, predictive models and related factors.

Author information

Department of Radiology, University of Kentucky, Lexington, KY 40536, United States of America.

Publication information

Phys Med Biol. 2023 Jun 12;68(12). doi: 10.1088/1361-6560/acce1c.

PMID: 37072008

Abstract

Objective. To investigate potential uncertainties in CT-based non-small cell lung cancer (NSCLC) radiomics associated with feature selection methods, predictive models, and their related factors.

Approach. CT images from 496 pre-treatment NSCLC patients were retrospectively retrieved from a GE CT scanner. The original patient cohort (100%) was sampled to generate 25%, 50%, and 75% sub-cohorts to investigate the potential impact of cohort size. Radiomic features were extracted from the lung nodule using IBEX. Five feature selection methods (analysis of variance, least absolute shrinkage and selection operator (LASSO), mutual information, minimum redundancy-maximum relevance, and Relief) and seven predictive models (decision tree (DT), random forest (RF), logistic regression (LR), support vector classifier (SVC), k-nearest neighbor (KNN), gradient boost (GB), and naïve Bayesian (NB)) were included in the analysis. Cohort size and cohort composition (i.e. same-sized cohorts with partially different patients) were investigated as factors related to feature selection methods. The number of input features and the model validation method (2-, 5-, and 10-fold cross-validation) were investigated as factors related to predictive models. Using a two-year survival endpoint, AUC values were calculated for the various combinations.

Main results. Features ranked by different feature selection methods are inconsistent, and the rankings depend on cohort size even within the same method. Of 25 features, Relief and LASSO select 17 and 14, respectively, that are common to all cohort sizes, while the other three feature selection methods have fewer than 10 features common to all cohort sizes. Feature rankings also depend strongly on minor differences in cohort composition. AUCs for the 2100 tested combinations vary from 0.427 to 0.973. Among them, only 16 combinations achieve an AUC > 0.65. There is no clear path to reliable CT NSCLC radiomics.

Significance. The use of different feature selection methods and predictive models can generate inconsistent results. This should be further investigated to improve the reliability of radiomic studies.
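The two quantities reported in the abstract, the number of selected features common to all cohort sizes and the AUC of a given combination, can be illustrated with a minimal sketch. This is not the authors' code: the function names `common_features` and `auc`, the feature names, and the labels and scores below are all hypothetical and chosen only to show the arithmetic. The AUC here is computed as the fraction of positive/negative pairs that are ranked correctly, which is one standard way to obtain it.

```python
from itertools import product


def common_features(selected_by_cohort):
    """Return the features selected in every cohort, sorted by name."""
    sets = [set(names) for names in selected_by_cohort.values()]
    return sorted(set.intersection(*sets))


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical selections for the four cohort sizes (not data from the study).
selected = {
    "25%": ["f1", "f2", "f3", "f4"],
    "50%": ["f2", "f3", "f4", "f5"],
    "75%": ["f2", "f4", "f6", "f7"],
    "100%": ["f2", "f4", "f8", "f9"],
}
stable = common_features(selected)

# Hypothetical two-year survival labels and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.4]
example_auc = auc(labels, scores)
```

In this toy input only two of the four selected features survive across every cohort size, which mirrors the kind of instability the abstract describes. An AUC near 0.5 from `auc` would indicate performance no better than chance.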
