Demircioğlu Aydin
Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45157, Essen, Germany.
Insights Imaging. 2022 Feb 24;13(1):28. doi: 10.1186/s13244-022-01170-2.
In radiomic studies, several models are often trained with different combinations of feature selection methods and classifiers. The features of the best model are usually considered relevant to the problem and are treated as potential biomarkers. Features selected by models that perform statistically similarly to the best one are generally not studied. To understand how much the features selected by these statistically similar models differ, 14 publicly available datasets, 8 feature selection methods, and 8 classifiers were used in this retrospective study. For each combination of feature selection method and classifier, a model was trained, and its performance was measured with AUC-ROC. The best-performing model was compared to each of the other models using a DeLong test. Models that were statistically similar to the best-performing model were then compared in terms of their selected features.
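The comparison step described above (rank models by AUC-ROC, then keep those whose AUC is not significantly different from the best one under a DeLong test) can be sketched as follows. This is a minimal, dependency-free sketch, not the study's code: the function names `delong_test` and `similar_to_best`, the significance level `alpha = 0.05`, and the rule "p >= alpha means statistically similar" are assumptions made for illustration. All models are assumed to be scored on the same test cases, with at least two positive and two negative cases.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _components(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components for one model (ties count as 0.5).

    v10[i]: share of negatives scored below positive case i.
    v01[j]: share of positives scored above negative case j.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]

    def psi(x: float, y: float) -> float:
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _var(values: Sequence[float]) -> float:
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the mean of the positive-case components (Mann-Whitney form)."""
    v10, _ = _components(y_true, scores)
    return sum(v10) / len(v10)


def delong_test(y_true, scores_a, scores_b) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two AUCs measured on the same cases."""
    a10, a01 = _components(y_true, scores_a)
    b10, b01 = _components(y_true, scores_b)
    auc_a, auc_b = sum(a10) / len(a10), sum(b10) / len(b10)
    # Var(AUC_a - AUC_b) equals the variance of the paired component differences,
    # computed separately for positives (/m) and negatives (/n).
    var_diff = (_var([x - y for x, y in zip(a10, b10)]) / len(a10)
                + _var([x - y for x, y in zip(a01, b01)]) / len(a01))
    if var_diff == 0.0:
        # Degenerate case: the difference has no estimated sampling variability.
        return auc_a, auc_b, (1.0 if auc_a == auc_b else 0.0)
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value


def similar_to_best(y_true, model_scores: Dict[str, Sequence[float]], alpha: float = 0.05):
    """Return the best model and the models whose AUC is not significantly different from it."""
    aucs = {name: auc(y_true, s) for name, s in model_scores.items()}
    best = max(aucs, key=aucs.get)
    similar = [name for name, s in model_scores.items()
               if name != best and delong_test(y_true, model_scores[best], s)[2] >= alpha]
    return best, similar
```

For example, with eight test cases, a perfectly ranking model (AUC 1.0) and a model with AUC 0.75 are not significantly different (p of roughly 0.22), so the second model would count as statistically similar; this illustrates how small test sets can make many models indistinguishable from the best one.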
Approximately 57% of all models analyzed were statistically similar to the best-performing model. Feature selection methods were, in general, relatively unstable (stability 0.58; range 0.35-0.84). The features selected by different models varied considerably (similarity 0.19; range 0.02-0.42), although the selected features themselves were highly correlated (correlation 0.71; range 0.4-0.92).
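The three quantities reported above (stability of a feature selection method, similarity of the feature sets chosen by two models, and correlation between those feature sets) can be illustrated with simple set and correlation measures. This sketch assumes definitions that may differ from the ones used in the study: stability as the mean pairwise Jaccard index of the sets one method selects on different resamples, similarity as the Jaccard index between two models' sets, and correlation as the average, over both directions, of each feature's highest absolute Pearson correlation with any feature in the other set. The names `jaccard`, `stability`, `pearson`, and `correlation_similarity` are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets: |A & B| / |A | B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(selections: Sequence[Set[str]]) -> float:
    """Mean pairwise Jaccard index of the sets one method selects on different resamples."""
    pairs = list(itertools.combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either feature is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_similarity(a: Set[str], b: Set[str], data: Dict[str, List[float]]) -> float:
    """For each feature in one set, take its highest |r| with any feature in the
    other set; average these maxima over both directions."""
    def one_way(src: Set[str], dst: Set[str]) -> List[float]:
        return [max(abs(pearson(data[f], data[g])) for g in dst) for f in src]

    scores = one_way(a, b) + one_way(b, a)
    return sum(scores) / len(scores)
```

Under these assumed definitions, two models that select `{"f1"}` and `{"f2"}`, where `f2` is a rescaled copy of `f1`, have a Jaccard similarity of 0 but a correlation-based similarity of 1. This mirrors the pattern reported above: little overlap in the selected features, yet high correlation between them.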
Feature relevance in radiomics strongly depends on the model used, and statistically similar models will generally identify different features as relevant. Considering only the features selected by a single model is therefore misleading, and it is often not possible to directly determine whether such features are candidate biomarkers.