Demircioğlu Aydin
Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.
Sci Rep. 2025 Sep 5;15(1):32368. doi: 10.1038/s41598-025-16070-w.
In radiomics, feature selection methods are primarily used to eliminate redundant features and identify relevant ones. Feature projection methods, such as principal component analysis (PCA), are often avoided due to concerns that recombining features may compromise interpretability. However, since most radiomic features lack inherent semantic meaning, prioritizing interpretability over predictive performance may not be justified. This study investigates whether feature projection methods can improve predictive performance compared to feature selection, as measured by the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), and the F1, F0.5 and F2 scores. Models were trained on a collection of 50 binary classification radiomic datasets derived from CT and MRI of various organs and representing different clinical outcomes. Evaluation was performed using nested, stratified 5-fold cross-validation with 10 repeats. Nine feature projection methods, including PCA, Kernel PCA, and Non-Negative Matrix Factorization (NMF), were compared to nine selection methods, such as Minimum Redundancy Maximum Relevance (MRMRe), Extremely Randomized Trees (ET), and LASSO, using four classifiers. The results showed that selection methods, particularly ET, MRMRe, Boruta, and LASSO, achieved the highest overall performance. Importantly, performance varied considerably across datasets, and some projection methods, such as NMF, occasionally outperformed all selection methods on individual datasets, indicating their potential utility. However, the average difference between selection methods and projection methods across all datasets was negligible and not statistically significant, suggesting that, from a purely methodological standpoint, the two approaches perform similarly. These findings support the notion that, in a typical radiomics study, selection methods should remain the primary approach, but they also emphasize the importance of considering projection methods in order to achieve the highest performance.
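The F1, F0.5 and F2 scores named above are all instances of the general F-beta score, which weights recall beta times as heavily as precision. As a minimal sketch of how the three differ, the following computes F-beta directly from confusion-matrix counts. The function name `fbeta_from_counts` and the example counts are illustrative assumptions, not taken from the study.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta score from true positives, false positives and false negatives.

    Equivalent to (1 + beta^2) * P * R / (beta^2 * P + R),
    with precision P = tp / (tp + fp) and recall R = tp / (tp + fn).
    """
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    if denominator == 0:
        # No positive labels and no positive predictions: define the score as 0.
        return 0.0
    return (1 + b2) * tp / denominator


# Hypothetical counts: precision = 0.80, recall = 0.667.
tp, fp, fn = 8, 2, 4
scores = {beta: fbeta_from_counts(tp, fp, fn, beta) for beta in (0.5, 1.0, 2.0)}
for beta, score in scores.items():
    print(f"F{beta:g} = {score:.3f}")
```

Because precision exceeds recall in this example, F0.5 (which favors precision) is the largest of the three and F2 (which favors recall) is the smallest.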