Department of Digital Health, Samsung Advanced Institute of Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea.
Medical AI Research Center, Data Science Research Institute, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea.
Sci Rep. 2024 Nov 26;14(1):29297. doi: 10.1038/s41598-024-80863-8.
Feature selection is one of several important challenges in radiomics research. Because many quantitative features are non-informative, feature selection is essential. Filter, wrapper, and embedded methods have been combined in various ways, without a rule of thumb for choosing among them. This study aims to develop a framework for optimal feature selection in radiomics research. We developed a framework in which optimal features are selected quickly by controlling relevance and redundancy among features. A 'FeatureMap' containing information for each step was generated and used as a platform. Through this framework, we can explore the optimal combination of radiomics features and evaluate predictive performance using only the selected features. We assessed the framework using four real datasets. The FeatureMap generated 6 combinations, with the number of selected features varying across combinations. The predictive models achieved high performance; the highest test areas under the curve (AUCs) under cross-validation were 0.792, 0.820, 0.846, and 0.738 for the four datasets, respectively. We developed a flexible framework for feature selection in radiomics research and assessed its usefulness using various real-world data. Our framework can assist clinicians in efficiently developing predictive models based on radiomics.
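The idea of controlling relevance and redundancy can be illustrated with a minimal sketch. This is not the authors' FeatureMap implementation: it assumes absolute Pearson correlation as a stand-in for both the relevance and the redundancy measure, and the function names, thresholds, and feature names (`shape`, `shape_copy`, `texture`, `noise`) are hypothetical, chosen only for the toy example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, min_relevance=0.2, max_redundancy=0.8):
    """Greedy relevance/redundancy filter (illustrative only).

    features: dict mapping feature name -> list of values, one per sample
    target:   list of outcome values (for example 0/1 labels)
    Both thresholds are hypothetical defaults, not values from the paper.
    """
    # Relevance: absolute correlation between each feature and the outcome.
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}

    # Keep sufficiently relevant features, most relevant first.
    candidates = sorted(
        (name for name, score in relevance.items() if score >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )

    selected = []
    for name in candidates:
        # Redundancy: skip a feature that is highly correlated with one already kept.
        if all(
            abs(pearson(features[name], features[kept])) <= max_redundancy
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data with hypothetical feature names: "shape" tracks the label closely,
# "shape_copy" is a near-duplicate of it, "texture" is moderately informative,
# and "noise" is uncorrelated with the label.
target = [0, 1, 0, 1, 0, 1, 0, 1]
toy_features = {
    "shape": [0.1, 0.9, 0.2, 0.8, 0.0, 1.0, 0.3, 0.7],
    "shape_copy": [0.2, 0.9, 0.2, 0.7, 0.1, 1.0, 0.4, 0.7],
    "texture": [0, 1, 1, 1, 0, 0, 0, 1],
    "noise": [1, 1, 0, 0, 1, 1, 0, 0],
}

selected = select_features(toy_features, target)
print(selected)  # ['shape', 'texture']
```

In this toy run, `noise` is dropped for low relevance and `shape_copy` is dropped as redundant with `shape`. Varying the two thresholds yields different feature subsets, which is loosely analogous to exploring several candidate combinations before fitting a predictive model on the selected features only.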