Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN 55414, USA.
Department of Computer Science, The University of Illinois at Chicago, Chicago, IL 60612, USA.
Biostatistics. 2023 Apr 14;24(2):295-308. doi: 10.1093/biostatistics/kxab022.
Support vector regression (SVR) is particularly beneficial when the outcome and predictors are nonlinearly related. However, when many covariates are available, the method's flexibility can lead to overfitting and an overall loss in predictive accuracy. To overcome this drawback, we develop a feature selection method for SVR based on a genetic algorithm that iteratively searches across potential subsets of covariates to find those that yield the best performance according to a user-defined fitness function. We evaluate the performance of our feature selection method for SVR, comparing it to alternative methods including LASSO and random forest, in a simulation study. We find that our method yields higher predictive accuracy than SVR without feature selection. Our method outperforms LASSO when the relationship between covariates and outcome is nonlinear. Random forest performs equivalently to our method in some scenarios, but more poorly when covariates are correlated. We apply our method to predict donor kidney function 1 year after transplant using data from the United Network for Organ Sharing national registry.
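The subset search described above can be illustrated with a minimal sketch of a genetic algorithm over binary covariate-inclusion masks. This is not the authors' implementation. The function name `genetic_feature_search`, the operators (truncation selection, single-point crossover, bit-flip mutation, elitism), and all parameter values are assumptions chosen for illustration. The fitness function `toy_fitness` is a hypothetical stand-in that rewards three "informative" covariates and penalizes the rest. In an application along the lines of the abstract, the user-defined fitness would instead score each subset by the predictive performance of an SVR model fit on it, for example a cross-validated error.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=30, generations=40,
                           mutation_rate=0.05, seed=0):
    """Search 0/1 inclusion masks for one that maximizes `fitness`.

    Illustrative sketch only; requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    # Each individual is a tuple of 0/1 flags; 1 means the covariate is included.
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]   # truncation selection: keep the fitter half
        children = [scored[0]]             # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple((bit ^ 1) if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


# Hypothetical toy problem: covariates 0, 3, and 5 are informative, the rest are noise.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    """Stand-in for a model-based score: reward informative covariates, penalize noise."""
    hits = sum(mask[i] for i in INFORMATIVE)
    noise = sum(mask) - hits
    return hits - 0.5 * noise


best_mask = genetic_feature_search(10, toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected covariates:", selected)
```

Passing the fitness function as an argument mirrors the "user-defined fitness function" in the abstract: the search loop does not need to know how a subset is scored, so the toy score can be swapped for a model-based one without changing the algorithm.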