Comparison of model selection for regression.

Author information

Cherkassky Vladimir, Ma Yunqian

Affiliation

Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

Neural Comput. 2003 Jul;15(7):1691-714. doi: 10.1162/089976603321891864.

Abstract

We discuss empirical comparison of analytical methods for model selection. Currently, there is no consensus on the best method for finite-sample estimation problems, even for the simple case of linear estimators. This article presents empirical comparisons between classical statistical methods, namely the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and the structural risk minimization (SRM) method, based on Vapnik-Chervonenkis (VC) theory, for regression problems. Our study is motivated by the empirical comparisons in Hastie, Tibshirani, and Friedman (2001), who claim that the SRM method performs poorly for model selection and suggest that AIC yields superior predictive performance. Hence, we present empirical comparisons for various data sets and different types of estimators (linear, subset selection, and k-nearest neighbor regression). Our results demonstrate the practical advantages of VC-based model selection; it consistently outperforms AIC for all data sets. In our study, SRM and BIC methods show similar predictive performance. The discrepancy between our results and theirs (both obtained using the same data) is caused by methodological drawbacks in Hastie et al. (2001), especially in their loose interpretation and application of the SRM method. Hence, we discuss methodological issues important for meaningful comparisons and practical application of the SRM method. We also point out the importance of accurate estimation of model complexity (VC-dimension) for empirical comparisons and propose a new practical estimate of model complexity for k-nearest neighbor regression.
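
The abstract gives no formulas, so the following is only an illustrative sketch of how the three criteria might be compared on a linear (polynomial) estimator. It assumes forms that are commonly cited in the literature rather than taken from this record: AIC and BIC as additive penalties scaled by a noise variance, and SRM as the VC penalization factor 1 / (1 - sqrt(p - p ln p + ln n / (2n))) with p = h/n. It further assumes that the VC-dimension h equals the number of free parameters d, and that the noise variance `sigma2` is known. The function names `penalized_risks` and `select_degree` are hypothetical.

```python
import numpy as np


def penalized_risks(r_emp, d, n, sigma2):
    """Return (AIC, BIC, SRM) risk estimates for a linear estimator.

    r_emp  : empirical risk (mean squared training error)
    d      : number of free parameters (assumed equal to the VC-dimension h)
    n      : number of training samples
    sigma2 : noise variance, assumed known for this sketch
    """
    aic = r_emp + 2.0 * d / n * sigma2
    bic = r_emp + np.log(n) * d / n * sigma2

    # Assumed VC penalization factor: empirical risk divided by
    # (1 - sqrt(p - p*ln(p) + ln(n)/(2n))), treated as infinite
    # when the denominator is not positive.
    p = d / n
    denom = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2.0 * n))
    srm = r_emp / denom if denom > 0 else np.inf
    return aic, bic, srm


def select_degree(x, y, max_degree, sigma2):
    """Pick the polynomial degree that minimizes each criterion."""
    n = len(x)
    scores = []
    for degree in range(max_degree + 1):
        X = np.vander(x, degree + 1)  # columns x^degree, ..., x^0
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        r_emp = float(np.mean((y - X @ coef) ** 2))
        scores.append(penalized_risks(r_emp, degree + 1, n, sigma2))
    best = np.array(scores).argmin(axis=0)  # row index equals degree
    return {"aic": int(best[0]), "bic": int(best[1]), "srm": int(best[2])}


if __name__ == "__main__":
    # Synthetic data, invented purely for illustration.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=50)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, size=50)
    print(select_degree(x, y, max_degree=8, sigma2=0.1**2))
```

Note that the SRM estimate multiplies the empirical risk and needs no noise-variance estimate, whereas the AIC and BIC forms assumed here depend on `sigma2`; in practice that variance would have to be estimated from the data.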
