

On the use of cross-validation for the calibration of the adaptive lasso.

Author information

Univ Lyon, Univ Eiffel, IFSTTAR, Univ Lyon 1, UMRESTTE, Bron, France.

Institut Camille Jordan, Université Claude Bernard Lyon 1, Lyon, France.

Publication information

Biom J. 2023 Jun;65(5):e2200047. doi: 10.1002/bimj.202200047. Epub 2023 Mar 23.

Abstract

Cross-validation is the standard method for hyperparameter tuning, or calibration, of machine learning algorithms. The adaptive lasso is a popular class of penalized approaches based on weighted L1-norm penalties, with weights derived from an initial estimate of the model parameter. Although it violates the paramount principle of cross-validation, according to which no information from the hold-out test set should be used when constructing the model on the training set, a "naive" cross-validation scheme is often implemented for the calibration of the adaptive lasso. The unsuitability of this naive cross-validation scheme in this context has not been well documented in the literature. In this work, we recall why the naive scheme is theoretically unsuitable and how proper cross-validation should be implemented in this particular context. Using both synthetic and real-world examples and considering several versions of the adaptive lasso, we illustrate the flaws of the naive scheme in practice. In particular, we show that it can lead to the selection of adaptive lasso estimates that perform substantially worse than those selected via a proper scheme in terms of both support recovery and prediction error. In other words, our results show that the theoretical unsuitability of the naive scheme translates into suboptimality in practice, and call for abandoning it.
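The difference between the two schemes can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not specify the initial estimator or the solver, so the ordinary least squares initial estimate, the coordinate-descent lasso, and all function names below (`lasso_cd`, `adaptive_weights`, `adaptive_lasso`, `cv_error`) are assumptions made for illustration. The only point it is meant to show is where the weights are computed: from the full data set in the naive scheme, and from each training fold alone in the proper scheme.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: (1/2n)||y - Xb||^2 + lam * ||b||_1.
    No intercept is fitted; X and y are assumed to be centered."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]            # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]            # add the updated coordinate back
    return beta

def adaptive_weights(X, y, eps=1e-6):
    """Weights 1/|beta_init| from an OLS initial estimate (an assumed choice)."""
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 / (np.abs(beta_init) + eps)

def adaptive_lasso(X, y, lam, weights):
    """Weighted L1 penalty via column rescaling: lasso on X / w, then map back."""
    return lasso_cd(X / weights, y, lam) / weights

def cv_error(X, y, lams, n_folds=5, proper=True, seed=0):
    """Mean squared prediction error per lambda under either scheme."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    full_weights = adaptive_weights(X, y)         # sees the hold-out data (naive)
    errors = np.zeros(len(lams))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Proper scheme: the initial estimate is refitted on the training fold only.
        w = adaptive_weights(X_tr, y_tr) if proper else full_weights
        for k, lam in enumerate(lams):
            beta = adaptive_lasso(X_tr, y_tr, lam, w)
            errors[k] += np.mean((y[test_idx] - X[test_idx] @ beta) ** 2) / n_folds
    return errors
```

Calling `cv_error(X, y, lams, proper=False)` and `cv_error(X, y, lams, proper=True)` on the same data gives the two error curves whose minimizers would be compared. In the naive call the weights already encode information from every hold-out fold, which is the violation the abstract describes.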

