

Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation.

Authors

Tsamardinos Ioannis, Greasidou Elissavet, Borboudakis Giorgos

Affiliation

Computer Science Department, University of Crete and Gnosis Data Analysis PC, Heraklion, Greece.

Publication

Mach Learn. 2018;107(12):1895-1922. doi: 10.1007/s10994-018-5714-4. Epub 2018 May 9.

Abstract

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely the nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822-829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we employ again the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, using a bootstrap-based statistical criterion we stop training of models on new folds of inferior (with high probability) configurations. We name the resulting method Bootstrap Bias Corrected with Dropping CV (BBCD-CV); it is both computationally efficient and provides accurate performance estimates.
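The core idea of BBC-CV — bootstrap the configuration-selection step over the pooled out-of-sample predictions, then score the selected configuration on the out-of-bag samples — can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' code: the function name `bbc_cv`, the prediction-matrix layout (samples × configurations), and the higher-is-better `metric` callable are all assumptions made for the example.

```python
import numpy as np

def bbc_cv(preds, y, metric, B=1000, rng=None):
    """Bootstrap Bias Corrected CV (BBC-CV) performance estimate.

    preds  : (n_samples, n_configs) pooled out-of-sample predictions,
             one column per configuration, gathered during ordinary CV
    y      : (n_samples,) true targets
    metric : callable(y_true, y_pred) -> score, higher is better
    B      : number of bootstrap iterations
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    scores = []
    for _ in range(B):
        boot = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag indices
        if oob.size == 0:
            continue
        # Select the configuration that looks best on the bootstrap sample...
        best = max(range(preds.shape[1]),
                   key=lambda j: metric(y[boot], preds[boot, j]))
        # ...and score that winner on the samples it never saw.
        scores.append(metric(y[oob], preds[oob, best]))
    return float(np.mean(scores))
```

Because selection and evaluation use disjoint samples in every bootstrap iteration, the averaged out-of-bag score is shielded from the optimism of "pick the best, then report its own CV score" — and no model is ever retrained, since only the already-computed predictions are resampled.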

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2cb6/6191021/927371a4f56e/10994_2018_5714_Fig1_HTML.jpg
