
Cross-Validated Bagged Learning.

Author information

Petersen Maya L, Molinaro Annette M, Sinisi Sandra E, van der Laan Mark J

Affiliation

Division of Biostatistics, University of California, Berkeley, School of Public Health, Earl Warren Hall 7360, Berkeley, California 94720-7360; phone: 510.642.3241; fax: 510.643.5163.

Publication information

J Multivar Anal. 2008 Mar;25(2):260-266. doi: 10.1016/j.jmva.2007.07.004.

Abstract

Many applications aim to learn a high-dimensional parameter of a data-generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, bootstrap aggregating (bagging) has been introduced as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying an estimator to multiple bootstrap samples and averaging the results across bootstrap samples. To address the curse of dimensionality, a common practice has been to apply bagging to estimators that themselves use cross-validation, thereby using cross-validation within each bootstrap sample to select fine-tuning parameters that trade off the bias and variance of the bootstrap-sample-specific candidate estimators. In this article we point out that, to achieve the correct bias-variance trade-off for the parameter of interest, one should apply the cross-validation selector externally to the candidate bagged estimators indexed by these fine-tuning parameters. We use three simulations to compare the new cross-validated bagging method with bagging of cross-validated estimators and bagging of non-cross-validated estimators.
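The abstract contrasts two places to put the cross-validation selector: inside each bootstrap sample, or outside, over whole bagged estimators. The Python sketch below illustrates that contrast under assumptions that are not from the paper: a one-dimensional k-nearest-neighbour regressor stands in for the base estimator, the number of neighbours k plays the role of the fine-tuning parameter, and the risk is squared-error loss estimated by V-fold cross-validation. All function names (`knn_predict`, `fit_bagged`, `cv_select`, `bagging_of_cv_estimators`, `cross_validated_bagging`) are hypothetical; this is not the authors' implementation or their simulation design.

```python
import math
import random
from statistics import fmean

# Hypothetical base learner: 1-D k-nearest-neighbour regression; k is the fine-tuning parameter.
def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return fmean(y for _, y in nearest)

def bootstrap(data, rng):
    # Draw n observations with replacement from a sample of size n.
    return [rng.choice(data) for _ in data]

def fit_bagged(train, k, n_boot, rng):
    # Bagging: fit the k-NN learner on n_boot bootstrap samples and average their predictions.
    boots = [bootstrap(train, rng) for _ in range(n_boot)]
    return lambda x: fmean(knn_predict(b, x, k) for b in boots)

def cv_risk(data, make_predictor, v=5):
    # V-fold cross-validated squared-error risk; assumes `data` is in random (i.i.d.) order.
    folds = [data[i::v] for i in range(v)]
    losses = []
    for i, valid in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        predict = make_predictor(train)
        losses.extend((y - predict(x)) ** 2 for x, y in valid)
    return fmean(losses)

def cv_select(data, candidates, make_learner, v=5):
    # Cross-validation selector: the candidate k with the smallest cross-validated risk.
    return min(candidates, key=lambda k: cv_risk(data, lambda tr: make_learner(tr, k), v))

def bagging_of_cv_estimators(data, candidates, n_boot, rng, v=5):
    # Common practice: choose k by CV *inside* each bootstrap sample, then average.
    # Here the selector trades off bias and variance of a single bootstrap fit.
    fits = []
    for _ in range(n_boot):
        boot = bootstrap(data, rng)
        k = cv_select(boot, candidates, lambda tr, kk: (lambda x: knn_predict(tr, x, kk)), v)
        fits.append((boot, k))
    return lambda x: fmean(knn_predict(b, x, kb) for b, kb in fits)

def cross_validated_bagging(data, candidates, n_boot, rng, v=5):
    # Cross-validated bagging: apply the CV selector *externally* to the bagged
    # estimators indexed by k, so it scores the bagged (averaged) fit itself.
    k_best = cv_select(data, candidates, lambda tr, k: fit_bagged(tr, k, n_boot, rng), v)
    return fit_bagged(data, k_best, n_boot, rng), k_best

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0, 6) for _ in range(60)]
    data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]  # simulated: E[Y | X=x] = sin(x)
    grid = [i * 0.1 for i in range(61)]
    candidates = [1, 3, 5, 9, 15]

    inner = bagging_of_cv_estimators(data, candidates, n_boot=25, rng=rng)
    outer, k_best = cross_validated_bagging(data, candidates, n_boot=25, rng=rng)
    for name, f in [("bagging of CV estimators", inner), ("cross-validated bagging", outer)]:
        mse = fmean((f(x) - math.sin(x)) ** 2 for x in grid)
        print(f"{name}: MSE vs true mean = {mse:.4f}")
    print("k selected for the bagged estimator:", k_best)
```

The only structural difference is where `cv_select` is called. In `bagging_of_cv_estimators` it scores single k-NN fits within one bootstrap sample; in `cross_validated_bagging` each validation fold is scored with a full bagged estimator built on the training folds, so the chosen k is tuned for the averaged estimator. The printed errors depend on the seed and the toy setup, and are not results from the paper.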

