Cross-Validated Bagged Learning

Authors

Petersen Maya L, Molinaro Annette M, Sinisi Sandra E, van der Laan Mark J

Affiliation

Division of Biostatistics, University of California, Berkeley, School of Public Health, Earl Warren Hall 7360 Berkeley, California 94720-7360, phone: 510.642.3241 fax: 510.643.5163.

Publication information

J Multivar Anal. 2008 Mar;25(2):260-266. doi: 10.1016/j.jmva.2007.07.004.

DOI:10.1016/j.jmva.2007.07.004

PMID:19255599

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2367370/

Abstract

Many applications aim to learn a high-dimensional parameter of a data-generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, bootstrap aggregating (bagging) has been introduced as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying an estimator to multiple bootstrap samples and averaging the results across bootstrap samples. To address the curse of dimensionality, a common practice has been to apply bagging to estimators which themselves use cross-validation, thereby using cross-validation within a bootstrap sample to select fine-tuning parameters that trade off the bias and variance of the bootstrap sample-specific candidate estimators. In this article we point out that, in order to achieve the correct bias-variance trade-off for the parameter of interest, one should apply the cross-validation selector externally to candidate bagged estimators indexed by these fine-tuning parameters. We use three simulations to compare the new cross-validated bagging method with bagging of cross-validated estimators and bagging of non-cross-validated estimators.
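
The external selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate estimator (one-dimensional k-nearest-neighbour regression), the squared-error loss, the function names `knn_predict`, `bagged_predict`, and `cv_bagged_select`, and the simulated data are all assumptions chosen for illustration. The point it shows is the ordering: each candidate value of the fine-tuning parameter is bagged first, and V-fold cross-validation is then applied to the bagged candidates.

```python
import numpy as np

def knn_predict(x_train, y_train, x_new, k):
    """Hypothetical candidate estimator: 1-D k-nearest-neighbour regression."""
    dist = np.abs(x_new[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def bagged_predict(x_train, y_train, x_new, k, n_boot, rng):
    """Average the candidate estimator over n_boot bootstrap samples."""
    n = len(x_train)
    preds = np.zeros(len(x_new))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        preds += knn_predict(x_train[idx], y_train[idx], x_new, k)
    return preds / n_boot

def cv_bagged_select(x, y, candidates, n_folds=5, n_boot=25, seed=0):
    """Select the tuning parameter whose bagged estimator has the lowest
    V-fold cross-validated squared-error risk (selection is external to bagging)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds  # balanced random fold labels
    risk = {}
    for k in candidates:
        sq_err = 0.0
        for v in range(n_folds):
            train, valid = folds != v, folds == v
            pred = bagged_predict(x[train], y[train], x[valid], k, n_boot, rng)
            sq_err += np.sum((y[valid] - pred) ** 2)
        risk[k] = sq_err / len(x)
    best = min(risk, key=risk.get)
    return best, risk

# Simulated data (an assumption, not one of the paper's three simulations).
data_rng = np.random.default_rng(1)
x = data_rng.uniform(-2, 2, size=100)
y = np.sin(2 * x) + data_rng.normal(scale=0.3, size=100)
candidates = [1, 3, 5, 10, 20]
best_k, risk = cv_bagged_select(x, y, candidates)
print("selected k:", best_k)
```

By contrast, bagging of cross-validated estimators would run the selection inside each bootstrap sample and average the resulting fits, so the tuning parameter would be chosen for the unbagged estimator rather than for the bagged one.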

