Asymptotic optimality of likelihood-based cross-validation.

Authors

van der Laan Mark J, Dudoit Sandrine, Keles Sunduz

Affiliation

Division of Biostatistics, School of Public Health, University of California, Berkeley, USA.

Publication

Stat Appl Genet Mol Biol. 2004;3:Article4. doi: 10.2202/1544-6115.1036. Epub 2004 Mar 22.

DOI: 10.2202/1544-6115.1036
PMID: 16646820
Abstract

Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
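
The bandwidth-selection use case mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional Gaussian kernel density estimator, 5-fold splitting, and a small hand-picked candidate grid, and the function names (`gaussian_kde_logpdf`, `cv_log_likelihood`, `select_bandwidth`) are hypothetical. For each split, the estimator is fit on the training part, the average log-density is evaluated on the validation part, and the candidate with the largest average across splits is selected. A Gaussian kernel estimate is not bounded away from zero on the whole real line, so this sketch does not strictly satisfy the boundedness condition of the theorem.

```python
import numpy as np


def gaussian_kde_logpdf(train, points, bandwidth):
    """Log-density at `points` of a Gaussian kernel estimate fit on `train`."""
    z = (points[:, None] - train[None, :]) / bandwidth
    log_kernel = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - np.log(bandwidth)
    # Log-mean-exp over the training points, shifted for numerical stability.
    shift = log_kernel.max(axis=1, keepdims=True)
    log_mean = shift + np.log(np.exp(log_kernel - shift).mean(axis=1, keepdims=True))
    return log_mean.ravel()


def cv_log_likelihood(x, bandwidth, n_folds=5, seed=0):
    """Average validation log-likelihood over a V-fold split of `x`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    fold_scores = []
    for v in range(n_folds):
        val_idx = folds[v]
        train_idx = np.concatenate([folds[k] for k in range(n_folds) if k != v])
        log_dens = gaussian_kde_logpdf(x[train_idx], x[val_idx], bandwidth)
        fold_scores.append(log_dens.mean())
    return float(np.mean(fold_scores))


def select_bandwidth(x, candidates, n_folds=5, seed=0):
    """Return the candidate bandwidth maximizing the cross-validated log-likelihood."""
    # The same seed is reused so every candidate is scored on identical splits.
    scores = [cv_log_likelihood(x, h, n_folds, seed) for h in candidates]
    return candidates[int(np.argmax(scores))], scores


# Illustrative usage on simulated data (assumed setup, not from the paper).
rng = np.random.default_rng(1)
sample = rng.normal(size=200)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
best_h, scores = select_bandwidth(sample, candidates)
print("selected bandwidth:", best_h)
```

With V fixed, each validation fold holds about n/V observations, so the validation sample grows with n as the theorem requires; leave-one-out splitting, which keeps the validation sample at size one, falls outside that condition.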
