Fast and Informative Model Selection Using Learning Curve Cross-Validation.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9669-9680. doi: 10.1109/TPAMI.2023.3251957. Epub 2023 Jun 30.

DOI:10.1109/TPAMI.2023.3251957
PMID:37028368
Abstract

Common cross-validation (CV) methods like k-fold cross-validation or Monte Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing it on the remaining data. These techniques have two major drawbacks. First, they can be unnecessarily slow on large datasets. Second, beyond an estimate of the final performance, they give almost no insight into the learning process of the validated algorithm. In this article, we present a new approach for validation based on learning curves (LCCV). Instead of creating train-test splits with a large portion of training data, LCCV iteratively increases the number of instances used for training. In the context of model selection, it discards models that are unlikely to become competitive. In a series of experiments on 75 datasets, we show that in over 90% of the cases using LCCV leads to the same performance as using 5/10-fold CV while substantially reducing the runtime (median runtime reductions of over 50%); the performance using LCCV never deviated from CV by more than 2.5%. We also compare it to a racing-based method and to successive halving, a multi-armed bandit method. Additionally, LCCV provides important insights, which, for example, allow assessing the benefits of acquiring more data.
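
The idea in the abstract (evaluate each candidate at growing training sizes and stop early when it cannot catch up) can be illustrated with a small Python sketch. This is not the authors' implementation. The function names, the anchor sizes, and the pruning rule are assumptions made for illustration: the rule extrapolates the last observed slope linearly, which gives an optimistic bound only if the error curve is convex and non-increasing. The two `candidates` are synthetic learning curves standing in for "train on `size` instances and measure hold-out error".

```python
from typing import Callable, Dict, List, Optional, Tuple

Curve = List[Tuple[int, float]]  # (training size, hold-out error) pairs

def optimistic_final_error(curve: Curve, target: int) -> float:
    """Optimistic error at `target`, assuming a convex, non-increasing curve."""
    (s0, e0), (s1, e1) = curve[-2], curve[-1]
    slope = min(0.0, (e1 - e0) / (s1 - s0))  # never extrapolate upwards
    return max(0.0, e1 + slope * (target - s1))

def lccv_evaluate(error_at: Callable[[int], float], anchors: List[int],
                  best_so_far: float) -> Tuple[Optional[float], Curve]:
    """Walk up the anchors; return (final error, or None if pruned) and the curve."""
    curve: Curve = []
    target = anchors[-1]
    for size in anchors:
        curve.append((size, error_at(size)))
        if size < target and len(curve) >= 2:
            if optimistic_final_error(curve, target) > best_so_far:
                return None, curve  # even the optimistic bound loses: stop early
    return curve[-1][1], curve

def lccv_select(candidates: Dict[str, Callable[[int], float]],
                anchors: List[int]) -> Tuple[Optional[str], float, Dict[str, Curve]]:
    """Evaluate candidates in order, keeping the best fully evaluated one."""
    best_name, best_err = None, float("inf")
    curves: Dict[str, Curve] = {}
    for name, error_at in candidates.items():
        final, curves[name] = lccv_evaluate(error_at, anchors, best_err)
        if final is not None and final < best_err:
            best_name, best_err = name, final
    return best_name, best_err, curves

# Hypothetical, synthetic error curves (not real learners or real data).
candidates = {
    "good": lambda size: 0.10 + 2.0 / size,
    "hopeless": lambda size: 0.40 + 1.0 / size,
}
anchors = [16, 32, 64, 128, 256, 512]
best_name, best_err, curves = lccv_select(candidates, anchors)
print(best_name, round(best_err, 4), {k: len(v) for k, v in curves.items()})
```

In this toy run the second candidate is dropped after three anchors instead of six, which is where the runtime saving would come from. The recorded curves are also what would let one judge whether more data is likely to help: a nearly flat final segment suggests little benefit.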

Similar articles

1. Fast and Informative Model Selection Using Learning Curve Cross-Validation.
   IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9669-9680. doi: 10.1109/TPAMI.2023.3251957. Epub 2023 Jun 30.
2. Impact of train/test sample regimen on performance estimate stability of machine learning in cardiovascular imaging.
   Sci Rep. 2021 Jul 14;11(1):14490. doi: 10.1038/s41598-021-93651-5.
3. Channel selection and classification of electroencephalogram signals: an artificial neural network and genetic algorithm-based approach.
   Artif Intell Med. 2012 Jun;55(2):117-26. doi: 10.1016/j.artmed.2012.02.001. Epub 2012 Apr 12.
4. Estimation of drug exposure by machine learning based on simulations from published pharmacokinetic models: The example of tacrolimus.
   Pharmacol Res. 2021 May;167:105578. doi: 10.1016/j.phrs.2021.105578. Epub 2021 Mar 26.
5. Bias in error estimation when using cross-validation for model selection.
   BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.
6. Improved cytokine-receptor interaction prediction by exploiting the negative sample space.
   BMC Bioinformatics. 2020 Oct 31;21(1):493. doi: 10.1186/s12859-020-03835-5.
7. Outcome prediction based on microarray analysis: a critical perspective on methods.
   BMC Bioinformatics. 2009 Feb 7;10:53. doi: 10.1186/1471-2105-10-53.
8. Monte Carlo cross-validation for a study with binary outcome and limited sample size.
   BMC Med Inform Decis Mak. 2022 Oct 17;22(1):270. doi: 10.1186/s12911-022-02016-z.
9. Validation of differential gene expression algorithms: application comparing fold-change estimation to hypothesis testing.
   BMC Bioinformatics. 2010 Jan 28;11:63. doi: 10.1186/1471-2105-11-63.
10. Accurate definition of control strategies using cross validated stepwise regression and Monte Carlo simulation.
    J Biotechnol. 2019;306S:100006. doi: 10.1016/j.btecx.2019.100006. Epub 2019 Apr 28.

Cited by

1. Deep Learning-Based DNA Methylation Detection in Cervical Cancer Using the One-Hot Character Representation Technique.
   Diagnostics (Basel). 2025 Sep 7;15(17):2263. doi: 10.3390/diagnostics15172263.
2. Predicting intraoperative blood loss risk in severe lumbar disc herniation patients undergoing PLIF: a multicenter cohort study using ensemble learning.
   Int J Surg. 2025 Sep 1;111(9):5904-5913. doi: 10.1097/JS9.0000000000002730. Epub 2025 Jun 19.
3. Digital image enhancement using deep learning algorithm in 3D heads-up vitreoretinal surgery.
   Sci Rep. 2025 May 26;15(1):18429. doi: 10.1038/s41598-025-98801-7.
4. Machine learning and multi-omics integration: advancing cardiovascular translational research and clinical practice.
   J Transl Med. 2025 Apr 2;23(1):388. doi: 10.1186/s12967-025-06425-2.
5. Fast binary logistic regression.
   PeerJ Comput Sci. 2025 Jan 30;11:e2579. doi: 10.7717/peerj-cs.2579. eCollection 2025.
6. Supervised Deep Learning for Detecting and Locating Passive Seismic Events Recorded with DAS: A Case Study.
   Sensors (Basel). 2024 Oct 30;24(21):6978. doi: 10.3390/s24216978.
7. A preliminary prediction model of pediatric Mycoplasma pneumoniae pneumonia based on routine blood parameters by using machine learning method.
   BMC Infect Dis. 2024 Jul 18;24(1):707. doi: 10.1186/s12879-024-09613-5.
8. Clinical predictors of severe radiation pneumonitis in patients undergoing thoracic radiotherapy for lung cancer.
   Transl Lung Cancer Res. 2024 May 31;13(5):1069-1083. doi: 10.21037/tlcr-24-328. Epub 2024 May 29.
9. Investigating the clinical role and prognostic value of genes related to insulin-like growth factor signaling pathway in thyroid cancer.
   Aging (Albany NY). 2024 Feb 7;16(3):2934-2952. doi: 10.18632/aging.205524.