Study on the impact of partition-induced dataset shift on k-fold cross-validation.

Publication information

IEEE Trans Neural Netw Learn Syst. 2012 Aug;23(8):1304-12. doi: 10.1109/TNNLS.2012.2199516.

DOI: 10.1109/TNNLS.2012.2199516
PMID: 24807526
Abstract

Cross-validation is a very commonly employed technique used to evaluate classifier performance. However, it can potentially introduce dataset shift, a harmful factor that is often not taken into account and can result in inaccurate performance estimation. This paper analyzes the prevalence and impact of partition-induced covariate shift on different k-fold cross-validation schemes. From the experimental results obtained, we conclude that the degree of partition-induced covariate shift depends on the cross-validation scheme considered. In this way, worse schemes may harm the correctness of a single-classifier performance estimation and also increase the needed number of repetitions of cross-validation to reach a stable performance estimation.
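
As a rough illustration of how the choice of partitioning scheme can change the amount of shift between folds, the following Python sketch compares plain shuffled k-fold with class-stratified k-fold on synthetic data. It is not the paper's experimental protocol: the abstract does not name the schemes or the shift measure that were studied. The two-class Gaussian data, the helper names (`kfold_indices`, `stratified_kfold_indices`, `fold_mean_spread`, `compare_schemes`), and the use of the spread of per-fold feature means as a crude proxy for covariate shift are all assumptions made for this example.

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Plain k-fold: shuffle all indices and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_kfold_indices(labels, k, rng):
    """Stratified k-fold: deal each class's shuffled indices round-robin,
    so every fold receives roughly the same class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)  # continue dealing where the last class stopped
    return folds


def fold_mean_spread(x, folds):
    """Crude shift proxy (an assumption of this sketch, not the paper's
    measure): standard deviation of the per-fold means of one feature."""
    fold_means = [statistics.fmean([x[i] for i in fold]) for fold in folds]
    return statistics.pstdev(fold_means)


def make_data(rng, n_major=80, n_minor=20):
    """Synthetic imbalanced data: one feature whose mean depends on the class."""
    labels = [0] * n_major + [1] * n_minor
    x = [rng.gauss(0.0, 1.0) if y == 0 else rng.gauss(5.0, 1.0) for y in labels]
    return x, labels


def compare_schemes(seed=0, reps=200, k=10):
    """Average the shift proxy over repeated partitions for both schemes."""
    rng = random.Random(seed)
    x, labels = make_data(rng)
    plain, strat = [], []
    for _ in range(reps):
        plain.append(fold_mean_spread(x, kfold_indices(len(x), k, rng)))
        strat.append(fold_mean_spread(x, stratified_kfold_indices(labels, k, rng)))
    return statistics.fmean(plain), statistics.fmean(strat)


if __name__ == "__main__":
    plain, strat = compare_schemes()
    print(f"plain k-fold spread:      {plain:.3f}")
    print(f"stratified k-fold spread: {strat:.3f}")
```

Under these assumptions, the stratified scheme should show a smaller average spread, because fixing the class proportions removes one source of fold-to-fold variation in the feature distribution. Stratification only balances labels, so it can still leave covariate shift within each class.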

Similar articles

1
Study on the impact of partition-induced dataset shift on k-fold cross-validation.
IEEE Trans Neural Netw Learn Syst. 2012 Aug;23(8):1304-12. doi: 10.1109/TNNLS.2012.2199516.
2
Analysis of input set characteristics and variances on k-fold cross validation for a Recurrent Neural Network model on waste disposal rate estimation.
J Environ Manage. 2022 Mar 11;311:114869. doi: 10.1016/j.jenvman.2022.114869.
3
Sensitivity analysis of kappa-fold cross validation in prediction error estimation.
IEEE Trans Pattern Anal Mach Intell. 2010 Mar;32(3):569-75. doi: 10.1109/TPAMI.2009.187.
4
Asymptotic optimality of likelihood-based cross-validation.
Stat Appl Genet Mol Biol. 2004;3:Article4. doi: 10.2202/1544-6115.1036. Epub 2004 Mar 22.
5
Classifier models and architectures for EEG-based neonatal seizure detection.
Physiol Meas. 2008 Oct;29(10):1157-78. doi: 10.1088/0967-3334/29/10/002. Epub 2008 Sep 18.
6
A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM.
Comput Biol Chem. 2011 Feb;35(1):1-9. doi: 10.1016/j.compbiolchem.2010.12.001. Epub 2010 Dec 17.
7
Understanding covariate shift in model performance.
F1000Res. 2016 Apr 7;5. doi: 10.12688/f1000research.8317.3. eCollection 2016.
8
Mixture classification model based on clinical markers for breast cancer prognosis.
Artif Intell Med. 2010 Feb-Mar;48(2-3):129-37. doi: 10.1016/j.artmed.2009.07.008. Epub 2009 Dec 14.
9
A novel evaluation method for extrapolated retention factor in determination of n-octanol/water partition coefficient of halogenated organic pollutants by reversed-phase high performance liquid chromatography.
Anal Chim Acta. 2012 Feb 3;713:130-5. doi: 10.1016/j.aca.2011.11.020. Epub 2011 Nov 19.
10
Benchmarking protein classification algorithms via supervised cross-validation.
J Biochem Biophys Methods. 2008 Apr 24;70(6):1215-23. doi: 10.1016/j.jbbm.2007.05.011. Epub 2007 May 31.

Cited by

1
The Association Between Neonatal Respiratory Distress Syndrome and Plasma IgG N-Glycosylation: A Case-Control Study.
J Inflamm Res. 2025 May 21;18:6439-6451. doi: 10.2147/JIR.S524188. eCollection 2025.
2
Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size.
PLoS One. 2024 Dec 31;19(12):e0315574. doi: 10.1371/journal.pone.0315574. eCollection 2024.
3
Automatic 3D pelvimetry framework in CT images and its validation.
Sci Rep. 2024 Sep 13;14(1):21431. doi: 10.1038/s41598-024-72123-6.
4
Design optimization of large-scale bifacial photovoltaic module frame using deep learning surrogate model.
Sci Rep. 2024 Jun 25;14(1):14592. doi: 10.1038/s41598-024-64594-4.
5
Integrating attention mechanism and multi-scale feature extraction for fall detection.
Heliyon. 2024 May 21;10(10):e31614. doi: 10.1016/j.heliyon.2024.e31614. eCollection 2024 May 30.
6
Performance of deep learning algorithms to distinguish high-grade glioma from low-grade glioma: A systematic review and meta-analysis.
iScience. 2023 May 5;26(6):106815. doi: 10.1016/j.isci.2023.106815. eCollection 2023 Jun 16.
7
A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning.
Sensors (Basel). 2023 Feb 20;23(4):2333. doi: 10.3390/s23042333.
8
Application of machine learning in the prediction of deficient mismatch repair in patients with colorectal cancer based on routine preoperative characterization.
Front Oncol. 2022 Dec 22;12:1049305. doi: 10.3389/fonc.2022.1049305. eCollection 2022.
9
Novel Human Artificial Intelligence Hybrid Framework Pinpoints Thyroid Nodule Malignancy and Identifies Overlooked Second-Order Ultrasonographic Features.
Cancers (Basel). 2022 Sep 13;14(18):4440. doi: 10.3390/cancers14184440.
10
Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features.
Healthcare (Basel). 2022 Jul 22;10(8):1362. doi: 10.3390/healthcare10081362.