IEEE Trans Neural Netw Learn Syst. 2012 Aug;23(8):1304-12. doi: 10.1109/TNNLS.2012.2199516.
Cross-validation is a widely used technique for evaluating classifier performance. However, it can introduce dataset shift, a harmful factor that is often overlooked and can lead to inaccurate performance estimates. This paper analyzes the prevalence and impact of partition-induced covariate shift in different k-fold cross-validation schemes. From the experimental results, we conclude that the degree of partition-induced covariate shift depends on the cross-validation scheme used. Consequently, poorer schemes may compromise the correctness of a single-classifier performance estimate and also increase the number of cross-validation repetitions needed to reach a stable estimate.
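The claim that the amount of partition-induced covariate shift depends on the partitioning scheme can be illustrated with a small sketch. The abstract does not name the schemes or the shift measure used in the paper, so everything here is an assumption made for illustration: the two schemes compared (plain shuffled k-fold and a simple stratified k-fold), the synthetic one-feature dataset, and the shift proxy (the mean absolute gap between the test-fold and training-set feature means). The helper names `plain_kfold`, `stratified_kfold`, and `mean_shift` are hypothetical.

```python
import random
from statistics import mean


def plain_kfold(n, k, rng):
    """Shuffle all indices and deal them into k folds, ignoring class labels."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_kfold(labels, k, rng):
    """Deal each class's shuffled indices round-robin so that every fold
    keeps roughly the same class proportions as the full dataset."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # continues across classes so fold sizes stay balanced
    for members in by_class.values():
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


def mean_shift(x, folds):
    """Crude covariate-shift proxy (an assumption, not the paper's measure):
    average over folds of |mean(test fold) - mean(remaining training data)|."""
    gaps = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        gaps.append(abs(mean(x[i] for i in test) - mean(x[i] for i in train)))
    return mean(gaps)


# Synthetic two-class data: the single feature is centred at 0 for class 0
# and at 5 for class 1, so class imbalance in a fold shifts its feature mean.
rng = random.Random(0)
labels = [0] * 40 + [1] * 20
x = [rng.gauss(5.0 * y, 1.0) for y in labels]
k, repeats = 10, 20

# Average the proxy over several repetitions of each partitioning scheme.
plain_gap = mean(mean_shift(x, plain_kfold(len(x), k, rng)) for _ in range(repeats))
strat_gap = mean(mean_shift(x, stratified_kfold(labels, k, rng)) for _ in range(repeats))

print(f"plain k-fold gap:      {plain_gap:.3f}")
print(f"stratified k-fold gap: {strat_gap:.3f}")
```

On data like this, the stratified partition should show a smaller gap, because each fold receives the same number of samples from each class and only within-class variation remains. This says nothing about the specific schemes or magnitudes studied in the paper.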