Austin George I, Pe'er Itsik, Korem Tal
ArXiv. 2025 Mar 24:arXiv:2406.01652v2.
Cross-validation is a common method for estimating the predictive performance of machine learning models. In data-scarce regimes, where one typically wishes to maximize the number of instances used for training, an approach called "leave-one-out cross-validation" (LOOCV) is often used. In this design, a separate model is trained on all other instances to predict each held-out data instance. Since this yields only a single test instance per trained model, predictions are aggregated across the entire dataset to compute common performance metrics such as the area under the receiver operating characteristic curve (AUROC) or the R² score. In this work, we demonstrate that this approach induces a negative correlation between the average label of each training fold and the label of its corresponding test instance, a phenomenon we term distributional bias. Because machine learning models tend to regress to the mean of their training data, this distributional bias tends to degrade performance evaluation and hyperparameter optimization. We show that this effect generalizes to leave-P-out cross-validation, persists across a wide range of modeling and evaluation approaches, and can bias model selection against stronger regularization. To address this, we propose a generalizable rebalanced cross-validation approach that corrects for distributional bias in both classification and regression. We demonstrate that our approach improves cross-validation performance evaluation in synthetic simulations, on machine learning benchmarks, and in several published leave-one-out analyses.
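To make the mechanism concrete, the following minimal Python sketch (ours, not from the paper) illustrates distributional bias with pure-noise labels and the simplest possible model, a constant predictor equal to the training-fold mean; the variable names and the choice of n = 30 are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 30
y = rng.normal(size=n)  # labels only; any features would carry no signal

# In leave-one-out CV, the best constant model predicts the mean of the
# n - 1 training labels, which for held-out instance i equals
# (n * mean(y) - y_i) / (n - 1): a strictly decreasing function of y_i.
preds = (n * y.mean() - y) / (n - 1)

# The aggregated predictions are therefore perfectly anti-correlated
# with the held-out labels (distributional bias).
print(np.corrcoef(preds, y)[0, 1])  # -1.0 exactly

# Pooled leave-one-out R^2 of the constant model is 1 - (n / (n - 1))^2,
# i.e. about -0.07 for n = 30, even though the same model would score
# exactly 0 when evaluated in-sample.
ss_res = np.sum((y - preds) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1.0 - ss_res / ss_tot)

A learned model that regresses only partway toward its training mean inherits a milder version of this anti-correlation, which is why pooled metrics such as AUROC or R² computed over leave-one-out predictions can be pessimistic even when the model itself is sound.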