Mitigating data bias and ensuring reliable evaluation of AI models with shortcut hull learning.

Author information

Zhou Wenhao, Liu Faqiang, Zheng Hao, Zhao Rong

Affiliations

Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China.

Department of Precision Instruments, Tsinghua University, Beijing, China.

Publication information

Nat Commun. 2025 Jul 1;16(1):5513. doi: 10.1038/s41467-025-60801-6.

DOI:10.1038/s41467-025-60801-6

PMID:40595602

Abstract

Shortcut learning poses a significant challenge to both the interpretability and robustness of artificial intelligence, arising from dataset biases that lead models to exploit unintended correlations, or shortcuts, which undermine performance evaluations. Addressing these inherent biases is particularly difficult due to the complex, high-dimensional nature of data. Here, we introduce shortcut hull learning, a diagnostic paradigm that unifies shortcut representations in probability space and utilizes diverse models with different inductive biases to efficiently learn and identify shortcuts. This paradigm establishes a comprehensive, shortcut-free evaluation framework, validated by developing a shortcut-free topological dataset to assess deep neural networks' global capabilities, enabling a shift from Minsky and Papert's representational analysis to an empirical investigation of learning capacity. Unexpectedly, our experimental results suggest that under this framework, convolutional models, typically considered weak in global capabilities, outperform transformer-based models, challenging prevailing beliefs. By enabling robust and bias-free evaluation, our framework uncovers the true model capabilities beyond architectural preferences, offering a foundation for advancing AI interpretability and reliability.
