Weaponizing cognitive bias in autonomous systems: a framework for black-box inference attacks.

Authors

Chu Shiyong, Chen Yuwei

Affiliation

Aviation Industry Development Research Center of China, Beijing, China.

Publication information

Front Artif Intell. 2025 Aug 20;8:1623573. doi: 10.3389/frai.2025.1623573. eCollection 2025.

DOI:10.3389/frai.2025.1623573

PMID:40910117

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12405252/

Abstract

Autonomous systems operating in high-dimensional environments increasingly rely on prioritization heuristics to allocate attention and assess risk, yet these mechanisms can introduce cognitive biases such as salience, spatial framing, and temporal familiarity that influence decision-making without altering the input or accessing internal states. This study presents Priority Inversion via Operational Reasoning (PRIOR), a black-box, non-perturbative diagnostic framework that employs structurally biased but semantically neutral scenario cues to probe inference-level vulnerabilities without modifying pixel-level, statistical, or surface semantic properties. Given the limited accessibility of embodied vision-based systems, we evaluate PRIOR using large language models (LLMs) as abstract reasoning proxies to simulate cognitive prioritization in constrained textual surveillance scenarios inspired by Unmanned Aerial Vehicle (UAV) operations. Controlled experiments demonstrate that minimal structural cues can consistently induce priority inversions across multiple models, and joint analysis of model justifications and confidence estimates reveals systematic distortions in inferred threat relevance even when inputs are symmetrical. These findings expose the fragility of inference-level reasoning in black-box systems and motivate the development of evaluation strategies that extend beyond output correctness to interrogate internal prioritization logic, with implications for dynamic, embodied, and visually grounded agents in real-world deployments.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a822/12405252/d4faec283a7a/frai-08-1623573-g0001.jpg

Similar articles

Weaponizing cognitive bias in autonomous systems: a framework for black-box inference attacks.

Front Artif Intell. 2025 Aug 20;8:1623573. doi: 10.3389/frai.2025.1623573. eCollection 2025.

Prescription of Controlled Substances: Benefits and Risks

Short-Term Memory Impairment

Integrated neural network framework for multi-object detection and recognition using UAV imagery.

Front Neurorobot. 2025 Jul 30;19:1643011. doi: 10.3389/fnbot.2025.1643011. eCollection 2025.

Healthcare workers' informal uses of mobile phones and other mobile devices to support their work: a qualitative evidence synthesis.

Cochrane Database Syst Rev. 2024 Aug 27;8(8):CD015705. doi: 10.1002/14651858.CD015705.pub2.

Evaluating the o1 reasoning large language model for cognitive bias: a vignette study.

Crit Care. 2025 Aug 21;29(1):376. doi: 10.1186/s13054-025-05591-5.

Evaluating the Reasoning Capabilities of Large Language Models for Medical Coding and Hospital Readmission Risk Stratification: Zero-Shot Prompting Approach.

J Med Internet Res. 2025 Jul 30;27:e74142. doi: 10.2196/74142.

The Lived Experience of Autistic Adults in Employment: A Systematic Search and Synthesis.

Autism Adulthood. 2024 Dec 2;6(4):495-509. doi: 10.1089/aut.2022.0114. eCollection 2024 Dec.

Psychological interventions for adults who have sexually offended or are at risk of offending.

Cochrane Database Syst Rev. 2012 Dec 12;12(12):CD007507. doi: 10.1002/14651858.CD007507.pub2.

Artificial intelligence for diagnosing exudative age-related macular degeneration.

Cochrane Database Syst Rev. 2024 Oct 17;10(10):CD015522. doi: 10.1002/14651858.CD015522.pub2.

References cited in this article

"Shortcuts" Causing Bias in Radiology Artificial Intelligence: Causes, Evaluation, and Mitigation.

J Am Coll Radiol. 2023 Sep;20(9):842-851. doi: 10.1016/j.jacr.2023.06.025. Epub 2023 Jul 27.

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets.

Vis Comput. 2022;38(8):2939-2970. doi: 10.1007/s00371-021-02166-7. Epub 2021 Jun 10.

Human decision-making biases in the moral dilemmas of autonomous vehicles.

Sci Rep. 2019 Sep 11;9(1):13080. doi: 10.1038/s41598-019-49411-7.

Implicit Bias Is Behavior: A Functional-Cognitive Perspective on Implicit Bias.

Perspect Psychol Sci. 2019 Sep;14(5):835-840. doi: 10.1177/1745691619855638. Epub 2019 Aug 2.

Biases and Heuristics in Decision Making and Their Impact on Autonomy.

Am J Bioeth. 2016 May;16(5):5-15. doi: 10.1080/15265161.2016.1159750.

Framing spatial cognition: neural representations of proximal and distal frames of reference and their roles in navigation.

Physiol Rev. 2011 Oct;91(4):1245-79. doi: 10.1152/physrev.00021.2010.

Effect of distorted visual feedback on the sense of agency.

Behav Neurol. 2008;19(1-2):53-7. doi: 10.1155/2008/425267.

Judgment under Uncertainty: Heuristics and Biases.

Science. 1974 Sep 27;185(4157):1124-31. doi: 10.1126/science.185.4157.1124.
