Effects of machine learning errors on human decision-making: manipulations of model accuracy, error types, and error importance.

Affiliation

Sandia National Laboratories, Mail Stop 1327, P.O. Box 5800, Albuquerque, NM, 87185-1327, USA.

Publication information

Cogn Res Princ Implic. 2024 Aug 26;9(1):56. doi: 10.1186/s41235-024-00586-2.

DOI: 10.1186/s41235-024-00586-2
PMID: 39183209
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11345344/
Abstract

This study addressed the cognitive impacts of providing correct and incorrect machine learning (ML) outputs in support of an object detection task. The study consisted of five experiments that manipulated the accuracy and importance of mock ML outputs. In each of the experiments, participants were given the T and L task with T-shaped targets and L-shaped distractors. They were tasked with categorizing each image as target present or target absent. In Experiment 1, they performed this task without the aid of ML outputs. In Experiments 2-5, they were shown images with bounding boxes, representing the output of an ML model. The outputs could be correct (hits and correct rejections), or they could be erroneous (false alarms and misses). Experiment 2 manipulated the overall accuracy of these mock ML outputs. Experiment 3 manipulated the proportion of different types of errors. Experiments 4 and 5 manipulated the importance of specific types of stimuli or model errors, as well as the framing of the task in terms of human or model performance. These experiments showed that model misses were consistently harder for participants to detect than model false alarms. In general, as the model's performance increased, human performance increased as well, but in many cases the participants were more likely to overlook model errors when the model had high accuracy overall. Warning participants to be on the lookout for specific types of model errors had very little impact on their performance. Overall, our results emphasize the importance of considering human cognition when determining what level of model performance and types of model errors are acceptable for a given task.
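The four output categories named in the abstract (hits, correct rejections, false alarms, and misses) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the image-level simplification (one present/absent flag per image rather than per bounding box), and the example trials are all assumptions made for illustration.

```python
from collections import Counter


def classify_output(target_present: bool, model_flags_target: bool) -> str:
    """Label one mock ML output with its signal-detection category.

    Hypothetical helper: treats each image as a single present/absent decision.
    """
    if target_present and model_flags_target:
        return "hit"
    if target_present and not model_flags_target:
        return "miss"
    if not target_present and model_flags_target:
        return "false_alarm"
    return "correct_rejection"


def summarize(trials):
    """Count categories and compute overall model accuracy.

    trials: iterable of (target_present, model_flags_target) pairs.
    """
    counts = Counter(classify_output(t, m) for t, m in trials)
    total = sum(counts.values())
    correct = counts["hit"] + counts["correct_rejection"]
    accuracy = correct / total if total else 0.0
    return counts, accuracy


# Made-up trials for illustration only; these are not data from the study.
example_trials = [
    (True, True),    # hit
    (True, False),   # miss
    (False, True),   # false alarm
    (False, False),  # correct rejection
    (True, True),    # hit
]
counts, accuracy = summarize(example_trials)
print(dict(counts), accuracy)
```

Under these assumptions, changing the mix of trials changes the two quantities that Experiments 2 and 3 are described as manipulating: overall accuracy, and the proportion of misses versus false alarms among the errors.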

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/260574cadc71/41235_2024_586_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/cd425b3ff1b2/41235_2024_586_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/6bcd7d0dc87e/41235_2024_586_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/6885fd3cd620/41235_2024_586_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/737ada974b56/41235_2024_586_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/1900a5cd308a/41235_2024_586_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/4aba0865eb16/41235_2024_586_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/44d804d88c4f/41235_2024_586_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/35f204352eaf/41235_2024_586_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/9eb4b3d85d69/41235_2024_586_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/2ba531d3b3ca/41235_2024_586_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/0c79e54f1e2c/41235_2024_586_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/adeb3ae99931/41235_2024_586_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/2c1db01a5421/41235_2024_586_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/0ec10d346604/41235_2024_586_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/e43a9c194962/41235_2024_586_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ae/11345344/5312bea4dc37/41235_2024_586_Fig17_HTML.jpg

Similar articles

1
Effects of machine learning errors on human decision-making: manipulations of model accuracy, error types, and error importance.
Cogn Res Princ Implic. 2024 Aug 26;9(1):56. doi: 10.1186/s41235-024-00586-2.
2
Effects of response bias and judgment framing on operator use of an automated aid in a target detection task.
J Exp Psychol Appl. 2011 Dec;17(4):320-31. doi: 10.1037/a0024243. Epub 2011 Jun 27.
3
Decision processes in visual search as a function of target prevalence.
J Exp Psychol Hum Percept Perform. 2016 Sep;42(9):1466-76. doi: 10.1037/xhp0000248. Epub 2016 May 5.
4
Memory error speed predicts subsequent accuracy for recognition misses but not false alarms.
Memory. 2023 Nov;31(10):1340-1351. doi: 10.1080/09658211.2023.2265613. Epub 2023 Nov 21.
5
Effects of learning on somatosensory decision-making and experiences.
J Exp Psychol Gen. 2017 Nov;146(11):1631-1648. doi: 10.1037/xge0000364. Epub 2017 Oct 2.
6
Studying the dynamics of visual search behavior using RT hazard and micro-level speed-accuracy tradeoff functions: A role for recurrent object recognition and cognitive control processes.
Atten Percept Psychophys. 2020 Feb;82(2):689-714. doi: 10.3758/s13414-019-01897-z.
7
A diffusion model account of the lexical decision task.
Psychol Rev. 2004 Jan;111(1):159-82. doi: 10.1037/0033-295X.111.1.159.
8
[Inhibition and resource capacity during normal aging: a confrontation of the dorsal-ventral and frontal models in a modified version of negative priming].
Encephale. 2006 Mar-Apr;32(2 Pt 1):253-62. doi: 10.1016/s0013-7006(06)76152-8.
9
Error detection model developed using a multi-task convolutional neural network in patient-specific quality assurance for volumetric-modulated arc therapy.
Med Phys. 2021 Sep;48(9):4769-4783. doi: 10.1002/mp.15031. Epub 2021 Jul 29.
10
An fMRI and effective connectivity study investigating miss errors during advice utilization from human and machine agents.
Soc Neurosci. 2017 Oct;12(5):570-581. doi: 10.1080/17470919.2016.1205131. Epub 2016 Jul 13.

Cited by

1
Errors in visual search: How can we reduce them?
Atten Percept Psychophys. 2025 Jul;87(5):1471-1495. doi: 10.3758/s13414-025-03095-6. Epub 2025 Jun 13.