Corvelo Benz Nina L, Gomez Rodriguez Manuel
Max Planck Institute for Software Systems, Kaiserslautern, 67663, Germany.
Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4056, Switzerland.
Sci Rep. 2025 Aug 9;15(1):29154. doi: 10.1038/s41598-025-12205-1.
Whenever an AI model is used to predict a relevant (binary) outcome in AI-assisted decision making, it is widely agreed that, together with each prediction, the model should provide an AI confidence value. However, it has been unclear why decision makers often have difficulty developing a good sense of when to trust a prediction using AI confidence values. Very recently, Corvelo Benz and Gomez Rodriguez have argued that, for rational decision makers, the utility of AI-assisted decision making is inherently bounded by the degree of alignment between the AI confidence values and the decision maker's confidence in their own predictions. In this work, we empirically investigate to what extent the degree of alignment actually influences the utility of AI-assisted decision making. To this end, we design and run a large-scale human subject study ([Formula: see text]) in which participants solve a simple decision-making task (an online card game) assisted by an AI model with a steerable degree of alignment. Our results show a positive association between the degree of alignment and the utility of AI-assisted decision making. In addition, our results show that post-processing the AI confidence values to achieve multicalibration with respect to the participants' confidence in their own predictions increases both the degree of alignment and the utility of AI-assisted decision making.
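To make the post-processing idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes the simplest possible setting, in which the groups are disjoint bins of the participant's own confidence, and it uses plain histogram binning within each group: every AI confidence value is replaced by the empirical frequency of the positive outcome among samples that share the same human-confidence bin and AI-confidence bin. The function name, the bin counts, and the equal-width binning are illustrative assumptions; in practice the mapping would be fitted on held-out data rather than on the samples it is applied to.

```python
import numpy as np


def groupwise_histogram_binning(ai_conf, human_conf, outcome,
                                n_ai_bins=5, n_human_bins=3):
    """Hypothetical helper: recalibrate AI confidence within human-confidence bins.

    Each sample falls into a cell defined by its human-confidence bin and its
    AI-confidence bin. The returned confidence for a sample is the fraction of
    positive outcomes observed in its cell.
    """
    ai_conf = np.asarray(ai_conf, dtype=float)
    human_conf = np.asarray(human_conf, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    # Equal-width bins on [0, 1]. Passing only the interior edges to
    # np.digitize yields bin indices in the range 0 .. n_bins - 1.
    ai_edges = np.linspace(0.0, 1.0, n_ai_bins + 1)[1:-1]
    human_edges = np.linspace(0.0, 1.0, n_human_bins + 1)[1:-1]
    ai_bin = np.digitize(ai_conf, ai_edges)
    human_bin = np.digitize(human_conf, human_edges)

    calibrated = np.empty_like(ai_conf)
    for h in np.unique(human_bin):
        # Only visit AI bins that actually occur in this human-confidence group,
        # so every cell below is non-empty.
        for a in np.unique(ai_bin[human_bin == h]):
            cell = (human_bin == h) & (ai_bin == a)
            calibrated[cell] = outcome[cell].mean()
    return calibrated
```

By construction, the average of the returned confidence values in each cell equals the observed frequency of the positive outcome in that cell, which is the calibration property restricted to each human-confidence group in this simplified setting.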