Department of Information Technology-IDLab, Ghent University-IMEC, Technologiepark Zwijnaarde 126, 9052, Ghent, Belgium.
Artificial Intelligence Lab, Computer Science Department, Vrije Universiteit Brussel, 1050, Brussels, Belgium.
Sci Rep. 2024 May 7;14(1):10460. doi: 10.1038/s41598-024-61153-9.
While autonomous artificial agents are assumed to execute the strategies they are programmed with perfectly, the humans who design them may make mistakes. These mistakes may lead to a misalignment between the humans' intended goals and their agents' observed behavior, a problem of value alignment. Such an alignment problem may have particularly strong consequences when these autonomous systems are used in social contexts that involve some form of collective risk. By means of an evolutionary game-theoretic model, we investigate whether errors in the configuration of artificial agents change the outcome of a collective-risk dilemma, in comparison to a scenario with no delegation. Delegation is here distinguished from no-delegation simply by the moment at which a mistake occurs: either when programming or choosing the agent (in the case of delegation) or when executing the actions at each round of the game (in the case of no-delegation). We find that, while errors decrease the success rate, it is better to delegate and commit to a somewhat flawed strategy, perfectly executed by an autonomous agent, than to commit execution errors directly. Our model also shows that, in the long term, delegation strategies should be favored over no-delegation, if given the choice.
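To make the distinction concrete, below is a minimal Python sketch, not the authors' model (which evolves a population of strategies under social learning), contrasting the two error scenarios in a threshold public-goods game. Every parameter is an illustrative assumption: a group of 6 players, 10 rounds, a per-decision error rate of 0.1, and an intended strategy of contributing in the first 5 rounds; success is simplified to whether the collective target is met, leaving out the paper's risk probability. Under delegation the error can strike once, when the agent is configured, after which the possibly flawed strategy is executed perfectly; without delegation the same error rate applies to every action during play.

```python
import random

N_PLAYERS = 6     # group size (illustrative assumption)
N_ROUNDS = 10     # rounds per game (illustrative assumption)
INTENDED_K = 5    # intended strategy: contribute in the first 5 rounds
TARGET = N_PLAYERS * INTENDED_K   # collective target = intended total
ERROR = 0.1       # probability of a mistake per decision (illustrative)

def play(delegate: bool, rng: random.Random) -> bool:
    """Play one game; return True if the group reaches the target."""
    total = 0
    for _ in range(N_PLAYERS):
        if delegate:
            # Delegation: one mistake opportunity, at configuration time.
            # With probability ERROR the owner programs a neighbouring,
            # somewhat flawed strategy (k +/- 1 contributions), which the
            # agent then executes perfectly in every round.
            k = INTENDED_K
            if rng.random() < ERROR:
                k += rng.choice((-1, 1))
            total += k
        else:
            # No delegation: a mistake opportunity at every round.
            # Each intended action is flipped independently with
            # probability ERROR while it is being executed.
            for r in range(N_ROUNDS):
                intended = r < INTENDED_K
                acted = (not intended) if rng.random() < ERROR else intended
                total += int(acted)
    return total >= TARGET

def success_rate(delegate: bool, trials: int = 50_000) -> float:
    rng = random.Random(0)
    return sum(play(delegate, rng) for _ in range(trials)) / trials

print(f"delegation:    {success_rate(True):.3f}")
print(f"no-delegation: {success_rate(False):.3f}")
```

With a matched per-decision error rate, each delegating player deviates from the intended plan at most once and by at most one contribution, whereas a non-delegating player accumulates on average ERROR x N_ROUNDS execution slips. That structural asymmetry is the intuition behind the abstract's finding, though the sketch makes no attempt to reproduce the paper's evolutionary analysis or its quantitative results.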