Alon Nitay, Schulz Lion, Rosenschein Jeffrey S, Dayan Peter
Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel.
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
Open Mind (Camb). 2023 Aug 20;7:608-624. doi: 10.1162/opmi_a_00097. eCollection 2023.
In complex situations involving communication, agents might attempt to mask their intentions, exploiting Shannon's theory of information as a theory of misinformation. Here, we introduce and analyze a simple multiagent reinforcement learning task where a buyer sends signals to a seller via its actions, and in which both agents are endowed with a recursive theory of mind. We show that this theory of mind, coupled with pure reward-maximization, gives rise to agents that selectively distort messages and become skeptical towards one another. Using information theory to analyze these interactions, we show how savvy buyers reduce mutual information between their preferences and actions, and how suspicious sellers learn to reinterpret or discard buyers' signals in a strategic manner.
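The abstract's claim that savvy buyers reduce the mutual information between their preferences and their actions can be illustrated with a small numerical sketch. This is not the paper's task or its agents: the two-preference, two-action setup, the uniform prior, and the three buyer policies below are hypothetical values chosen only to show how the quantity behaves as a buyer masks its intentions.

```python
import math

def joint_from_policy(prior, policy):
    """Joint P(preference, action) from a prior P(preference) and a policy P(action | preference)."""
    return [[prior[x] * policy[x][y] for y in range(len(policy[x]))]
            for x in range(len(prior))]

def mutual_information(joint):
    """Mutual information I(X; Y) in bits for a joint distribution joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal over preferences
    py = [sum(col) for col in zip(*joint)]      # marginal over actions
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:                           # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi

# Hypothetical setup: two equally likely preferences, two possible actions.
prior = [0.5, 0.5]

# Hypothetical buyer policies, rows indexed by preference, columns by action.
transparent = [[1.0, 0.0], [0.0, 1.0]]   # action fully reveals preference
partial     = [[0.8, 0.2], [0.2, 0.8]]   # action is a noisy signal
masking     = [[0.5, 0.5], [0.5, 0.5]]   # action is independent of preference

mi_transparent = mutual_information(joint_from_policy(prior, transparent))
mi_partial = mutual_information(joint_from_policy(prior, partial))
mi_masking = mutual_information(joint_from_policy(prior, masking))

for name, mi in [("transparent", mi_transparent),
                 ("partial", mi_partial),
                 ("masking", mi_masking)]:
    print(f"{name:12s} I(preference; action) = {mi:.3f} bits")
```

Under these assumed policies, the transparent buyer leaks one full bit about its preference, the noisy buyer leaks roughly 0.28 bits, and the fully masking buyer leaks nothing, so a seller observing only the action could learn nothing about the preference in the last case.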