Department of Mathematics and Computer Science, Rutgers University.
Center for Perceptual Systems, University of Texas at Austin.
Cogn Sci. 2023 Apr;47(4):e13279. doi: 10.1111/cogs.13279.
The enormous scale of the information and products available on the Internet has necessitated algorithms that mediate between options and human users. These algorithms attempt to provide the user with relevant information. In doing so, they face a tension between selecting items whose ratings are uncertain, in order to learn about users, and selecting items they are confident will receive high ratings. This tension is an instance of the exploration-exploitation trade-off in the context of recommender systems. Because humans are in this interaction loop, the long-term trade-off behavior depends on human variability. Our goal is to characterize the trade-off behavior as a function of the human variability fundamental to such human-algorithm interaction. To tackle this characterization, we first introduce a unifying model that smoothly transitions between active learning and recommending relevant information. The unifying model gives us access to a continuum of algorithms along the exploration-exploitation trade-off. We then present two experiments that measure the trade-off behavior under two very different levels of human variability. The experimental results inform a thorough simulation study in which we modeled human variability and varied it systematically over a wide range. The main result is that the exploration-exploitation trade-off grows in severity as human variability increases, but there exists a regime of low variability in which algorithms balanced between exploration and exploitation can largely overcome the trade-off.
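The paper does not specify the form of its unifying model here, but one common way to realize a smooth transition between active learning (exploration) and recommending highly rated items (exploitation) is to score items by a convex combination of predicted rating and predictive uncertainty. The sketch below is an illustrative assumption, not the authors' model; the mixing parameter `lam`, the toy ratings, and the uncertainties are all hypothetical.

```python
import numpy as np

def select_item(predicted_rating, uncertainty, lam):
    """Pick the item maximizing a convex combination of exploitation
    (predicted rating) and exploration (predictive uncertainty).

    lam = 0.0 -> pure exploitation: recommend the best-rated item.
    lam = 1.0 -> pure exploration: active learning on the most
                 uncertain item. Intermediate lam values trace out a
                 continuum of algorithms along the trade-off.
    """
    scores = (1.0 - lam) * predicted_rating + lam * uncertainty
    return int(np.argmax(scores))

# Toy example: five items with model-predicted ratings and uncertainties.
ratings = np.array([4.2, 3.8, 3.5, 2.9, 4.0])
uncert = np.array([0.1, 0.9, 0.4, 0.8, 0.2])

print(select_item(ratings, uncert, lam=0.0))  # exploits: item 0 (rating 4.2)
print(select_item(ratings, uncert, lam=1.0))  # explores: item 1 (uncertainty 0.9)
```

Sweeping `lam` from 0 to 1 yields the kind of algorithm continuum the abstract describes, against which trade-off behavior can be measured under different levels of human (rating) variability.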