Zheng Yongsen, Qin Jinghui, Wei Pengxu, Chen Ziliang, Lin Liang
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17123-17136. doi: 10.1109/TNNLS.2023.3299929. Epub 2024 Dec 2.
Popularity bias, a long-standing problem in recommender systems (RSs), has been thoroughly explored for offline recommendation systems in most existing research, but very few studies have addressed eliminating such bias in online interactive recommendation scenarios. Because of the feedback loop between the user and the interactive system, bias amplification becomes increasingly serious over time. However, existing methods investigate the causal relations among different factors only statically, without considering the temporal dependencies inherent in online interactive recommendation systems, which makes them difficult to adapt to online settings. To address these problems, we propose a novel counterfactual interactive policy learning (CIPL) method to eliminate popularity bias in online recommendation. It first scrutinizes the causal relations in interactive recommender models and formulates a novel temporal causal graph (TCG) to guide the training and counterfactual inference of the causal interactive recommendation system. Concretely, during model training, the TCG is used to estimate the causal effect of item popularity on the prediction score at each time step at which the user interacts with the system. It is also used to remove the negative effect of popularity bias in the test stage. To train the causal interactive recommendation system, we formulate CIPL within the actor-critic framework, using an online interactive environment simulator. We conduct extensive experiments on three public benchmarks, and the experimental results demonstrate that the proposed method achieves new state-of-the-art performance.
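The test-stage idea in the abstract, removing the effect of item popularity from the prediction score, can be illustrated with a generic counterfactual adjustment. This is a minimal sketch and not the CIPL formulation, which the abstract does not specify: the function names, the interaction-share definition of popularity, the subtraction form, and the weight `alpha` are all assumptions made for illustration.

```python
import numpy as np


def popularity_from_counts(interaction_counts):
    """Turn raw interaction counts into each item's share of all interactions.

    Assumption: popularity is modeled as a simple interaction share in [0, 1].
    """
    counts = np.asarray(interaction_counts, dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else np.zeros_like(counts)


def counterfactual_scores(full_scores, popularity, alpha=1.0):
    """Subtract a popularity-only term from the full prediction scores.

    Assumption: the popularity effect is additive and scaled by a
    hypothetical weight `alpha`; this is a generic debiasing form,
    not the adjustment derived from the paper's temporal causal graph.
    """
    full = np.asarray(full_scores, dtype=float)
    pop = np.asarray(popularity, dtype=float)
    return full - alpha * pop


# Toy data: item 0 is far more popular than items 1 and 2.
counts = [900, 90, 10]
full_scores = [0.80, 0.75, 0.40]

pop = popularity_from_counts(counts)
debiased = counterfactual_scores(full_scores, pop, alpha=0.5)

# Rank items from highest to lowest adjusted score.
ranking = np.argsort(-debiased)
print(ranking)
```

In this toy example the most popular item ranks first on the raw scores but drops once its popularity term is subtracted. In an online setting the popularity estimate would have to be updated as interactions accumulate, which is the temporal dependency the abstract emphasizes.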