De Pessemier Toon, Willems Bruno, Martens Luc
Ghent University, Belgium, Imec, Belgium, Waves, iGent - Technologiepark 126, Ghent, 9052, Belgium.
Sci Rep. 2025 Jul 8;15(1):24493. doi: 10.1038/s41598-025-09708-2.
A key challenge in recommender systems is profiling new users. A popular solution is active learning: the system requests ratings for a small set of carefully selected items to reveal a new user's preferences. In this paper, we propose a new decision tree-based algorithm for selecting these items. The recommender system is treated as a black box; the ratings collected by interviewing new users are passed to it with the aim of improving its performance. Extensive offline evaluation with two data sets and various recommender algorithms shows that our algorithm improves the performance of the underlying recommender algorithm, provided that users are able to rate most of the items presented to them during the interview. However, an online evaluation with 50 real users could not confirm a positive impact on the performance of the underlying recommender algorithm. This reveals a discrepancy between offline and online evaluations of active learning techniques in recommender systems. The discrepancy arises because real users are not always able to rate the items selected by the active learning algorithm and therefore cannot provide the requested information, in contrast to many machine learning scenarios in which every sample can be labeled. Hence, further research is required to establish the impact of active learning strategies on recommender algorithms with more certainty.
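The interview idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. It is a generic decision-tree interview under stated assumptions: ratings are on a 1 to 5 scale, a rating of 4 or more counts as "like", and each node asks about the item that best balances likes and dislikes among the users who reach it. All names (`Node`, `split`, `pick_item`, `build_tree`, `interview`) and the toy data are hypothetical. The answers collected by `interview` are what would be handed to the black-box recommender.

```python
from dataclasses import dataclass, field
from typing import Optional

LIKE, DISLIKE, UNKNOWN = "like", "dislike", "unknown"


@dataclass
class Node:
    item: Optional[str]  # item to ask about; None marks a leaf
    children: dict = field(default_factory=dict)


def split(users, ratings, item, threshold=4):
    """Partition users by their (possibly missing) rating of `item`."""
    groups = {LIKE: [], DISLIKE: [], UNKNOWN: []}
    for u in users:
        r = ratings[u].get(item)
        if r is None:
            groups[UNKNOWN].append(u)
        elif r >= threshold:  # assumed "like" cut-off
            groups[LIKE].append(u)
        else:
            groups[DISLIKE].append(u)
    return groups


def pick_item(users, ratings, asked, threshold=4):
    """Assumed heuristic: the item with the most balanced like/dislike split."""
    candidates = sorted({i for u in users for i in ratings[u]} - set(asked))
    best, best_score = None, 0
    for item in candidates:
        g = split(users, ratings, item, threshold)
        score = min(len(g[LIKE]), len(g[DISLIKE]))
        if score > best_score:
            best, best_score = item, score
    return best  # None when no item separates the users


def build_tree(users, ratings, asked=frozenset(), depth=2):
    """Greedily grow an interview tree of at most `depth` questions."""
    if depth == 0 or not users:
        return Node(None)
    item = pick_item(users, ratings, asked)
    if item is None:
        return Node(None)
    node = Node(item)
    for answer, members in split(users, ratings, item).items():
        node.children[answer] = build_tree(
            members, ratings, asked | {item}, depth - 1
        )
    return node


def interview(tree, answer_fn):
    """Walk the tree; keep only the answers the user could actually give."""
    collected, node = {}, tree
    while node.item is not None:
        answer = answer_fn(node.item)
        if answer != UNKNOWN:
            collected[node.item] = answer
        node = node.children[answer]
    return collected


# Toy training ratings (made up for illustration).
ratings = {
    "u1": {"A": 5, "B": 1},
    "u2": {"A": 5, "B": 5},
    "u3": {"A": 1, "B": 5},
    "u4": {"A": 2, "C": 4},
}
tree = build_tree(list(ratings), ratings)
```

In this sketch, a new user who answers "unknown" at the root contributes no ratings at all. This is a toy version of the situation the abstract describes, in which real users cannot always rate the selected items.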