Du Nan, Liang Yingyu, Balcan Maria-Florina, Song Le
JMLR Workshop Conf Proc. 2014 Jun;32(2):2016-2024.
Can we learn the influence of a set of people in a social network from cascades of information diffusion? This question is often addressed by a two-stage approach: first learn a diffusion model, then calculate the influence based on the learned model. The success of this approach therefore relies heavily on the correctness of the diffusion model, which is hard to verify for real-world data. In this paper, we exploit the insight that the influence functions in many diffusion models are coverage functions, and propose a novel parameterization of such functions using a convex combination of random basis functions. Moreover, we propose an efficient maximum-likelihood-based algorithm to learn such functions directly from cascade data, bypassing the need to specify a particular diffusion model in advance. We provide both theoretical and empirical analysis of our approach, showing that it can provably learn the influence function with low sample complexity, is robust to the unknown diffusion model, and significantly outperforms existing approaches on both synthetic and real-world data.
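The idea of fitting a convex combination of random basis functions by maximum likelihood can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the basis functions are simple coverage functions phi_k(S) = min(1, r_k . chi_S) built from random binary vectors r_k, the target is the infection probability of a single node, the data are synthetic, and the weights are fitted with exponentiated-gradient steps plus step halving (chosen here only because it keeps the weights on the probability simplex). The names `basis_features`, `neg_log_likelihood`, `fit_weights`, and `demo` are hypothetical.

```python
import numpy as np


def basis_features(source_sets, basis):
    """Evaluate phi_k(S) = min(1, r_k . chi_S) for every source set and basis vector.

    source_sets: (m, n) 0/1 indicator matrix, one row per cascade's source set.
    basis:       (K, n) 0/1 matrix of random binary vectors (an assumed choice).
    Each phi_k is a coverage function: it is 1 iff S intersects the support of r_k.
    """
    return np.minimum(1.0, source_sets @ basis.T)


def neg_log_likelihood(w, feats, y, eps=1e-3):
    """Mean Bernoulli negative log-likelihood of infection labels y."""
    p = np.clip(feats @ w, eps, 1.0 - eps)  # clip to keep the logs finite
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def fit_weights(feats, y, steps=300, lr=0.5, eps=1e-3):
    """Fit simplex weights w by exponentiated-gradient steps with step halving."""
    num_basis = feats.shape[1]
    w = np.full(num_basis, 1.0 / num_basis)  # start from the uniform combination
    current = neg_log_likelihood(w, feats, y, eps)
    for _ in range(steps):
        p = np.clip(feats @ w, eps, 1.0 - eps)
        # d(NLL)/dw via the chain rule through p = feats @ w
        # (the clipping is ignored in the gradient, an approximation)
        grad = feats.T @ ((p - y) / (p * (1.0 - p))) / len(y)
        # Multiplicative update; subtracting grad.min() avoids overflow in exp
        cand = w * np.exp(-lr * (grad - grad.min()))
        cand /= cand.sum()
        cand_nll = neg_log_likelihood(cand, feats, y, eps)
        if cand_nll <= current:
            w, current = cand, cand_nll  # accept only non-increasing steps
        else:
            lr *= 0.5  # otherwise shrink the step size and retry
    return w


def demo(seed=0, n=20, num_basis=40, m=2000):
    """Synthetic check: sample labels from a known convex combination and refit."""
    rng = np.random.default_rng(seed)
    basis = (rng.random((num_basis, n)) < 0.15).astype(float)
    sources = (rng.random((m, n)) < 0.15).astype(float)
    feats = basis_features(sources, basis)
    w_true = rng.dirichlet(np.full(num_basis, 0.2))  # hypothetical ground truth
    y = (rng.random(m) < feats @ w_true).astype(float)
    w_uniform = np.full(num_basis, 1.0 / num_basis)
    w_fit = fit_weights(feats, y)
    return (w_fit, feats,
            neg_log_likelihood(w_uniform, feats, y),
            neg_log_likelihood(w_fit, feats, y))


w_fit, feats, nll_uniform, nll_fit = demo()
print(f"NLL with uniform weights: {nll_uniform:.4f}")
print(f"NLL after fitting:        {nll_fit:.4f}")
```

Because each phi_k is a coverage function and the weights are non-negative and sum to one, the fitted function is itself a coverage function with values in [0, 1], so it can be read as an infection probability. In this sketch, an influence estimate for a source set would be obtained by fitting one such function per node and summing the predicted probabilities, which follows from linearity of expectation.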