Wu Leilei, Yi Lingling, Ren Xiao-Long, Lü Linyuan
Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China.
Department of Physics, University of Fribourg, CH-1700 Fribourg, Switzerland.
Entropy (Basel). 2023 Jun 9;25(6):916. doi: 10.3390/e25060916.
The ability to predict the size of information cascades in online social networks is crucial for various applications, including decision-making and viral marketing. However, traditional methods rely either on complicated time-varying features that are difficult to extract from multilingual and cross-platform content, or on network structures and properties that are often hard to obtain. To address these issues, we conducted an empirical study using data from two well-known social networking platforms, WeChat and Weibo. Our findings suggest that the information-cascading process is best described as an activate-decay dynamic process. Building on these insights, we developed an activate-decay (AD)-based algorithm that can accurately predict the long-term popularity of online content based solely on its early repost counts. Tests on WeChat and Weibo data show that the algorithm can fit how content propagation evolves over time and can predict the longer-term dynamics of message forwarding from early data. We also found a close correlation between the peak forwarding amount of a piece of information and its total dissemination, and identifying this peak can significantly improve the prediction accuracy of our model. Our method also outperformed existing baseline methods for predicting the popularity of information.
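The following minimal sketch illustrates the general idea of predicting long-term popularity from early repost counts. It assumes a hypothetical activate-decay curve, x_k = A * k * exp(-k / tau), for the number of reposts in time step k. This functional form and the names fit_activate_decay and predict_total are illustrative assumptions. They are not the AD model from the paper, whose exact form is not given in this abstract.

```python
import math

# ASSUMPTION: the hypothetical activate-decay curve x_k = A * k * exp(-k / tau)
# is used only for illustration; it is not the paper's AD model.

def fit_activate_decay(counts):
    """Fit A and tau of the assumed curve to early per-step repost counts.

    counts[i] is the number of reposts in time step k = i + 1.
    Taking logs gives log(x_k / k) = log(A) - k / tau, which is linear in k,
    so ordinary least squares on the points (k, log(x_k / k)) recovers A and tau.
    """
    pts = [(k, math.log(x / k)) for k, x in enumerate(counts, start=1) if x > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero early counts")
    n = len(pts)
    mean_k = sum(k for k, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in pts)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    slope = sxy / sxx
    if slope >= 0:
        # A non-negative slope means no decay is visible in the early window.
        raise ValueError("no decay detected in the early window")
    intercept = mean_y - slope * mean_k
    return math.exp(intercept), -1.0 / slope  # (A, tau)


def predict_total(A, tau, horizon):
    """Predicted cumulative reposts over steps 1..horizon under the assumed curve."""
    return sum(A * k * math.exp(-k / tau) for k in range(1, horizon + 1))


# Synthetic example: the first 8 steps of a cascade generated with A=50, tau=6.
early = [50 * k * math.exp(-k / 6) for k in range(1, 9)]
A, tau = fit_activate_decay(early)
print(round(A, 3), round(tau, 3))  # 50.0 6.0
print(predict_total(A, tau, 100))  # extrapolated long-term repost count
```

Under this assumed curve, reposts peak at t = tau with height A * tau / e, and the continuous-time total is A * tau^2 = e * tau * peak. This shows one way an estimate of the peak could constrain the total, in the spirit of the peak/total correlation described above. The log-linear fit keeps the sketch dependency-free; on noisy real data, a nonlinear least-squares fit would usually be preferable.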