Hu Ying, Hu Changjun, Fu Shushen, Fang Mingzhe, Xu Wenwen
Department of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.
PLoS One. 2017 Jan 3;12(1):e0168749. doi: 10.1371/journal.pone.0168749. eCollection 2017.
The popularity of online information generally rises and then falls. This paper treats the "burst", "peak", and "fade" key events together as a representative summary of popularity evolution. We propose a novel prediction task: predicting when popularity undergoes these key events. Knowing when these three key events occur is valuable for recommendation systems, online marketing, and rumor containment. However, this new prediction task is challenging for two reasons. First, popularity evolution is highly variable and can follow various patterns, so how can we identify "burst", "peak", and "fade" across different patterns of popularity evolution? Second, these events usually occur within a very short time, so how can we predict them both accurately and promptly? In this paper we address both issues. To handle the first, we smooth the variation with a simple moving average and then present a universal method that identifies the key events across different evolution patterns. To handle the second, we extract several types of features that may influence the key events and then apply correlation analysis during feature selection to remove irrelevant and redundant features. The remaining features are used to train a machine learning model. The feature selection step improves prediction accuracy. To emphasize promptness, we also design a new evaluation metric that considers both accuracy and promptness when evaluating our prediction task. Experimental and comparative results demonstrate the superiority of our prediction solution.
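The abstract does not state the exact identification rules or feature-selection thresholds, so the following Python code is only a minimal sketch under stated assumptions, not the authors' method. It smooths a popularity series with a trailing simple moving average and then applies hypothetical rules: the peak is the maximum of the smoothed curve, the burst is the first point at or before the peak that reaches a fraction of the peak value, and the fade is the first point after the peak that drops below a fraction of it. It also shows a generic correlation-based filter that drops features weakly correlated with a target (irrelevant) or strongly correlated with an already-kept feature (redundant). The function names (`simple_moving_average`, `key_events`, `pearson`, `select_features`) and the thresholds `burst_frac`, `fade_frac`, `min_relevance`, and `max_redundancy` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def simple_moving_average(series: List[float], window: int) -> List[float]:
    """Trailing simple moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: List[float] = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def key_events(
    series: List[float], window: int = 3, burst_frac: float = 0.5, fade_frac: float = 0.5
) -> Tuple[Optional[int], int, Optional[int]]:
    """Return hypothetical (burst, peak, fade) indices on the smoothed series.

    Assumed rules (NOT taken from the paper):
      peak  = first index of the maximum smoothed value
      burst = first index <= peak where smoothed >= burst_frac * peak value
      fade  = first index > peak where smoothed < fade_frac * peak value
    """
    s = simple_moving_average(series, window)
    peak = max(range(len(s)), key=lambda i: s[i])
    top = s[peak]
    burst = next((i for i in range(peak + 1) if s[i] >= burst_frac * top), None)
    fade = next((i for i in range(peak + 1, len(s)) if s[i] < fade_frac * top), None)
    return burst, peak, fade


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(
    features: Dict[str, List[float]],
    target: List[float],
    min_relevance: float = 0.1,
    max_redundancy: float = 0.9,
) -> List[str]:
    """Generic relevance/redundancy filter (an assumed scheme, not the paper's)."""
    # Rank features by absolute correlation with the target, strongest first.
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], target)), reverse=True)
    kept: List[str] = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_relevance:
            continue  # irrelevant: too weakly related to the target
        if any(abs(pearson(features[name], features[k])) > max_redundancy for k in kept):
            continue  # redundant: nearly duplicates a feature already kept
        kept.append(name)
    return kept


if __name__ == "__main__":
    views = [0, 1, 2, 8, 10, 9, 4, 1, 0, 0]
    print(key_events(views, window=1))  # (3, 4, 6)
    print(key_events(views, window=3))  # (4, 5, 8)
```

On the sample series, no smoothing (`window=1`) places the events at indices (3, 4, 6), while `window=3` shifts them to (4, 5, 8) because a trailing average lags the raw signal. This illustrates the tension between smoothing away variation and detecting events promptly. In `select_features`, the target could be, for example, the time until a key event; that choice is also an assumption here.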