Kwon Sejeong, Cha Meeyoung, Jung Kyomin
Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.
Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea.
PLoS One. 2017 Jan 12;12(1):e0168344. doi: 10.1371/journal.pone.0168344. eCollection 2017.
This study determines the major differences between rumors and non-rumors and explores rumor classification performance over varying time windows, from the first three days to nearly two months. A comprehensive set of user, structural, linguistic, and temporal features was examined, and their relative strength was compared using near-complete data from Twitter. Our contribution lies in providing deep insight into the cumulative spreading patterns of rumors over time and in tracking the precise changes in predictive power across rumor features. Statistical analysis finds that structural and temporal features distinguish rumors from non-rumors over a long-term window, yet they are not available during the initial propagation phase. In contrast, user and linguistic features are readily available and act as good indicators during the initial propagation phase. Based on these findings, we suggest a new rumor classification algorithm that achieves competitive accuracy over both short and long time windows. These findings provide new insights for explaining rumor mechanism theories and for identifying features for early rumor detection.