Graduate School of Engineering Information and Systems, University of Tsukuba, Tsukuba, Ibaraki, Japan.
Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan.
PLoS One. 2023 Oct 16;18(10):e0293032. doi: 10.1371/journal.pone.0293032. eCollection 2023.
Analyzing the dynamics of information diffusion cascades and accurately predicting their behavior is important in many applications. In this paper, we focus on the recently introduced contrastive cascade graph learning framework for the task of predicting cascade popularity. This framework follows a pre-training and fine-tuning paradigm to address cascade prediction tasks. A previous study examined the transferability of pre-trained models within this framework only between two social media datasets. In the present study, we comprehensively evaluate the transferability of pre-trained models across 13 real datasets and six synthetic datasets. We construct several pre-trained models using real cascades and synthetic cascades generated by the independent cascade model and the Profile model. We then fine-tune these pre-trained models on real cascade datasets and evaluate their prediction accuracy using the mean squared logarithmic error. Our main findings are as follows. (1) The pre-trained models are transferable across diverse types of real datasets in different domains, spanning different languages, social media platforms, and diffusion time scales. (2) Synthetic cascade data are effective for pre-training: models pre-trained on synthetic cascade data are comparably effective to those pre-trained on real data. (3) Synthetic cascade data are beneficial for fine-tuning the contrastive cascade graph learning models and for training other state-of-the-art popularity prediction models. Models trained on a combination of real and synthetic cascades yield significantly lower mean squared logarithmic error than those trained solely on real cascades. Our findings affirm the effectiveness of synthetic cascade data in enhancing the accuracy of cascade popularity prediction.
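As a rough illustration of two ingredients named in the abstract, the sketch below simulates a synthetic cascade with a textbook independent cascade model and scores popularity predictions with a mean squared logarithmic error. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy graph, the single uniform activation probability, and the use of log base 2 with a +1 offset in the error are all assumptions made here for illustration.

```python
import math
import random
from collections import deque


def simulate_independent_cascade(graph, seed_node, prob, rng):
    """Simulate one cascade and return its (parent, child) activation edges.

    Assumption: every edge shares the same activation probability `prob`.
    Each newly activated node gets exactly one chance to activate each of
    its still-inactive neighbors, as in the standard independent cascade model.
    """
    activated = {seed_node}
    edges = []
    frontier = deque([seed_node])
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            if v not in activated and rng.random() < prob:
                activated.add(v)
                edges.append((u, v))
                frontier.append(v)
    return edges


def msle(predicted, observed):
    """Mean squared logarithmic error between predicted and observed sizes.

    Assumption: log base 2 with a +1 offset; the paper's exact definition
    may differ.
    """
    errors = [
        (math.log2(p + 1) - math.log2(o + 1)) ** 2
        for p, o in zip(predicted, observed)
    ]
    return sum(errors) / len(errors)


# Hypothetical toy follower graph: node -> nodes that can be activated by it.
toy_graph = {0: [1, 2], 1: [3], 2: [3], 3: []}

rng = random.Random(0)
cascade = simulate_independent_cascade(toy_graph, seed_node=0, prob=0.5, rng=rng)
popularity = len(cascade) + 1  # adopters, counting the seed node
```

Here the popularity of a cascade is taken to be its number of adopters. A collection of such simulated cascades could stand in for the synthetic pre-training data the abstract describes, with `msle` comparing predicted and observed sizes.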