Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
Department of Computing and Systems, Universidade Federal de Ouro Preto, João Monlevade, Minas Gerais, Brazil.
PLoS One. 2022 Sep 15;17(9):e0274218. doi: 10.1371/journal.pone.0274218. eCollection 2022.
Collective user behavior in social media applications often drives important online and offline phenomena linked to the spread of opinions and information. Several studies have analyzed such phenomena using networks to model user interactions, which are represented by edges. However, only a fraction of the edges contribute to the actual investigation. Even worse, the often large number of non-relevant edges may obscure the salient interactions, blurring the underlying structures and user communities that capture the collective behavior patterns driving the target phenomenon. To address this issue, researchers have proposed several network backbone extraction techniques to obtain a reduced and representative version of the network that better explains the phenomenon of interest. Each technique has its own assumptions and its own procedure for extracting the backbone. However, the literature lacks a clear methodology to highlight such assumptions, discuss how they affect the choice of a method, and offer validation strategies in scenarios where no ground truth exists. In this work, we fill this gap by proposing a principled methodology for comparing and selecting the most appropriate backbone extraction method given a phenomenon of interest. We characterize ten state-of-the-art techniques in terms of their assumptions, requirements, and other aspects that one must consider to apply them in practice. We present four steps to apply, evaluate, and select the best method(s) for a given target phenomenon. We validate our approach using two case studies with different requirements: online discussions on Instagram and coordinated behavior in WhatsApp groups. We show that each method can produce very different backbones, underscoring that the choice of an adequate method is of utmost importance to reveal valuable knowledge about the particular phenomenon under investigation.
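As an illustration of why the choice of method matters, the following minimal Python sketch (not taken from the paper) contrasts two well-known backbone extraction approaches on a toy weighted graph: a global weight threshold and the disparity filter of Serrano et al. The abstract does not name the ten techniques studied, so treating these two as representative is an assumption; the toy graph, the function names, and the parameter values are all hypothetical.

```python
from collections import defaultdict


def threshold_backbone(edges, min_weight):
    """Global threshold: keep every edge whose weight is at least min_weight."""
    return {edge for edge, w in edges.items() if w >= min_weight}


def disparity_backbone(edges, alpha):
    """Disparity filter: keep an edge if it is statistically significant
    (p-value below alpha) from the point of view of at least one endpoint."""
    strength = defaultdict(float)  # sum of incident edge weights per node
    degree = defaultdict(int)      # number of incident edges per node
    for (u, v), w in edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k <= 1:
            # A node with a single edge gives no evidence of significance.
            return 1.0
        # Null model: the node's strength is split uniformly at random
        # among its k edges.
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        (u, v)
        for (u, v), w in edges.items()
        if min(p_value(u, w), p_value(v, w)) < alpha
    }


# Hypothetical toy graph: a hub with four equally heavy edges, plus a
# low-activity node "x" that concentrates almost all its weight on one edge.
edges = {
    ("hub", "a"): 10.0,
    ("hub", "b"): 10.0,
    ("hub", "c"): 10.0,
    ("hub", "d"): 10.0,
    ("x", "y"): 5.0,
    ("x", "z"): 0.1,
    ("x", "w"): 0.1,
    ("x", "v"): 0.1,
}

by_threshold = threshold_backbone(edges, min_weight=8.0)
by_disparity = disparity_backbone(edges, alpha=0.05)

print("threshold:", sorted(by_threshold))
print("disparity:", sorted(by_disparity))
```

On this toy input the two backbones do not share a single edge. The threshold keeps only the heavy hub edges, while the disparity filter keeps only the edge on which node `x` concentrates its activity. The first method assumes that absolute weight signals relevance; the second assumes that relevance is relative to each node's own activity level.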