通过从噪声数据生成网络变体池实现的网络分析

Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data.

作者信息

Mandviwalla Aamir, Elsisy Amr, Atique Muhammad Saad, Kuzmin Konstantin, Gaiteri Chris, Szymanski Boleslaw K

机构信息

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.

Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.

出版信息

Entropy (Basel). 2023 Jul 26;25(8):1118. doi: 10.3390/e25081118.

DOI:10.3390/e25081118

PMID:37628148

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10453411/

Abstract

Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions to protect important members from attacks or arrests. Here, we focus on correctly inferring the structures and functions of such networks, but our methodology can be broadly applied. Without the ground truth, knowledge about the allocation of nodes to communities and network functions, no single network based on the noisy data can represent all plausible communities and functions of the true underlying network. To address this limitation, we apply a generative model that randomly distorts the original network based on the noisy data, generating a pool of statistically equivalent networks. Each unique generated network is recorded, while each duplicate of the already recorded network just increases the repetition count of that network. We treat each such network as a variant of the ground truth with the probability of arising in the real world approximated by the ratio of the count of this network's duplicates plus one to the total number of all generated networks. Communities of variants with frequently occurring duplicates contain persistent patterns shared by their structures. Using Shannon entropy, we can find a variant that minimizes the uncertainty for operations planned on the network. Repeatedly generating new pools of networks from the best network of the previous step for several steps lowers the entropy of the best new variant. If the entropy is too high, the network operators can identify nodes, the monitoring of which can achieve the most significant reduction in entropy. Finally, we also present a heuristic for constructing a new variant, which is not randomly generated but has the lowest expected cost of operating on the distorted mappings of network nodes to communities and functions caused by noisy data.

摘要

将网络节点和边映射到社区及网络功能对于更深入理解网络结构和功能至关重要。对于隐蔽社交网络而言，设计这样的映射极具挑战性，因为这类网络会故意隐藏其结构和功能，以保护重要成员免受攻击或逮捕。在此，我们专注于正确推断此类网络的结构和功能，不过我们的方法具有广泛适用性。在缺乏关于节点到社区及网络功能分配的真实情况（即“地面真值”）的情况下，基于有噪声数据的单个网络无法代表真实底层网络的所有合理社区和功能。为解决这一限制，我们应用一种生成模型，该模型基于有噪声数据对原始网络进行随机扭曲，生成一组统计上等效的网络。每个唯一生成的网络都会被记录下来，而已经记录的网络的重复副本只会增加该网络的重复计数。我们将每个这样的网络视为地面真值的一种变体，其在现实世界中出现的概率由该网络重复副本的计数加一与所有生成网络总数的比值近似表示。具有频繁出现重复副本的变体社区包含其结构所共有的持久模式。利用香农熵，我们可以找到一个能使针对网络规划的操作的不确定性最小化的变体。从先前步骤的最佳网络开始，重复几步生成新的网络池，会降低最佳新变体的熵。如果熵过高，网络运营商可以识别出那些对其进行监测能使熵显著降低的节点。最后，我们还提出了一种启发式方法来构建一个新变体，该变体不是随机生成的，而是在由噪声数据导致的网络节点到社区及功能的扭曲映射上操作时具有最低的预期成本。