Istituto per le Applicazioni del Calcolo "Mauro Picone", Consiglio Nazionale delle Ricerche, Via dei Taurini 19, 00185, Rome, Italy.
Dipartimento di Scienze Statistiche, Università di Roma "La Sapienza", piazzale Aldo Moro 5, 00185, Rome, Italy.
Sci Rep. 2022 Jun 13;12(1):9757. doi: 10.1038/s41598-022-12710-7.
We present a new method for assessing and measuring homophily in networks whose nodes have categorical attributes, that is, networks whose nodes are partitioned into classes (colors). We test this method on two different classes of networks: (i) protein-protein interaction (PPI) networks, where nodes correspond to proteins, partitioned according to their functional role, and edges represent functional interactions between proteins; (ii) the Pokec online social network, where nodes correspond to users, partitioned according to their age, and edges represent friendship between users. Like other classical and well-established approaches, our method compares the relative edge density of the subgraph induced by each class with the corresponding expected relative edge density under a null model. The novelty of our approach consists in prescribing an endogenous null model, namely, one whose sample space is built on the input network itself. This allows us to give exact explicit expressions for the z-score of the relative edge density of each class, as well as for other related statistics. The z-scores directly quantify the statistical significance of the observed homophily via Čebyšëv's inequality. The network structure enters the expression of each z-score through basic combinatorial invariants, such as the number of subgraphs with two spanning edges. Each z-score is computed in [Formula: see text] time for a network with n nodes and m edges. This leads to an overall efficient computational method for assessing homophily. We complement the analysis of homophily/heterophily by considering z-scores of the number of isolated nodes in the subgraphs induced by each class, which are computed in O(nm) time. The theoretical results are then used to show that, as expected, both analyzed classes of networks are significantly homophilic with respect to the considered node properties.
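As a rough illustration of the quantities named above, the following Python sketch computes, for one class, the relative edge density and the number of isolated nodes of the induced subgraph, and then estimates a z-score. It does not implement the endogenous null model or the exact expressions described here: as a stand-in it uses a Monte Carlo label-permutation null, and it assumes that "relative edge density" means the share of all m edges that fall inside the class. The function names (`class_stats`, `permutation_z_score`, `chebyshev_bound`) and the toy network are hypothetical.

```python
import random
from statistics import mean, pstdev


def class_stats(edges, color, cls):
    """Return (relative edge density, isolated-node count) for the
    subgraph induced by the nodes whose color is `cls`."""
    nodes = {v for v, c in color.items() if c == cls}
    inside = [(u, v) for u, v in edges if u in nodes and v in nodes]
    touched = {v for edge in inside for v in edge}
    # Assumed normalization: fraction of all edges lying inside the class.
    density = len(inside) / len(edges)
    return density, len(nodes) - len(touched)


def permutation_z_score(edges, color, cls, trials=2000, seed=0):
    """Estimate a z-score of the relative edge density of `cls` under a
    label-permutation null (a stand-in, NOT the endogenous null model)."""
    rng = random.Random(seed)
    observed, _ = class_stats(edges, color, cls)
    nodes = list(color)
    labels = [color[v] for v in nodes]
    samples = []
    for _ in range(trials):
        rng.shuffle(labels)  # reassign colors, keeping class sizes fixed
        shuffled = dict(zip(nodes, labels))
        samples.append(class_stats(edges, shuffled, cls)[0])
    mu, sigma = mean(samples), pstdev(samples)
    return (observed - mu) / sigma if sigma > 0 else 0.0


def chebyshev_bound(z):
    """Chebyshev upper bound on P(|X - mu| >= |z| * sigma)."""
    return 1.0 if z == 0 else min(1.0, 1.0 / z**2)


# Toy network: two triangles joined by a single edge, one color per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
color = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

density_a, isolated_a = class_stats(edges, color, "a")
z_a = permutation_z_score(edges, color, "a")
print(density_a, isolated_a, z_a, chebyshev_bound(z_a))
```

A positive z-score indicates more within-class edges than the null predicts, and `chebyshev_bound` turns it into a distribution-free upper bound on the probability of a deviation at least that large. The sampling loop is only an approximation; the closed-form expressions described above avoid it.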