Wang Xiya, Han Yuexing, Wang Bing
School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China.
Zhejiang Laboratory, Hangzhou 311100, China.
Entropy (Basel). 2023 Jul 15;25(7):1068. doi: 10.3390/e25071068.
Network epidemiology plays a fundamental role in understanding the relationship between network structure and epidemic dynamics, and identifying influential spreaders is an especially important part of that task. Most previous studies propose a single centrality measure based on network topology to reflect the influence of spreaders, an approach that shows limited universality. Machine learning enhances the identification of influential spreaders by combining multiple centralities. However, several centrality measures used in machine learning methods, such as closeness centrality, have high computational complexity on large networks. Here, we propose a two-phase feature selection method for identifying influential spreaders with a reduced feature dimension. Depending on the definition of influential spreaders, we obtain the optimal feature combination for different synthetic networks. Our results demonstrate that when the datasets are mildly or moderately imbalanced, feature combinations that include the two-hop neighborhood are fundamental for Barabási-Albert (BA) scale-free networks, and feature combinations that include degree centrality are essential for Erdős-Rényi (ER) random graphs. For Watts-Strogatz (WS) small-world networks, by contrast, feature selection is unnecessary. We also conduct experiments on real-world networks, where the selected features closely resemble those selected for synthetic networks. Our method provides a new path for identifying superspreaders for the control of epidemics.
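As an illustration of why local features are attractive here, the sketch below computes the two inexpensive node features named in the abstract, degree and two-hop neighborhood, on a toy undirected graph. It is not the authors' two-phase selection method. The function names, the toy graph, and the definition of the two-hop neighborhood used here (the number of distinct nodes within two hops, excluding the node itself) are assumptions made for this example; the abstract does not state the exact definition.

```python
def degree(adj, node):
    """Degree of `node` in an undirected graph stored as {node: set_of_neighbors}."""
    return len(adj[node])


def two_hop_neighborhood_size(adj, node):
    """Count distinct nodes reachable within two hops of `node`, excluding `node`.

    Assumed definition for illustration only; the source abstract does not
    specify how the two-hop neighborhood feature is computed.
    """
    reached = set(adj[node])          # one-hop neighbors
    for neighbor in adj[node]:
        reached.update(adj[neighbor])  # neighbors of neighbors
    reached.discard(node)             # the node itself is not counted
    return len(reached)


def local_features(adj):
    """Map each node to a (degree, two-hop neighborhood size) feature pair."""
    return {v: (degree(adj, v), two_hop_neighborhood_size(adj, v)) for v in adj}


# Hypothetical toy graph: path 0-1-2-3 with an extra leaf 4 attached to node 1.
adj = {
    0: {1},
    1: {0, 2, 4},
    2: {1, 3},
    3: {2},
    4: {1},
}

features = local_features(adj)
for node in sorted(features):
    print(node, features[node])
```

Both features depend only on a node's immediate surroundings, so they need no shortest-path computation over the whole graph, unlike closeness centrality. Feature vectors of this kind could then be passed to a classifier, with feature selection deciding which of them to keep.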