Stolarski Mateusz, Piróg Adam, Bródka Piotr
Department of Artificial Intelligence, Wrocław University of Science and Technology, 50-370 Wrocław, Poland.
4Semantics, 00-833 Warszawa, Poland.
Entropy (Basel). 2024 Nov 6;26(11):955. doi: 10.3390/e26110955.
The identification of key nodes in complex networks is an important topic in many areas of network science. It is vital to a variety of real-world applications, including viral marketing, epidemic spreading and influence maximization. In recent years, machine learning algorithms have been shown to outperform conventional, centrality-based methods in accuracy and consistency, but this approach still requires further refinement. What information about the influencers can be extracted from the network? How can we precisely obtain the labels required for training? Can these models generalize well? In this paper, we answer these questions by presenting an enhanced machine learning-based framework for the influence spread problem. We focus on identifying key nodes for the Independent Cascade model, which is a popular reference method. Our main contribution is an improved process of obtaining the labels required for training, achieved by introducing "Smart Bins" and proving their advantage over known methods. Next, we show that our methodology allows ML models not only to predict the influence of a given node, but also to determine other characteristics of the spreading process, which is another novelty in the relevant literature. Finally, we extensively test our framework and its ability to generalize beyond complex networks of different types and sizes, gaining important insight into the properties of these methods.
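As a rough illustration of the Independent Cascade model referred to above, the following minimal sketch simulates a cascade from a single seed node and averages the final spread over repeated runs. This is only one plausible way to obtain a raw influence score per node. The function names, the uniform activation probability `p`, the number of runs, and the adjacency-dictionary graph format are all assumptions made for this example and are not taken from the paper. The "Smart Bins" labelling step is not reproduced here.

```python
import random
from collections import deque


def independent_cascade(adjacency, seed, p, rng):
    """Run one Independent Cascade simulation from a single seed node.

    Each newly activated node gets exactly one chance to activate each of
    its still-inactive neighbours, succeeding with probability p.
    Returns the number of nodes active when the cascade stops.
    """
    active = {seed}
    frontier = deque([seed])
    while frontier:
        node = frontier.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in active and rng.random() < p:
                active.add(neighbour)
                frontier.append(neighbour)
    return len(active)


def estimate_influence(adjacency, seed, p=0.1, runs=1000, rng_seed=0):
    """Average cascade size over repeated simulations (hypothetical helper)."""
    rng = random.Random(rng_seed)
    total = sum(independent_cascade(adjacency, seed, p, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    # Toy undirected graph stored as an adjacency dictionary (assumed format).
    graph = {
        0: [1, 2],
        1: [0, 2, 3],
        2: [0, 1],
        3: [1, 4],
        4: [3],
    }
    for node in graph:
        print(node, estimate_influence(graph, node, p=0.3, runs=2000))
```

Averaged spreads of this kind could then serve as continuous targets, or be discretized into classes, before training a model on node features. How the discretization is done is exactly where the paper's contribution lies, so it is left out of this sketch.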