Sequential stacking link prediction algorithms for temporal networks.

Authors

He Xie, Ghasemian Amir, Lee Eun, Clauset Aaron, Mucha Peter J

Affiliations

Department of Mathematics, Dartmouth College, Hanover, NH, USA.

Yale Institute for Network Science, Yale University, New Haven, CT, USA.

Publication

Nat Commun. 2024 Feb 14;15(1):1364. doi: 10.1038/s41467-024-45598-0.

DOI:10.1038/s41467-024-45598-0

PMID:38355612

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10866871/

Abstract

Link prediction algorithms are indispensable tools in many scientific applications by speeding up network data collection and imputing missing connections. However, in many systems, links change over time and it remains unclear how to optimally exploit such temporal information for link predictions in such networks. Here, we show that many temporal topological features, in addition to having high computational cost, are less accurate in temporal link prediction than sequentially stacked static network features. This sequential stacking link prediction method uses 41 static network features that avoid detailed feature engineering choices and is capable of learning a highly accurate predictive distribution of future connections from historical data. We demonstrate that this algorithm works well for both partially observed and completely unobserved target layers, and on two temporal stochastic block models achieves near-oracle-level performance when combined with other single predictor methods as an ensemble learning method. Finally, we empirically illustrate that stacking multiple predictive methods together further improves performance on 19 real-world temporal networks from different domains.
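To make the idea of sequentially stacking static features concrete, here is a minimal sketch. It is not the authors' implementation. It assumes undirected snapshots given as edge sets, and it uses three common static pair features (common neighbors, Jaccard coefficient, preferential attachment) as stand-ins for the 41 features mentioned above. The function names `adjacency`, `static_features`, and `stacked_features` are hypothetical. Each node pair gets one feature row built by concatenating its per-layer features in temporal order; in a full pipeline such rows would be passed to a supervised classifier trained on whether the pair is linked in the target layer, a step omitted here.

```python
from itertools import combinations


def adjacency(n, edges):
    """Build neighbor sets for an undirected snapshot on nodes 0..n-1."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def static_features(nbrs, u, v):
    """Three illustrative static features for the pair (u, v) in one layer."""
    common = len(nbrs[u] & nbrs[v])
    union = len(nbrs[u] | nbrs[v])
    jaccard = common / union if union else 0.0
    pref_attach = len(nbrs[u]) * len(nbrs[v])
    return [float(common), jaccard, float(pref_attach)]


def stacked_features(n, snapshots):
    """Concatenate per-layer features in temporal order for every node pair."""
    layers = [adjacency(n, edges) for edges in snapshots]
    table = {}
    for u, v in combinations(range(n), 2):
        row = []
        for nbrs in layers:  # oldest layer first, most recent layer last
            row.extend(static_features(nbrs, u, v))
        table[(u, v)] = row
    return table


# Toy example: two historical layers on four nodes (assumed data).
snapshots = [
    {(0, 1), (1, 2)},
    {(0, 1), (1, 2), (2, 3)},
]
features = stacked_features(4, snapshots)
```

Each row has length (number of layers) x (features per layer), so adding more history widens the feature vector without requiring any hand-designed temporal features.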
