Suppr超能文献

基于拓扑先验从不完整图数据进行转移矩阵的贝叶斯推断。

Bayesian inference of transition matrices from incomplete graph data with a topological prior.

作者信息

Perri Vincenzo, Petrović Luka V, Scholtes Ingo

机构信息

Data Analytics Group, Department of Informatics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zurich, Switzerland.

Chair of Machine Learning for Complex Networks, Center for Artificial Intelligence and Data Science, Julius-Maximilians-Universität Würzburg, John-Skilton-Strasse 8a, 97074 Würzburg, Germany.

出版信息

EPJ Data Sci. 2023;12(1):48. doi: 10.1140/epjds/s13688-023-00416-3. Epub 2023 Oct 11.

Abstract

Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an . Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.

摘要

许多网络分析和图学习技术都基于随机游走的离散或连续时间模型。要应用这些方法,有必要推断出能将观测图中潜在随机过程形式化的转移矩阵。对于加权图,其中加权边捕获了节点间重复交互的观测值,通常基于边的(相对)权重来估计此类转移矩阵的元素。然而在现实环境中,我们常常面临不完整的数据,这使得基于加权图构建转移矩阵变得困难。此外,我们通常可以获取额外信息,这些信息捕获了系统的拓扑约束,即加权图中的哪些边(理论上)是可能的,哪些是不可能的。例如交通网络,我们可能有一小部分乘客轨迹样本以及连接的物理拓扑结构,或者一组有限的观测到的社交互动以及关于潜在社会结构的额外信息。将这两种不同的信息源结合起来,从关于重复交互的不完整数据中可靠地推断转移矩阵,是一个重要的开放性挑战,对下游网络分析任务的可靠性有严重影响。为了解决这个问题,我们表明纳入关于此类拓扑约束的知识可以显著改进转移矩阵的推断,特别是在我们只有少量观测到的交互的情况下。为此,我们推导了一种解析上易于处理的贝叶斯方法,该方法使用重复交互和拓扑先验来进行转移矩阵的数据高效推断。我们在合成数据和五个真实世界数据集上,将我们的方法与常用的频率主义和贝叶斯方法进行比较,发现我们的方法能以更高的精度恢复转移概率。此外,我们证明即使在拓扑约束知识不完整的情况下,该方法也很稳健。最后,我们表明这种更高的精度改善了下游网络分析任务(如聚类检测和节点排名)的结果,这突出了我们的方法在网络系统跨学科数据驱动分析中的实际相关性。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b07/10567898/6366cf184196/13688_2023_416_Fig1_HTML.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验