Department of Computer Science, Aalto University, Aalto, Finland.
Department of Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
Big Data. 2020 Oct;8(5):335-362. doi: 10.1089/big.2020.0078. Epub 2020 Oct 5.
We study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several applications, for example, in recommending tours for tourists or detecting abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we consider, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution scales to massive graphs with up to a billion edges for a multiset query with 5 colors and up to 100 million edges for a multiset query with 10 colors, despite the problems being non-deterministic polynomial time-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For instance, in a real-world graph dataset with >6 million edges and a multiset query with 10 colors, we can extract an optimal solution in <8 minutes on a Haswell desktop with four cores.
我们研究了顶点着色时变图中的一类模式检测问题。具体来说,给定一个顶点着色时变图和一个多色集合作为查询,我们在图中搜索包含查询中指定颜色的时变路径。这类问题有多种应用,例如为游客推荐旅游路线或检测金融交易网络中的异常行为。对于我们考虑的模式检测问题族,我们建立了复杂性结果,并设计了一个基于约束多线性筛选的代数算法框架。我们证明,尽管这些问题是不确定多项式时间难问题,但我们的解决方案可以扩展到具有数十亿条边的大规模图,对于包含 5 种颜色的多色集合查询,可以扩展到具有 1 亿条边的大规模图,而查询时间复杂度仍然保持在多项式级别。我们的实现是公开的,具有实用的边线性可扩展性和高度优化。例如,在一个具有超过 600 万条边的真实世界图数据集和一个包含 10 种颜色的多色集合查询中,我们可以在具有四个内核的 Haswell 台式机上在<8 分钟内提取最优解决方案。