School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
PLoS One. 2019 Jun 28;14(6):e0217316. doi: 10.1371/journal.pone.0217316. eCollection 2019.
How can we extract hidden relations from a tensor and a matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data are growing explosively. However, existing methods for CMTF suffer from lack of accuracy, slow running time, and limited scalability. In this paper, we propose S3CMTF, a fast, accurate, and scalable CMTF method. In contrast to previous methods which do not handle large sparse tensors and are not parallelizable, S3CMTF provides parallel sparse CMTF by carefully deriving gradient update rules. S3CMTF asynchronously updates partial gradients without expensive locking. We show that our method is guaranteed to converge to a quality solution theoretically and empirically. S3CMTF further boosts the performance by carefully storing intermediate computation and reusing them. We theoretically and empirically show that S3CMTF is the fastest, outperforming existing methods. Experimental results show that S3CMTF is up to 930× faster than existing methods while providing the best accuracy. S3CMTF shows linear scalability on the number of data entries and the number of cores. In addition, we apply S3CMTF to Yelp rating tensor data coupled with 3 additional matrices to discover interesting patterns.
我们如何才能以快速、准确和可扩展的方式从张量和矩阵数据中提取隐藏关系?联合矩阵-张量分解 (CMTF) 是实现这一目标的重要工具。随着现实世界数据的规模和维度呈爆炸式增长,设计一种准确、高效的 CMTF 方法变得更加重要。然而,现有的 CMTF 方法存在准确性不足、运行时间慢和可扩展性有限等问题。在本文中,我们提出了 S3CMTF,一种快速、准确和可扩展的 CMTF 方法。与之前的方法不同,S3CMTF 可以处理大型稀疏张量并且是可并行化的,它通过仔细推导梯度更新规则提供了并行稀疏 CMTF。S3CMTF 异步更新部分梯度,而无需昂贵的锁定。我们从理论和实验两方面证明了我们的方法可以保证收敛到高质量的解。S3CMTF 通过仔细存储中间计算并重复使用它们进一步提高了性能。我们从理论和实验两方面都证明了 S3CMTF 是最快的,优于现有方法。实验结果表明,S3CMTF 的速度比现有方法快 930 倍,同时提供了最佳的准确性。S3CMTF 在数据项数量和核心数量上具有线性可扩展性。此外,我们将 S3CMTF 应用于 Yelp 评级张量数据,并与另外 3 个矩阵相结合,以发现有趣的模式。