Gansterer Wilfried N, Niederbrucker Gerhard, Straková Hana, Schulze Grotthoff Stefan
University of Vienna, Research Group Theory and Applications of Algorithms, Währinger Straße 29, A-1090 Vienna, Austria.
J Comput Sci. 2013 Nov;4(6):480-488. doi: 10.1016/j.jocs.2013.01.006.
The construction of distributed algorithms for matrix computations on top of distributed data aggregation algorithms with randomized communication schedules is investigated. For this purpose, a new aggregation algorithm for summing or averaging distributed values, the push-flow algorithm, is developed. It is more resilient to failures than existing aggregation methods. On a hypercube topology, it is illustrated to require asymptotically the same number of iterations as the optimal all-to-all reduction operation and to scale well with the number of nodes. Orthogonalization is studied as a prototypical matrix computation task. A new fault-tolerant distributed orthogonalization method, rdmGS, which can produce accurate results even in the presence of node failures, is built on top of distributed data aggregation algorithms.
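To make the idea of flow-based gossip averaging concrete, the following is a minimal simulation sketch in Python. The abstract does not state the update rules of the push-flow algorithm, so everything below is an assumption for illustration: the function name `push_flow_average`, the per-edge flow bookkeeping, the halving rule, and the message-loss model are hypothetical and are not taken from the paper. Each node stores a flow per neighbor, derives its current estimate from its initial value minus its outgoing flows, and a receiver overwrites its mirrored flow with the negated value it receives. Under this assumed scheme, a lost message is repaired by the next successful exchange on the same edge.

```python
import random


def push_flow_average(values, dim, rounds, loss_prob=0.0, seed=0):
    """Simulate a flow-based gossip averaging scheme on a hypercube.

    Illustrative sketch only: the update rules are assumptions,
    not the algorithm as specified in the paper.
    """
    n = 1 << dim
    assert len(values) == n, "need one value per hypercube node"
    rng = random.Random(seed)

    # Hypercube: neighbors of node i differ from i in exactly one bit.
    neighbors = [[i ^ (1 << b) for b in range(dim)] for i in range(n)]

    # flow[i][j] = [value_flow, weight_flow] that node i has recorded toward j.
    flow = [{j: [0.0, 0.0] for j in neighbors[i]} for i in range(n)]

    def estimate(i):
        # Current (value, weight) of node i: initial data minus all flows.
        v = values[i] - sum(f[0] for f in flow[i].values())
        w = 1.0 - sum(f[1] for f in flow[i].values())
        return v, w

    for _ in range(rounds):
        for i in range(n):
            k = rng.choice(neighbors[i])  # randomized communication schedule
            v, w = estimate(i)
            # Push half of the current estimate into the flow toward k.
            flow[i][k][0] += v / 2.0
            flow[i][k][1] += w / 2.0
            if rng.random() < loss_prob:
                continue  # message lost; k's mirrored flow is now stale
            # Receiver overwrites its mirrored flow with the negated value,
            # which also repairs any earlier lost message on this edge.
            flow[k][i][0] = -flow[i][k][0]
            flow[k][i][1] = -flow[i][k][1]

    # Each node's estimate of the average is value / weight.
    return [v / w for v, w in (estimate(i) for i in range(n))]


if __name__ == "__main__":
    vals = [float(i) for i in range(16)]
    est = push_flow_average(vals, dim=4, rounds=200, loss_prob=0.2, seed=1)
    print("true average:", sum(vals) / len(vals))
    print("largest deviation:", max(abs(e - sum(vals) / len(vals)) for e in est))
```

In a distributed Gram-Schmidt setting, where the rows of a matrix are spread across nodes, sums of this kind would supply the norms and dot products that the orthogonalization needs. How rdmGS organizes these reductions is not described in the abstract.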