Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S15. doi: 10.1186/1471-2105-11-S1-S15.
MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t x t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm.
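To make the all-to-all computation concrete, here is a minimal single-process sketch of how a t x t Robinson-Foulds matrix can be phrased as map and reduce steps. It is not the MrsRF implementation. It assumes each tree is given as a nested tuple of taxon names (an illustrative input format) and uses the unnormalized RF distance, i.e. the number of nontrivial bipartitions found in exactly one of the two trees. Some tools halve this value. The helper names `clusters`, `bipartitions`, and `rf_matrix` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def clusters(tree):
    """Return the leaf set below every internal node of a nested-tuple tree."""
    out = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        out.add(leaves)
        return leaves

    walk(tree)
    return out


def bipartitions(tree):
    """Nontrivial unrooted bipartitions, each stored as the side without a reference taxon."""
    all_clusters = clusters(tree)
    taxa = max(all_clusters, key=len)  # the root cluster holds every taxon
    ref, n = min(taxa), len(taxa)
    splits = set()
    for c in all_clusters:
        side = c if ref not in c else taxa - c
        if 2 <= len(side) <= n - 2:  # drop trivial splits (and the empty root complement)
            splits.add(frozenset(side))
    return splits


def rf_matrix(trees):
    splits = [bipartitions(t) for t in trees]
    # Map: emit one (bipartition, tree_id) pair per bipartition of each tree.
    mapped = [(b, i) for i, bs in enumerate(splits) for b in bs]
    # Shuffle: group tree ids by bipartition.
    groups = defaultdict(list)
    for b, i in mapped:
        groups[b].append(i)
    # Reduce: a bipartition is shared by every pair of trees in its group.
    shared = defaultdict(int)
    for ids in groups.values():
        for i, j in combinations(sorted(ids), 2):
            shared[(i, j)] += 1
    # RF(i, j) = |B_i| + |B_j| - 2 * |B_i ∩ B_j| (size of the symmetric difference).
    t = len(trees)
    m = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i + 1, t):
            d = len(splits[i]) + len(splits[j]) - 2 * shared[(i, j)]
            m[i][j] = m[j][i] = d
    return m


if __name__ == "__main__":
    t0 = ((("A", "B"), "C"), ("D", "E"))
    t1 = ((("A", "C"), "B"), ("D", "E"))
    t2 = ((("A", "B"), "D"), ("C", "E"))
    print(rf_matrix([t0, t1, t2]))  # [[0, 2, 2], [2, 0, 4], [2, 4, 0]]
```

Grouping by bipartition means the reduce step only touches pairs of trees that actually share a split. In a real MapReduce run, the map and reduce loops would be distributed across workers rather than executed in one process.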
We studied the performance of our MrsRF algorithm on two large biological tree sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires experimenting with different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually.
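The reported speedup can also be expressed as a parallel efficiency. This assumes the standard definition E = S/p, where S is the speedup and p is the number of cores:

$$E = \frac{S}{p} > \frac{18}{32} = 0.5625$$

On 32 cores, therefore, each core delivers on average more than about 56% of the throughput of a single core running alone.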
Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.