The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-68159 Heidelberg, Germany.
Bioinformatics. 2012 Aug 1;28(15):2064-6. doi: 10.1093/bioinformatics/bts309. Epub 2012 May 24.
Due to advances in molecular sequencing and the increasingly rapid collection of molecular data, the field of phyloinformatics is transforming into a computational science. Therefore, new tools are required that can be deployed in supercomputing environments and that scale to hundreds or thousands of cores.
We describe RAxML-Light, a tool for large-scale phylogenetic inference on supercomputers under maximum likelihood. It implements a light-weight checkpointing mechanism, deploys 128-bit (SSE3) and 256-bit (AVX) vector intrinsics, offers two orthogonal memory-saving techniques and provides a fine-grained, production-level Message Passing Interface (MPI) parallelization of the likelihood function. To demonstrate the scalability and robustness of the code, we inferred a phylogeny on a simulated DNA alignment (1481 taxa, 20 000 000 bp) using 672 cores. This dataset requires one terabyte of RAM to compute the likelihood score of a single tree.
CODE AVAILABILITY: https://github.com/stamatak/RAxML-Light-1.0.5
DATA AVAILABILITY: http://www.exelixis-lab.org/onLineMaterial.tar.bz2
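The one-terabyte figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below is not taken from RAxML-Light; it assumes that memory is dominated by one conditional likelihood vector per inner node of an unrooted binary tree, with 4 DNA states, a single rate category per site, double-precision values and no compression of identical alignment columns. The function name `clv_bytes` and all of its parameters are hypothetical.

```python
def clv_bytes(num_taxa, num_sites, states=4, rate_categories=1, bytes_per_value=8):
    """Rough size of all conditional likelihood vectors, in bytes (assumed layout)."""
    # An unrooted binary tree with n tips has n - 2 inner nodes.
    inner_nodes = num_taxa - 2
    return inner_nodes * num_sites * states * rate_categories * bytes_per_value


total = clv_bytes(1481, 20_000_000)  # dataset dimensions from the abstract
per_core = total / 672               # assumes an even split over the 672 cores

print(f"total: {total / 1e12:.2f} TB")        # total: 0.95 TB
print(f"per core: {per_core / 1e9:.2f} GB")   # per core: 1.41 GB
```

Under these assumptions the estimate comes to roughly 0.95 TB, which is consistent with the reported requirement. With four discrete rate categories per site (`rate_categories=4`) the same formula would give about 3.79 TB, so the result is sensitive to the rate heterogeneity model assumed here.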
alexandros.stamatakis@h-its.org
Supplementary data are available at Bioinformatics online.