Langarita Ruben, Armejach Adria, Ibanez Pablo, Alastruey-Benede Jesus, Moreto Miquel
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3139-3153. doi: 10.1109/TCBB.2023.3264514. Epub 2023 Oct 9.
Sequence alignment pipelines for human genomes are an emerging workload that will dominate the precision medicine field. BWA-MEM2 is a tool widely used in the scientific community to perform read mapping studies. In this paper, we port BWA-MEM2 to the AArch64 architecture using the ARMv8-A specification, and we compare the resulting version against an Intel Skylake system in terms of both performance and energy-to-solution. The porting effort entails numerous code modifications, since BWA-MEM2 implements certain kernels using x86_64-specific intrinsics (e.g., AVX-512). To adapt this code, we use Arm's recently introduced Scalable Vector Extension (SVE). More specifically, we use Fujitsu's A64FX processor, the first processor to implement SVE. The A64FX powers the Fugaku supercomputer, which led the Top500 ranking from June 2020 to November 2021. After porting BWA-MEM2, we define and implement several optimizations to improve performance on the A64FX target architecture. We show that, although the A64FX delivers lower performance than the Skylake system, it achieves 11.6% better energy-to-solution on average. All the code used for this article is available at https://gitlab.bsc.es/rlangari/bwa-a64fx.
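As a rough illustration of the kind of rewrite involved in replacing AVX-512 intrinsics with SVE, the sketch below implements the same element-wise kernel three ways. The AVX-512 path handles fixed 16-lane chunks and needs an explicitly computed mask for the tail. The SVE path is vector-length agnostic: `svcntw()` reports the number of 32-bit lanes at run time (16 on the A64FX, whose SVE vectors are 512 bits wide), and the predicate from `svwhilelt_b32` masks the tail. The kernel name `sw_cell_max` and its max-of-four update (loosely modeled on a Smith-Waterman cell update) are hypothetical and are not taken from BWA-MEM2 or the authors' repository. A scalar fallback lets the file compile on any target.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical kernel (not from BWA-MEM2): h[i] = max(h[i], e[i], f[i], 0). */
static void sw_cell_max(int32_t *h, const int32_t *e, const int32_t *f, int64_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* SVE: the step is the hardware vector length in 32-bit lanes. */
    const svint32_t zero = svdup_n_s32(0);
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);   /* active lanes: i .. n-1 */
        svint32_t vh = svld1_s32(pg, h + i);
        svint32_t ve = svld1_s32(pg, e + i);
        svint32_t vf = svld1_s32(pg, f + i);
        vh = svmax_s32_x(pg, vh, ve);
        vh = svmax_s32_x(pg, vh, vf);
        vh = svmax_s32_x(pg, vh, zero);
        svst1_s32(pg, h + i, vh);                /* inactive lanes not stored */
    }
#elif defined(__AVX512F__)
    /* AVX-512: fixed 16 x 32-bit lanes; build the tail mask by hand. */
    const __m512i zero = _mm512_setzero_si512();
    for (int64_t i = 0; i < n; i += 16) {
        int64_t rem = n - i;
        __mmask16 k = rem >= 16 ? (__mmask16)0xFFFF
                                : (__mmask16)((1u << rem) - 1u);
        __m512i vh = _mm512_maskz_loadu_epi32(k, h + i);
        __m512i ve = _mm512_maskz_loadu_epi32(k, e + i);
        __m512i vf = _mm512_maskz_loadu_epi32(k, f + i);
        vh = _mm512_max_epi32(vh, ve);
        vh = _mm512_max_epi32(vh, vf);
        vh = _mm512_max_epi32(vh, zero);
        _mm512_mask_storeu_epi32(h + i, k, vh);
    }
#else
    /* Portable scalar fallback with the same semantics. */
    for (int64_t i = 0; i < n; ++i) {
        int32_t v = h[i];
        if (e[i] > v) v = e[i];
        if (f[i] > v) v = f[i];
        if (v < 0) v = 0;
        h[i] = v;
    }
#endif
}
```

Building for the A64FX with an SVE-enabled target (for example `-march=armv8.2-a+sve`) selects the predicated loop. Its structure does not depend on the vector width, which is the main design difference from the fixed 16-lane AVX-512 code.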