Costanza Pascal, Herzeel Charlotte, Verachtert Wilfried
ExaScience Lab, IMEC vzw, Leuven, Belgium.
Evol Bioinform Online. 2019 Aug 15;15:1176934319869015. doi: 10.1177/1176934319869015. eCollection 2019.
elPrep is an extensible multithreaded software framework for efficiently processing Sequence Alignment/Map (SAM)/Binary Alignment/Map (BAM) files in next-generation sequencing pipelines. As with other SAM/BAM tools, a key challenge in elPrep is memory management, because such programs must manipulate large amounts of data. We therefore investigated 3 programming languages that support assisted or automated memory management, namely C++, Go, and Java, as candidates for implementing elPrep. We implemented a nontrivial subset of elPrep in all 3 languages and benchmarked their runtime performance and memory use to determine which language performs best computationally. In a previous article, we explained why, based on these results, we eventually selected Go as our implementation language. In this article, we discuss how difficult it is to achieve the best performance in each language, in terms of programming language constructs and standard library support. While benchmark results are easy to measure and evaluate objectively, ease of programming is harder to assess. However, because we expect elPrep to be regularly modified and extended, it is an equally important aspect. We illustrate representative examples of challenges in all 3 languages and explain why we think Go is a reasonable choice from this perspective as well.