Comparing Ease of Programming in C++, Go, and Java for Implementing a Next-Generation Sequencing Tool.

Authors

Costanza Pascal, Herzeel Charlotte, Verachtert Wilfried

Affiliation

ExaScience Lab, IMEC vzw, Leuven, Belgium.

Publication information

Evol Bioinform Online. 2019 Aug 15;15:1176934319869015. doi: 10.1177/1176934319869015. eCollection 2019.

Abstract

elPrep is an extensible multithreaded software framework for efficiently processing Sequence Alignment/Map (SAM) and Binary Alignment/Map (BAM) files in next-generation sequencing pipelines. As with other SAM/BAM tools, a key challenge in elPrep is memory management, because such programs need to manipulate large amounts of data. We therefore investigated 3 programming languages with support for assisted or automated memory management for implementing elPrep, namely C++, Go, and Java. We implemented a nontrivial subset of elPrep in all 3 programming languages and compared them by benchmarking their runtime performance and memory use to determine the best language in terms of computational performance. In a previous article, we explained why, based on these results, we eventually selected Go as our implementation language. In this article, we discuss the difficulty of achieving the best performance in each language in terms of programming language constructs and standard library support. While benchmarks are easy to measure and evaluate objectively, ease of programming is harder to assess. However, because we expect elPrep to be regularly modified and extended, it is an equally important aspect. We present representative examples of challenges in all 3 languages and explain why we think that Go is a reasonable choice in this light as well.

References cited in this article

2. elPrep 4: A multithreaded framework for sequence analysis.
PLoS One. 2019 Feb 13;14(2):e0209523. doi: 10.1371/journal.pone.0209523. eCollection 2019.
4. The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.
