Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, 72076, Germany.
Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany.
Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae609.
Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. However, current construction methods often introduce biases by excluding complex sequences or relying on a reference genome. The PanGenome Graph Builder (PGGB) addresses these issues. To date, though, no state-of-the-art pipeline combines easy deployment, efficient and dynamic use of available resources, and scalability.
To overcome these limitations, we present nf-core/pangenome, a reference-unbiased approach implemented in Nextflow following nf-core's best practices. Leveraging biocontainers ensures portability and seamless deployment in High-Performance Computing (HPC) environments. Unlike PGGB, nf-core/pangenome distributes alignments across cluster nodes, enabling scalability. To demonstrate its efficiency, we constructed pangenome graphs for 1000 human chromosome 19 haplotypes and 2146 Escherichia coli sequences, achieving a two- to threefold speedup over PGGB without increasing greenhouse gas emissions.
nf-core/pangenome is released under the MIT open-source license, available on GitHub and Zenodo, with documentation accessible at https://nf-co.re/pangenome/docs/usage.
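The idea of distributing alignments across cluster nodes can be illustrated with a minimal Python sketch. It assumes an all-versus-all comparison of input sequences and a simple round-robin split into a fixed number of chunks; the function name `chunk_alignment_pairs`, the haplotype labels, and the chunking strategy are hypothetical and do not reproduce the pipeline's actual partitioning.

```python
from itertools import combinations


def chunk_alignment_pairs(sequence_ids, n_chunks):
    """Split all unordered sequence pairs into n_chunks task lists.

    Hypothetical illustration only: each returned list stands for the
    alignment work that could be sent to one cluster node.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    # Round-robin assignment keeps chunk sizes within one task of each other.
    for i, pair in enumerate(combinations(sequence_ids, 2)):
        chunks[i % n_chunks].append(pair)
    return chunks


# Example: six made-up haplotype labels spread over four nodes.
haplotypes = [f"hap{i}" for i in range(1, 7)]
chunks = chunk_alignment_pairs(haplotypes, 4)
for node, tasks in enumerate(chunks):
    print(f"node {node}: {len(tasks)} alignment tasks")
```

Because each chunk is independent of the others, a workflow manager could schedule the chunks as separate jobs and merge their outputs afterwards, which is the general property that makes this kind of workload scale with the number of available nodes.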