Santus Luisa, Espinosa-Carrasco Jose, Rauschning Leon, Mir-Pedrol Júlia, Trujnara Igor, Vignoli Alessio, Mansouri Leila, Baltzis Athanasios, Floden Evan W, Di Tommaso Paolo, Garriga Edgar, Gudyś Adam, Deorowicz Sebastian, Gilchrist Cameron, Steinegger Martin, Notredame Cedric
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain.
Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain.
NAR Genom Bioinform. 2025 Jul 31;7(3):lqaf104. doi: 10.1093/nargab/lqaf104. eCollection 2025 Sep.
The computational complexity of many key bioinformatics problems has produced numerous alternative heuristic solutions, none of which consistently outperforms all the others. This makes it difficult for users to identify the most suitable tool for their dataset, and for developers to manage and evaluate alternative methods. As data volumes grow, deploying these methods becomes increasingly difficult, highlighting the need for standardized frameworks for seamless tool deployment and comparison in high-performance computing (HPC) environments. Multiple sequence alignment (MSA) ranks among the most commonly used modeling techniques in bioinformatics, playing a crucial role in applications such as protein structure prediction, phylogenetic reconstruction, and variant effect prediction. Computing an optimal MSA is NP-hard, which makes it a prime example of a computational challenge where heuristic solutions are essential. Here, we present a pilot design of an nf-core framework for streamlined tool deployment and rigorous performance evaluation, focusing on the MSA software ecosystem. While it is showcased through the integration of popular MSA tools and designed to directly benefit the MSA community, we also present the framework as a proof of principle for the broader bioinformatics community. nf-core/multiplesequencealign is free, open-source software available at https://nf-co.re/multiplesequencealign.
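Comparing aligners "rigorously" usually means scoring each tool's output against a trusted reference alignment. The abstract does not say which metrics the framework uses, so the following is only a minimal sketch of one widely used reference-based metric, the sum-of-pairs (SP) score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The function names `aligned_pairs` and `sp_score` are hypothetical, and the sketch assumes both alignments contain the same sequences (same names, same ungapped residues) with `-` as the only gap character.

```python
from itertools import combinations


def aligned_pairs(msa: dict[str, str]) -> set[tuple[str, int, str, int]]:
    """Return every residue pair placed in the same column of an alignment.

    A residue is identified by (sequence name, position in the ungapped sequence),
    so pairs from two alignments of the same sequences can be compared directly.
    Assumption (hypothetical helper): '-' is the only gap symbol.
    """
    if not msa:
        raise ValueError("alignment is empty")
    names = sorted(msa)  # fixed order, so each pair is always stored the same way
    length = len(msa[names[0]])
    if any(len(msa[n]) != length for n in names):
        raise ValueError("all aligned rows must have the same length")

    pos = {n: 0 for n in names}  # running ungapped index for each sequence
    pairs = set()
    for col in range(length):
        present = []  # residues (not gaps) occupying this column
        for n in names:
            if msa[n][col] != "-":
                present.append((n, pos[n]))
                pos[n] += 1
        # Every two residues sharing a column form one aligned pair.
        for (a, i), (b, j) in combinations(present, 2):
            pairs.add((a, i, b, j))
    return pairs


def sp_score(test: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of the reference's aligned residue pairs recovered by `test`."""
    ref = aligned_pairs(reference)
    if not ref:
        raise ValueError("reference has no aligned residue pairs")
    return len(ref & aligned_pairs(test)) / len(ref)


reference = {"a": "AC-GT", "b": "ACTGT"}
test = {"a": "ACG-T", "b": "ACTGT"}  # misplaces the G of sequence "a"
print(sp_score(test, reference))  # 0.75: 3 of the 4 reference pairs are recovered
```

Keying residues by their ungapped position, rather than by column index, is what allows two alignments with different gap placements to be compared pair by pair.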