Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland.
SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae065.
The large volume and diversity of viral genomic datasets generated by next-generation sequencing technologies pose a set of challenges for computational data analysis workflows, including rigorous quality control, scaling to large sample sizes, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of the genetic diversity of viral samples. By presenting 2 large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.