Zirión-Martínez Claudia, Magwene Paul M
Department of Biology, Duke University, PO Box 90338, Durham, NC 27708.
bioRxiv. 2025 Aug 20:2025.08.15.670593. doi: 10.1101/2025.08.15.670593.
Analyzing genomic variants in large datasets of short-read sequencing data requires multiple steps and computational tools, which makes the task complex and difficult to reproduce across projects and laboratories. To address this challenge, we developed WeavePop, a reproducible and scalable Snakemake workflow for eukaryotic haploid organisms. WeavePop aligns samples to selected references, produces reference-based assemblies, annotations, and sequences, and identifies small variants and copy-number variants. All results are integrated into a database that can be easily shared and explored through a graphical web interface provided alongside the workflow, making it straightforward to discover variants in a study population. WeavePop is available from GitHub (https://github.com/magwenelab/WeavePop) for Linux operating systems. Here we exemplify the use of WeavePop in a large collection of isolates of the pathogenic fungus .