Molecular & Digital Breeding, The New Zealand Institute for Plant and Food Research Limited, 1025 Auckland, New Zealand.
Molecular & Digital Breeding, The New Zealand Institute for Plant and Food Research Limited, 3182 Te Puke, New Zealand.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae477.
Genome assembly projects have grown exponentially due to breakthroughs in sequencing technologies and assembly algorithms. Evaluating the quality of genome assemblies is critical to ensure the reliability of downstream analysis and interpretation. To fulfil this task, we have developed the AssemblyQC pipeline that performs file-format validation, contaminant checking, contiguity measurement, gene- and repeat-space completeness quantification, telomere inspection, taxonomic assignment, synteny alignment, scaffold examination through Hi-C contact-map visualization, and assessments of completeness, consensus quality and phasing through k-mer analysis. It produces a comprehensive HTML report with method descriptions, tables, and visualizations.
The pipeline uses Nextflow for workflow orchestration and adheres to the best practices established by the nf-core community. It offers a reproducible, scalable, and portable method to assess the quality of genome assemblies. The code is available on GitHub: https://github.com/Plant-Food-Research-Open/assemblyqc.
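As an illustration of the contiguity measurement mentioned above, the following minimal Python sketch computes N50, a widely used contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. This is not taken from AssemblyQC; the function name `n50` and the example lengths are hypothetical and serve only to show the idea.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative sketch only; not the AssemblyQC implementation.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings coverage to >= 50% of the total.
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (in bases) for a toy assembly.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

Comparing `running * 2` with `total` keeps the arithmetic in integers, so the halfway test is exact even when the total assembly length is odd.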