AssemblyQC: a Nextflow pipeline for reproducible reporting of assembly quality.

Affiliations

Molecular & Digital Breeding, The New Zealand Institute for Plant and Food Research Limited, 1025 Auckland, New Zealand.

Molecular & Digital Breeding, The New Zealand Institute for Plant and Food Research Limited, 3182 Te Puke, New Zealand.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae477.

DOI:10.1093/bioinformatics/btae477
PMID:39078114
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11333564/
Abstract

SUMMARY

Genome assembly projects have grown exponentially due to breakthroughs in sequencing technologies and assembly algorithms. Evaluating the quality of genome assemblies is critical to ensure the reliability of downstream analysis and interpretation. To fulfil this task, we have developed the AssemblyQC pipeline that performs file-format validation, contaminant checking, contiguity measurement, gene- and repeat-space completeness quantification, telomere inspection, taxonomic assignment, synteny alignment, scaffold examination through Hi-C contact-map visualization, and assessments of completeness, consensus quality and phasing through k-mer analysis. It produces a comprehensive HTML report with method descriptions, tables, and visualizations.
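
As an illustration of the contiguity measurement mentioned above, the following is a minimal Python sketch that computes N50 and L50 from sequence lengths. It is not taken from the AssemblyQC code base; the function names `fasta_lengths` and `n50_l50` are hypothetical, and the sketch assumes plain, uncompressed FASTA text.

```python
from typing import Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted lines.

    Hypothetical helper for illustration; assumes plain FASTA text.
    """
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    toy_fasta = [">contig1", "ACGTACGT", "ACGT", ">contig2", "ACGT", ">contig3", "AC"]
    print(n50_l50(fasta_lengths(toy_fasta)))
```

For a real assembly, the lines could come from an open file handle, for example `n50_l50(fasta_lengths(open("assembly.fasta")))`, where the file name is a placeholder.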

AVAILABILITY AND IMPLEMENTATION

The pipeline uses Nextflow for workflow orchestration and adheres to the best practices established by the nf-core community. It offers a reproducible, scalable, and portable method to assess the quality of genome assemblies. The code is available on GitHub: https://github.com/Plant-Food-Research-Open/assemblyqc.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10af/11333564/f7cb5d2da73f/btae477f1.jpg

Similar articles

1
AssemblyQC: a Nextflow pipeline for reproducible reporting of assembly quality.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae477.
2
GAEP: a comprehensive genome assembly evaluating pipeline.
J Genet Genomics. 2023 Oct;50(10):747-754. doi: 10.1016/j.jgg.2023.05.009. Epub 2023 May 26.
3
SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies.
BMC Genomics. 2019 Apr 18;19(Suppl 9):238. doi: 10.1186/s12864-019-5445-3.
4
HaploMerger2: rebuilding both haploid sub-assemblies from high-heterozygosity diploid genome assembly.
Bioinformatics. 2017 Aug 15;33(16):2577-2579. doi: 10.1093/bioinformatics/btx220.
5
MAECI: A pipeline for generating consensus sequence with nanopore sequencing long-read assembly and error correction.
PLoS One. 2022 May 20;17(5):e0267066. doi: 10.1371/journal.pone.0267066. eCollection 2022.
6
Squeakr: an exact and approximate k-mer counting system.
Bioinformatics. 2018 Feb 15;34(4):568-575. doi: 10.1093/bioinformatics/btx636.
7
Redundans: an assembly pipeline for highly heterozygous genomes.
Nucleic Acids Res. 2016 Jul 8;44(12):e113. doi: 10.1093/nar/gkw294. Epub 2016 Apr 29.
8
scanPAV: a pipeline for extracting presence-absence variations in genome pairs.
Bioinformatics. 2018 Sep 1;34(17):3022-3024. doi: 10.1093/bioinformatics/bty189.
9
RResolver: efficient short-read repeat resolution within ABySS.
BMC Bioinformatics. 2022 Jun 21;23(1):246. doi: 10.1186/s12859-022-04790-z.
10
CRISPR-DAV: CRISPR NGS data analysis and visualization pipeline.
Bioinformatics. 2017 Dec 1;33(23):3811-3812. doi: 10.1093/bioinformatics/btx518.

Cited by

1
VueGen: automating the generation of scientific reports.
Bioinform Adv. 2025 Jun 24;5(1):vbaf149. doi: 10.1093/bioadv/vbaf149. eCollection 2025.
2
Chromosome-scale assemblies of three Ormosia species: repetitive sequences distribution and structural rearrangement.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf047.
3
A Hitchhiker's Guide to long-read genomic analysis.
