Efficient hybrid de novo assembly of human genomes with WENGAN.

Affiliations

Inria Grenoble Rhône-Alpes, Montbonnot, France.

Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France.

Publication information

Nat Biotechnol. 2021 Apr;39(4):422-430. doi: 10.1038/s41587-020-00747-w. Epub 2020 Dec 14.

DOI:10.1038/s41587-020-00747-w
PMID:33318652
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8041623/
Abstract

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24-80.64 Mb), few assembly errors (contig NGA50: 11.8-59.59 Mb), good consensus quality (QV: 27.84-42.88) and high gene completeness (BUSCO complete: 94.6-95.2%), while consuming low computational resources (CPU hours: 187-1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).
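
The contiguity and consensus figures quoted in the abstract can be made concrete with a small sketch. It assumes the common definitions: NG50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of an assumed genome size, and QV is a Phred-scaled quality value. The function names and the toy contig lengths are illustrative only and are not taken from WENGAN or from the paper's evaluation pipeline.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are visited from longest to shortest; the NG50 is the length
    of the contig at which the running total first reaches half of
    genome_size (an assumed genome size, not the assembly size).
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the assembly covers less than half of the genome


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    # Toy contig lengths (arbitrary units), not data from the paper.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
    # QV range reported in the abstract.
    for qv in (27.84, 42.88):
        print(qv, qv_to_error_rate(qv))
```

Under the Phred assumption, the reported QV range of 27.84 to 42.88 corresponds to roughly 1.6 errors per 1,000 bases at the low end and about 5 errors per 100,000 bases at the high end. NGA50 is computed in the same way as NG50, but over aligned blocks obtained by breaking contigs at misassemblies relative to a reference, which this sketch does not attempt.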

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37a6/8041623/e3545b7c6bea/41587_2020_747_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37a6/8041623/8d75019f30aa/41587_2020_747_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37a6/8041623/31af5d74126a/41587_2020_747_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37a6/8041623/6f6067c82e5b/41587_2020_747_Fig4_HTML.jpg
