Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms.

Affiliations

Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium.

Laboratoire d'Ecologie et Génétique Evolutive, Université de Namur, Rue de Bruxelles 61, 5000, Namur, Belgium.

Publication information

BMC Bioinformatics. 2021 Jun 5;22(1):303. doi: 10.1186/s12859-021-04118-3.

DOI:10.1186/s12859-021-04118-3
PMID:34090340
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8178825/
Abstract

BACKGROUND

Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible in technicity and in cost, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, making the generation of a haploid reference out of a diploid genome a difficult enterprise. Failure to properly collapse haplotypes results in fragmented and structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, yet this serious issue is rarely acknowledged and dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking.

RESULTS

We tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups.

CONCLUSIONS

We provide a thorough evaluation of popular assemblers on a non-model eukaryote genome with variable levels of heterozygosity. Our study highlights several strategies using pre- and post-processing approaches to generate haploid assemblies with high continuity and completeness. This benchmark will help users to improve haploid assemblies of non-model organisms and to evaluate the quality of their own assemblies.
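
As a rough illustration of how a user might screen an assembly for the problems described above, the sketch below computes two simple statistics from contig lengths: N50 as a proxy for continuity, and the ratio of assembly size to an expected haploid genome size as a crude hint of uncollapsed haplotypes. The abstract does not say which metrics the authors used, so both statistics, the function names, and the `expected_haploid_size` parameter are assumptions made for this example, and the contig lengths are toy values.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for non-empty input


def size_inflation(contig_lengths: Iterable[int], expected_haploid_size: int) -> float:
    """Ratio of assembly size to an expected haploid genome size
    (hypothetical parameter). Values well above 1 may indicate
    haplotypes that were assembled separately instead of collapsed."""
    if expected_haploid_size <= 0:
        raise ValueError("expected_haploid_size must be positive")
    return sum(contig_lengths) / expected_haploid_size


# Toy contig lengths (arbitrary units), not data from the study.
collapsed = [50, 30, 20]
uncollapsed = [30, 30, 25, 25, 20, 20, 10, 10]

for name, contigs in [("collapsed", collapsed), ("uncollapsed", uncollapsed)]:
    print(name, "N50:", n50(contigs),
          "size ratio:", round(size_inflation(contigs, 100), 2))
```

In this toy case the uncollapsed assembly is both more fragmented (lower N50) and larger than the expected haploid size, which matches the two symptoms named in the Background. A real evaluation would also need completeness and coverage-based checks.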

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e9/8178825/b8d1e4938172/12859_2021_4118_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e9/8178825/30150f6e7d57/12859_2021_4118_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e9/8178825/9e04fecab013/12859_2021_4118_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e9/8178825/bd693ff8a18b/12859_2021_4118_Fig3_HTML.jpg

Similar articles

1. Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms.
   BMC Bioinformatics. 2021 Jun 5;22(1):303. doi: 10.1186/s12859-021-04118-3.
2. Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations.
   Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad100. Epub 2023 Nov 24.
3. Benchmarking Long-Read Assemblers for Genomic Analyses of Bacterial Pathogens Using Oxford Nanopore Sequencing.
   Int J Mol Sci. 2020 Dec 1;21(23):9161. doi: 10.3390/ijms21239161.
4. Benchmarking of long-read assemblers for prokaryote whole genome sequencing.
   F1000Res. 2019 Dec 23;8:2138. doi: 10.12688/f1000research.21782.4. eCollection 2019.
5. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.
   BMC Bioinformatics. 2018 Nov 29;19(1):460. doi: 10.1186/s12859-018-2485-7.
6. A practical assembly guideline for genomes with various levels of heterozygosity.
   Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad337.
7. Identifying and removing haplotypic duplication in primary genome assemblies.
   Bioinformatics. 2020 May 1;36(9):2896-2898. doi: 10.1093/bioinformatics/btaa025.
8. Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops.
   J Agric Food Chem. 2020 Jul 22;68(29):7670-7677. doi: 10.1021/acs.jafc.0c01647. Epub 2020 Jul 10.
9. HaploMerger2: rebuilding both haploid sub-assemblies from high-heterozygosity diploid genome assembly.
   Bioinformatics. 2017 Aug 15;33(16):2577-2579. doi: 10.1093/bioinformatics/btx220.
10. Benchmarking multi-platform sequencing technologies for human genome assembly.
    Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad300.

Cited by

1. Chromosome-level genome assembly of the threatened ornamental plant Hibiscus yunnanensis.
   Sci Data. 2025 Mar 25;12(1):503. doi: 10.1038/s41597-025-04842-y.
2. Comparative genomics provides insights into the biogeographic and biochemical diversity of meliaceous species.
   Nat Commun. 2025 Mar 17;16(1):2603. doi: 10.1038/s41467-025-57722-9.
3. Genomic comparison of the temperate coral Astrangia poculata with tropical corals yields insights into winter quiescence, innate immunity, and sexual reproduction.
   G3 (Bethesda). 2025 Apr 17;15(4). doi: 10.1093/g3journal/jkaf033.
4. Chromosome-scale genome dynamics reveal signatures of independent haplotype evolution in the ancient asexual mite .
   Sci Adv. 2025 Jan 24;11(4):eadn0817. doi: 10.1126/sciadv.adn0817.
5. Chromosome-level genome assembly of the bay scallop Argopecten irradians.
   Sci Data. 2024 Sep 28;11(1):1057. doi: 10.1038/s41597-024-03904-x.
6. Maternal inheritance of functional centrioles in two parthenogenetic nematodes.
   Nat Commun. 2024 Jul 18;15(1):6042. doi: 10.1038/s41467-024-50427-5.
7. Morphological and dietary changes encoded in the genome of , a ctenophore-eating ctenophore.
   NAR Genom Bioinform. 2024 Jun 18;6(2):lqae072. doi: 10.1093/nargab/lqae072. eCollection 2024 Jun.
8. The chromosome-level genomes of the herbal magnoliids Warburgia ugandensis and Saururus chinensis.
   Sci Data. 2024 May 30;11(1):554. doi: 10.1038/s41597-024-03229-9.
9. Assembly collapsing versus heterozygosity oversizing: detection of homokaryotic and heterokaryotic strains by hybrid genome assembly.
   Microb Genom. 2024 Mar;10(3). doi: 10.1099/mgen.0.001218.
10. Revisiting genomes of non-model species with long reads yields new insights into their biology and evolution.
    Front Genet. 2024 Feb 7;15:1308527. doi: 10.3389/fgene.2024.1308527. eCollection 2024.