Evaluating data requirements for high-quality haplotype-resolved genomes for creating robust pangenome references.

Author information

Sarashetti Prasad, Lipovac Josipa, Tomas Filip, Šikić Mile, Liu Jianjun

Affiliations

Laboratory of Human Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore.

Laboratory for Bioinformatics and Computational Biology, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.

Publication information

Genome Biol. 2024 Dec 18;25(1):312. doi: 10.1186/s13059-024-03452-y.

DOI:10.1186/s13059-024-03452-y
PMID:39696427
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11658127/
Abstract

BACKGROUND

Long-read technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have transformed genomics research by providing diverse data types such as HiFi, Duplex, and ultra-long ONT reads. Despite recent strides in achieving haplotype-phased gapless genome assemblies using long-read technologies, concerns persist regarding the representation of genetic diversity, prompting the development of pangenome references. However, pangenome studies face challenges related to data types, volumes, and cost considerations for each assembled genome, while striving to maintain sensitivity. The absence of comprehensive guidance on optimal data selection exacerbates these challenges.

RESULTS

Our study evaluates the data types and volumes required to establish a robust de novo genome assembly pipeline for population-level pangenome projects, extensively comparing the performance of ONT Duplex and PacBio HiFi datasets in achieving high-quality phased genomes with enhanced contiguity and completeness. The results show that achieving chromosome-level haplotype-resolved assembly requires 20× high-quality long reads such as PacBio HiFi or ONT Duplex, combined with 15–20× ultra-long ONT per haplotype and 10× long-range data such as Omni-C or Hi-C. High-quality long reads from both platforms yield assemblies with comparable contiguity, with HiFi excelling in phasing accuracy and Duplex generating more T2T contigs.

CONCLUSION

Our study provides insights into optimal data types and volumes for robust de novo genome assembly in population-level pangenome projects. Reassessing the data types and volumes recommended in this study and aligning them with practical economic limitations are vital to the pangenome research community, supporting its efforts and advancing genomic studies with broader impact.
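The Results state the targets as fold coverage. To turn them into an approximate data volume per sample, multiply the coverage by the genome size.

This is a back-of-the-envelope sketch, not the authors' method. It rests on three assumptions that are not stated in the abstract:

- a haploid human genome of about 3.1 Gb;
- coverage measured against the haploid genome, so a per-haplotype target is doubled for a diploid sample;
- following the abstract's literal wording, only the ultra-long ONT figure is per haplotype.

`DataTarget`, `gigabases` and `TARGETS` are hypothetical names used only for illustration.

```python
# Back-of-the-envelope sequencing yield for the abstract's coverage targets.
# ASSUMPTIONS (not from the paper): haploid genome size of 3.1 Gb; coverage is
# measured against the haploid genome, so "per haplotype" doubles the yield.
from dataclasses import dataclass

HAPLOID_GENOME_GB = 3.1  # assumed human haploid genome size, in gigabases


@dataclass(frozen=True)
class DataTarget:
    name: str
    min_cov: float       # lower bound of the fold-coverage target (x)
    max_cov: float       # upper bound (equal to min_cov for a single value)
    per_haplotype: bool  # True if the target is stated per haplotype


def gigabases(coverage: float, per_haplotype: bool,
              haploid_gb: float = HAPLOID_GENOME_GB) -> float:
    """Gigabases of sequence needed to reach `coverage` for one diploid sample."""
    haplotypes = 2 if per_haplotype else 1
    return coverage * haploid_gb * haplotypes


# Targets as worded in the Results; only ultra-long ONT is marked "per haplotype".
TARGETS = [
    DataTarget("PacBio HiFi or ONT Duplex", 20, 20, per_haplotype=False),
    DataTarget("Ultra-long ONT", 15, 20, per_haplotype=True),
    DataTarget("Omni-C or Hi-C", 10, 10, per_haplotype=False),
]

if __name__ == "__main__":
    for t in TARGETS:
        low = gigabases(t.min_cov, t.per_haplotype)
        high = gigabases(t.max_cov, t.per_haplotype)
        span = f"{low:.0f} Gb" if low == high else f"{low:.0f}-{high:.0f} Gb"
        print(f"{t.name}: {span}")
```

Run as a script, it prints the following per-sample volumes:

- HiFi or Duplex: 62 Gb
- Ultra-long ONT: 93-124 Gb
- Omni-C or Hi-C: 31 Gb

If the full paper states the HiFi/Duplex target per haplotype as well, setting `per_haplotype=True` on that entry doubles it to 124 Gb.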

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/de8300f3987e/13059_2024_3452_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/e56566a655b9/13059_2024_3452_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/f6758464f26f/13059_2024_3452_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/c58a695f5427/13059_2024_3452_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/3fe12f123400/13059_2024_3452_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/b1c41a9484be/13059_2024_3452_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/0052b299ae24/13059_2024_3452_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faa5/11658127/4cf2cfe0b804/13059_2024_3452_Fig8_HTML.jpg