AsmMix: an efficient haplotype-resolved hybrid genome assembling pipeline.

Authors

Liu Chao, Wu Pei, Wu Xue, Zhao Xia, Chen Fang, Cheng Xiaofang, Zhu Hongmei, Wang Ou, Xu Mengyang

Affiliations

BGI, Tianjin, China.

BGI Research, Shenzhen, China.

Publication information

Front Genet. 2024 Jul 26;15:1421565. doi: 10.3389/fgene.2024.1421565. eCollection 2024.

DOI: 10.3389/fgene.2024.1421565
PMID: 39130747
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11310137/

Abstract

Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigation of the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read sequencing and synthetic co-barcoded read sequencing have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, reconstructing complete haplotypes remains methodologically challenging because of the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present AsmMix, a pipeline for generating diploid genomes that are both contiguous and accurate. It first assembles co-barcoded reads into accurate haplotype-resolved assemblies that may contain many gaps; the long-read assembly, in contrast, is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. In extensive evaluations on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole-genome dataset (HG002) and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
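The abstract reports precision and recall for haplotyping but does not define them. As a rough illustration only, the sketch below scores phased genotype calls against a truth set on toy data. The function name `haplotyping_precision_recall`, the data layout, and the single global phase orientation are assumptions made for this example; they are not taken from AsmMix or from the GIAB tooling.

```python
from typing import Dict, Tuple

# Hypothetical layout: (chromosome, position) -> (allele on hap 1, allele on hap 2).
Phased = Dict[Tuple[str, int], Tuple[str, str]]


def haplotyping_precision_recall(calls: Phased, truth: Phased) -> Tuple[float, float]:
    """Score phased calls against a truth set (illustrative definition only).

    A call is a true positive when its site is in the truth set and both
    alleles sit on the expected haplotypes. Haplotype labels are arbitrary,
    so both global orientations are tried and the better one is kept.
    Assumption: all sites belong to one phase block; real evaluations
    also account for phase blocks and switch errors.
    """

    def count_true_positives(swap: bool) -> int:
        tp = 0
        for site, genotype in calls.items():
            expected = truth.get(site)
            if expected is None:
                continue  # site absent from the truth set: counts against precision
            observed = (genotype[1], genotype[0]) if swap else genotype
            if observed == expected:
                tp += 1
        return tp

    tp = max(count_true_positives(False), count_true_positives(True))
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy truth set with five heterozygous sites.
truth: Phased = {
    ("chr1", 100): ("A", "G"),
    ("chr1", 200): ("C", "T"),
    ("chr1", 300): ("G", "A"),
    ("chr1", 400): ("T", "C"),
    ("chr1", 600): ("A", "C"),
}

# Toy calls: three sites phased consistently (with swapped haplotype labels)
# and one site that is not in the truth set.
calls: Phased = {
    ("chr1", 100): ("G", "A"),
    ("chr1", 200): ("T", "C"),
    ("chr1", 300): ("A", "G"),
    ("chr1", 500): ("A", "T"),
}

precision, recall = haplotyping_precision_recall(calls, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy input, three of the four calls match the truth set once the haplotype labels are swapped, and three of the five truth sites are recovered. Trying both orientations reflects that an assembler has no way to know which haplotype the benchmark labels as the first.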

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/7a6d9facfa4a/fgene-15-1421565-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/00af7c145c8c/fgene-15-1421565-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/a23963f11f09/fgene-15-1421565-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/e640b47c9505/fgene-15-1421565-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/7047887508b4/fgene-15-1421565-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/a3fdbda7f427/fgene-15-1421565-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/42ab2206664f/fgene-15-1421565-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2818/11310137/13fb755f5f9a/fgene-15-1421565-g008.jpg
