HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies.

Authors

Edge Peter, Bafna Vineet, Bansal Vikas

Affiliations

Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California 92093, USA.

Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA.

Publication information

Genome Res. 2017 May;27(5):801-812. doi: 10.1101/gr.213462.116. Epub 2016 Dec 9.

DOI:10.1101/gr.213462.116
PMID:27940952
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5411775/
Abstract

Many tools have been developed for haplotype assembly, the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types, namely dilution pool sequencing, linked-read sequencing, single-molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing, we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C-specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (~98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies.
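The abstract reports two summary figures: the share of variants phased in a single block, and pairwise phasing accuracy. The sketch below is one plausible way to compute such figures on toy data, not the authors' evaluation code. The function names, the input layout (dictionaries keyed by variant position), and the exact definitions are assumptions made for illustration: the block fraction is taken over the variants supplied, and pairwise accuracy is taken as the fraction of same-block variant pairs whose relative phase agrees with a truth set.

```python
from collections import Counter
from itertools import combinations


def largest_block_fraction(block_of):
    """Fraction of the supplied variants that fall in the largest phase block.

    block_of maps variant position -> block identifier (hypothetical layout).
    """
    counts = Counter(block_of.values())
    return max(counts.values()) / len(block_of)


def pairwise_phasing_accuracy(block_of, predicted, truth):
    """Fraction of same-block variant pairs whose relative phase matches truth.

    predicted and truth map variant position -> allele (0 or 1) carried on
    one chosen haplotype. Only the relative phase of a pair is compared, so
    a block whose two haplotypes are swapped wholesale still scores as correct.
    """
    correct = 0
    total = 0
    for a, b in combinations(sorted(block_of), 2):
        if block_of[a] != block_of[b]:
            continue  # phase between different blocks is undefined
        total += 1
        same_in_prediction = predicted[a] == predicted[b]
        same_in_truth = truth[a] == truth[b]
        if same_in_prediction == same_in_truth:
            correct += 1
    return correct / total if total else float("nan")


# Toy data (invented): four heterozygous variants, three in block 0.
block_of = {100: 0, 250: 0, 400: 0, 900: 1}
predicted = {100: 0, 250: 1, 400: 1, 900: 0}
truth = {100: 0, 250: 1, 400: 0, 900: 1}

print(largest_block_fraction(block_of))
print(pairwise_phasing_accuracy(block_of, predicted, truth))
```

In this toy case, three of the four variants share a block, and only one of the three same-block pairs (positions 100 and 250) has the correct relative phase, because the variant at 400 is phased incorrectly relative to both of its neighbours.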

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08c5/5411775/a48c24a6aa6f/801f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08c5/5411775/552f433ee4e6/801f03.jpg