HINGE: long-read assembly achieves optimal repeat resolution.

Authors

Kamath Govinda M, Shomorony Ilan, Xia Fei, Courtade Thomas A, Tse David N

Affiliations

Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA.

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA.

Publication information

Genome Res. 2017 May;27(5):747-756. doi: 10.1101/gr.216465.116. Epub 2017 Mar 20.

DOI:10.1101/gr.216465.116
PMID:28320918
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5411769/
Abstract

Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce misassemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding "hinges" to reads for constructing an overlap graph where only unresolvable repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with repeat-resolution capabilities of de Bruijn graph assemblers. HINGE was evaluated on the long-read bacterial data sets from the NCTC project. HINGE produces more finished assemblies than Miniasm and the manual pipeline of NCTC based on the HGAP assembler and Circlator. HINGE also allows us to identify 40 data sets where unresolvable repeats prevent the reliable construction of a unique finished assembly. In these cases, HINGE outputs a visually interpretable assembly graph that encodes all possible finished assemblies consistent with the reads, while other approaches such as the NCTC pipeline and FALCON either fragment the assembly or resolve the ambiguity arbitrarily.
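The distinction the abstract draws between repeats that the data can resolve and those it cannot can be illustrated with a toy check. The sketch below is not HINGE's implementation: it assumes known genome coordinates for reads and repeat copies, which a de novo assembler does not have. The names `is_bridged`, `bridged_copies`, and `min_anchor` are hypothetical and chosen for illustration. A repeat copy is called "bridged" here if at least one read spans it with some unique flanking sequence on both sides.

```python
from typing import List, Tuple

# Half-open interval [start, end) in hypothetical genome coordinates.
Interval = Tuple[int, int]


def is_bridged(copy: Interval, reads: List[Interval], min_anchor: int = 1) -> bool:
    """Return True if some read covers the repeat copy plus at least
    `min_anchor` bases of flanking sequence on each side."""
    copy_start, copy_end = copy
    return any(
        read_start <= copy_start - min_anchor and read_end >= copy_end + min_anchor
        for read_start, read_end in reads
    )


def bridged_copies(
    copies: List[Interval], reads: List[Interval], min_anchor: int = 1
) -> List[bool]:
    """Report, for each copy of a repeat, whether any read bridges it."""
    return [is_bridged(copy, reads, min_anchor) for copy in copies]


if __name__ == "__main__":
    # Two copies of a 100-base repeat (illustrative coordinates only).
    copies = [(100, 200), (500, 600)]
    # One read spans the first copy; no read spans the second.
    reads = [(50, 250), (450, 580), (590, 700)]
    print(bridged_copies(copies, reads))  # [True, False]
```

In this toy setting, the unbridged second copy is the kind of repeat that an assembler following the abstract's approach would leave merged in the graph rather than resolve arbitrarily.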

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/5411769/9c42382a5475/747f06.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/5411769/9fb43369a3a0/747f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/5411769/271fc0fb5bcb/747f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/5411769/527788a87949/747f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/5411769/63de57963fb6/747f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/5411769/a99c9e55d6b8/747f05.jpg
