
Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph.

Affiliations

Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.

Institute of Biology, University of Chinese Academy of Sciences, Beijing, 100049, China.

Publication information

BMC Bioinformatics. 2021 Oct 30;22(1):533. doi: 10.1186/s12859-021-04448-2.

DOI:10.1186/s12859-021-04448-2
PMID:34717539
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8557617/
Abstract

BACKGROUND

Optical maps record the locations of specific enzyme recognition sites within long genome fragments. This long-distance information makes it possible to align genome assembly contigs onto optical maps and to order the contigs into scaffolds. The generated scaffolds, however, often contain a large number of gaps. A feasible way to fill these gaps is to search the genome assembly graph for the best-matching contig paths that connect the boundary contigs of each gap. Searching and evaluation can be combined either as "searching followed by evaluation", which is infeasible for long gaps, or as "searching by evaluation", which relies heavily on heuristics and thus usually yields unreliable contig paths.
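To make the notion of an optical map concrete, the following minimal sketch derives site positions and site-to-site fragment lengths from a contig sequence by in-silico digestion. The motif `GCTCTTC`, the toy contig, and the function names are illustrative assumptions, not taken from the paper; the sketch scans the forward strand only and ignores measurement noise.

```python
def in_silico_digest(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `sequence`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(positions: list[int], total_length: int) -> list[int]:
    """Convert site positions into lengths between consecutive sites and the two ends."""
    cuts = [0, *positions, total_length]
    return [b - a for a, b in zip(cuts, cuts[1:])]


# Toy contig (hypothetical) containing the example motif twice.
contig = "AAGCTCTTCGGGGCTCTTCAAAA"
sites = in_silico_digest(contig, "GCTCTTC")
fragments = fragment_lengths(sites, len(contig))
print(sites, fragments)
```

A contig's fragment-length pattern obtained this way is what would be compared against the fragment lengths measured on an optical map.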

RESULTS

Here we report an accurate and efficient approach to filling gaps in genome scaffolds with the aid of optical maps. Using simulated data from 12 species and real data from 3 species, we demonstrate the successful application of our approach to gap filling, with improved accuracy and completeness of the genome scaffolds.

CONCLUSION

Our approach applies a sequential Bayesian updating technique to measure the similarity between optical maps and candidate contig paths. Using this similarity to guide path searching, our approach achieves higher accuracy than the existing "searching by evaluation" strategy, which relies on heuristics. Furthermore, unlike the "searching followed by evaluation" strategy, which enumerates all possible paths, our approach prunes unlikely sub-paths and extends only the highly probable ones, thus significantly increasing search efficiency.
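The idea of scoring partial paths incrementally and pruning unlikely ones can be illustrated with a small beam search. This is only a sketch under stated assumptions, not the authors' algorithm: it assumes each contig contributes exactly one fragment length, a Gaussian sizing-error model with a hypothetical `sigma`, and a fixed beam width; the graph, the lengths, and the names `fill_gap` and `log_gauss` are all invented for illustration.

```python
import math


def log_gauss(observed: float, expected: float, sigma: float) -> float:
    """Log-density of an observed fragment length under an assumed Gaussian error model."""
    z = (observed - expected) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))


def fill_gap(graph, frag_len, start, end, observed, sigma=0.5, beam=2):
    """Search for a contig path start -> end whose fragments best match `observed`.

    The score of each partial path is updated one observed fragment at a time,
    and only the `beam` best partial paths are extended further.
    """
    hypotheses = [(log_gauss(observed[0], frag_len[start], sigma), [start])]
    for step in range(1, len(observed)):
        extended = []
        for score, path in hypotheses:
            for nxt in graph.get(path[-1], []):
                # Sequential update: add the evidence from the next fragment.
                new_score = score + log_gauss(observed[step], frag_len[nxt], sigma)
                extended.append((new_score, path + [nxt]))
        # Prune: keep only the most probable partial paths.
        extended.sort(key=lambda h: h[0], reverse=True)
        hypotheses = extended[:beam]
        if not hypotheses:
            return None
    finished = [h for h in hypotheses if h[1][-1] == end]
    return max(finished, key=lambda h: h[0])[1] if finished else None


# Hypothetical assembly graph: two alternative routes from A to D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
frag_len = {"A": 5.0, "B": 3.0, "C": 8.0, "D": 4.0}  # arbitrary length units
observed = [5.1, 7.8, 4.2]  # optical-map fragments spanning the gap

print(fill_gap(graph, frag_len, "A", "D", observed))
```

With a beam width of 1, the route through B would be discarded as soon as its second fragment is scored, which is the efficiency gain that pruning provides over enumerating every path.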

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/8557617/eb6b3f862e4d/12859_2021_4448_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/8557617/c3bf1f491904/12859_2021_4448_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/8557617/9c4f5bb81280/12859_2021_4448_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/8557617/d47f1d13d34c/12859_2021_4448_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/8557617/16ad52c7a2fa/12859_2021_4448_Fig5_HTML.jpg
