

A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes.

Affiliation

State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China.

Publication information

BMC Bioinformatics. 2021 May 27;22(1):282. doi: 10.1186/s12859-021-04149-w.

Abstract

BACKGROUND

With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have been produced, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating the information it holds remains a substantial challenge.

RESULTS

We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure.
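To make the idea of a coordinate that combines an offset, a topological unit and a sample more concrete, the following Python sketch addresses bases on a toy two-sample bubble. It is only an illustration under stated assumptions: the abstract does not give the layout of the tri-tuple, so the `TriTuple` fields, the `PATHS` data, and the helper names `coordinates` and `small_indels` are hypothetical and are not taken from the cSupB C++ program. Comparing path lengths is likewise a deliberate simplification of indel detection, not the published method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TriTuple:
    """Hypothetical coordinate: field names are assumptions, not the paper's."""
    offset: int   # 0-based position along one path through the bubble
    bubble: str   # identifier of the topological unit (here, a single bubble)
    sample: str   # color, i.e. the sample or haplotype that owns the path


# Toy data (assumed): each color maps to the sequence spelled by its path
# between the shared entrance and exit of one bubble.
BUBBLE_ID = "b1"
PATHS = {
    "sampleA": "ACGTTGCA",
    "sampleB": "ACGCA",
}


def coordinates(bubble_id, paths):
    """Map every base on every colored path to a tri-tuple coordinate."""
    return {
        TriTuple(offset, bubble_id, sample): base
        for sample, seq in paths.items()
        for offset, base in enumerate(seq)
    }


def small_indels(paths, max_len=50):
    """Report sample pairs whose path lengths differ by fewer than max_len bases.

    A nonzero length difference between two paths through the same bubble is
    treated as a candidate small indel (a simplification for illustration).
    """
    calls = []
    for a, b in combinations(sorted(paths), 2):
        diff = len(paths[a]) - len(paths[b])
        if 0 < abs(diff) < max_len:
            calls.append((a, b, diff))
    return calls


coords = coordinates(BUBBLE_ID, PATHS)
indels = small_indels(PATHS)
```

Here the same offset resolves to different bases depending on the sample component, which is the reason a single linear offset is not enough to address a position in a graph with diverging paths.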

CONCLUSIONS

Although the solution is made suitable for increasingly complex genome graphs by relaxing the directed acyclic graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, which is available at https://github.com/eggleader/cSupB.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b19a/8161984/30e36ec0e1d7/12859_2021_4149_Fig1_HTML.jpg
