Metagenome SNP calling via read-colored de Bruijn graphs.

Author information

Alipanahi Bahar, Muggli Martin D, Jundi Musa, Noyes Noelle R, Boucher Christina

Affiliation

Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA.

Publication information

Bioinformatics. 2021 Apr 1;36(22-23):5275-5281. doi: 10.1093/bioinformatics/btaa081.

Abstract

MOTIVATION

Metagenomics is the study of complex samples containing the genetic content of multiple individual organisms and has therefore been used to elucidate the microbiome and resistome of such samples. The microbiome refers to all microbial organisms in a sample, and the resistome refers to all of the antimicrobial resistance (AMR) genes in pathogenic and non-pathogenic bacteria. Single-nucleotide polymorphisms (SNPs) can be used to 'fingerprint' specific organisms and genes within the microbiome and resistome and to trace their movement across samples. However, using SNPs for this kind of traceability requires a scalable and accurate metagenomic SNP caller. Moreover, such an SNP caller should not rely on reference genomes, since 95% of microbial species are unculturable, which makes determining a reference genome extremely challenging. In this article, we address this need.

RESULTS

We present LueVari, a reference-free SNP caller based on the read-colored de Bruijn graph, an extension of the traditional de Bruijn graph that allows repeated regions longer than the k-mer length and shorter than the read length to be identified unambiguously. LueVari identifies SNPs in both AMR genes and chromosomal DNA from shotgun metagenomics data with reliable sensitivity (between 91% and 99%) and precision (between 71% and 99%), whereas the performance of competing methods varies widely. Furthermore, we show that LueVari constructs sequences containing the variation that span up to 97.8% of genes in the datasets, which can help in detecting distinct AMR genes in large metagenomic datasets.
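
To make the idea of read coloring concrete, here is a toy Python sketch. It is not LueVari's implementation, which the abstract does not describe in detail. It assumes that each k-mer is labeled with the set of reads containing it, and that a walk keeps only successors sharing a read with the path so far. The function names, the two example reads, and k = 4 are all hypothetical, and the sketch ignores reverse complements, sequencing errors, and any succinct encoding.

```python
from collections import defaultdict


def build_read_colored_dbg(reads, k):
    """Map every k-mer to the set of read indices (its 'colors') that contain it."""
    colors = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            colors[read[i:i + k]].add(read_id)
    return dict(colors)


def successors(colors, kmer):
    """All k-mers in the graph that overlap `kmer` by k-1 characters."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in colors]


def walk(colors, start):
    """Extend `start` while exactly one successor shares a read color with the path."""
    path = [start]
    seen = {start}
    active = set(colors[start])  # reads that support the whole path so far
    while True:
        options = [n for n in successors(colors, path[-1]) if colors[n] & active]
        if len(options) != 1 or options[0] in seen:
            break
        nxt = options[0]
        active &= colors[nxt]
        path.append(nxt)
        seen.add(nxt)
    # Spell the path: first k-mer plus the last base of each following k-mer.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


# Hypothetical reads sharing the 7-base repeat GATTACA, longer than k = 4
# but shorter than either read.
reads = ["CCTGATTACATGG", "AGCGATTACACTA"]
colors = build_read_colored_dbg(reads, k=4)

# Without colors, the exit from the repeat is ambiguous.
print(successors(colors, "TACA"))

# With colors, each walk leaves the repeat along the read it entered with.
print(walk(colors, "CCTG"))
print(walk(colors, "AGCG"))
```

In this toy case the uncolored graph offers two ways out of the shared repeat, while the colored walks each recover the read they started from. This is the kind of ambiguity the abstract says read coloring resolves.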

AVAILABILITY AND IMPLEMENTATION

Code and datasets are publicly available at https://github.com/baharpan/cosmo/tree/LueVari.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
