The Amordad database engine for metagenomics.

Author information

Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.

Publication information

Bioinformatics. 2014 Oct 15;30(20):2949-55. doi: 10.1093/bioinformatics/btu405. Epub 2014 Jun 27.

Abstract

MOTIVATION

Several technical challenges in metagenomic data analysis, including assembling metagenomic sequence data or identifying operational taxonomic units, are both significant and well known. These forms of analysis are increasingly cited as conceptually flawed, given the extreme variation within traditionally defined species and rampant horizontal gene transfer. Furthermore, computational requirements of such analysis have hindered content-based organization of metagenomic data at large scale.

RESULTS

In this article, we introduce the Amordad database engine for alignment-free, content-based indexing of metagenomic datasets. Amordad places the metagenome comparison problem in a geometric context, and uses an indexing strategy that combines random hashing with a regular nearest neighbor graph. This framework allows refinement of the database over time by continual application of random hash functions, with the effect of each hash function encoded in the nearest neighbor graph. This eliminates the need to explicitly maintain the hash functions in order for query efficiency to benefit from the accumulated randomness. Results on real and simulated data show that Amordad can support logarithmic query time for identifying similar metagenomes even as the database size reaches into the millions.
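The indexing idea described above can be illustrated with a minimal sketch. It is not the Amordad implementation. It assumes, purely for illustration, that each metagenome is summarized as a k-mer frequency vector, that distance is the angle between vectors, and that the random hash is a set of random hyperplanes. The names `kmer_profile`, `angle`, `init_graph`, `refine`, and `query` are hypothetical. Each call to `refine` applies one fresh hash function, uses its collisions to improve a fixed-degree neighbor graph, and then discards the hash, so only the graph carries the accumulated randomness.

```python
import itertools
import math
import random
from collections import Counter, defaultdict


def kmer_profile(seq, k=2):
    """Map a DNA string to a normalized k-mer frequency vector (an assumed feature choice)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def angle(u, v):
    """Angle between two nonzero vectors, used here as the distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def init_graph(n, degree, rng):
    """Start from a random graph in which every node has exactly `degree` out-neighbors."""
    return {i: rng.sample([j for j in range(n) if j != i], degree) for i in range(n)}


def refine(points, graph, degree, bits, rng):
    """Apply one random hyperplane hash, fold collisions into the graph, then drop the hash."""
    dim = len(points[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(sum(a * b for a, b in zip(h, p)) >= 0 for h in planes)
        buckets[key].append(i)
    for members in buckets.values():
        for i in members:
            # Candidates: current neighbors plus everything that collided with i.
            cand = (set(graph[i]) | set(members)) - {i}
            graph[i] = sorted(cand, key=lambda j: angle(points[i], points[j]))[:degree]
    # `planes` is not stored anywhere; only the graph remembers its effect.


def query(points, graph, q, rng, restarts=3):
    """Greedy walk: move to the neighbor closest to q until no neighbor improves."""
    best = None
    for _ in range(restarts):
        cur = rng.randrange(len(points))
        while True:
            nxt = min(graph[cur], key=lambda j: angle(q, points[j]))
            if angle(q, points[nxt]) >= angle(q, points[cur]):
                break
            cur = nxt
        if best is None or angle(q, points[cur]) < angle(q, points[best]):
            best = cur
    return best
```

In this sketch, a database would be built by calling `init_graph` once and then `refine` as many times as desired. Because each node's new neighbor list is chosen from a superset of its old one, repeated refinement can only keep or improve the graph. The greedy walk in `query` returns a local optimum rather than a guaranteed nearest neighbor, and the sketch makes no claim about reproducing the logarithmic query time reported in the abstract.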

AVAILABILITY AND IMPLEMENTATION

Source code, licensed under the GNU General Public License (version 3), is freely available for download from http://smithlabresearch.org/amordad.

CONTACT

andrewds@usc.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
