Ye Yuzhen, Tang Haixu
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA.
Bioinformatics. 2016 Apr 1;32(7):1001-8. doi: 10.1093/bioinformatics/btv510. Epub 2015 Aug 29.
Metagenomics research has accelerated the study of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (the study of transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for revealing the functional and regulatory characteristics of these communities. Current metatranscriptomics projects are often carried out without matched metagenomic datasets (from the same microbial communities). Even when a project produces both metatranscriptomic and metagenomic datasets, the two are often analyzed separately. Metagenome assemblies are far from perfect, which partially explains why they are not used in the analysis of metatranscriptomic datasets.
Here, we report a read-mapping algorithm that maps short reads onto the de Bruijn graph of an assembly. A hash table of junction k-mers (k-mers spanning branching structures in the de Bruijn graph) is used to facilitate fast mapping of reads to the graph. We developed an application of this mapping algorithm: a reference-based approach to metatranscriptome assembly that uses the graph of a metagenome assembly as the reference. Our results show that this new approach (called TAG) helps to assemble substantially more transcripts that would otherwise have been missed or truncated because of the fragmented nature of the reference metagenome.
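The idea of a junction k-mer hash table can be illustrated with a minimal Python sketch. This is not TAG's implementation (TAG is written in C++), and every name here (`kmers`, `build_graph`, `junction_table`, `map_read`) is hypothetical. The sketch assumes a node-centric de Bruijn graph in which nodes are k-mers and edges join k-mers that are adjacent in some input sequence, treats any k-mer with more than one successor or predecessor as a junction, and ignores reverse complements and sequencing errors.

```python
def kmers(seq, k):
    """Return all k-mers of seq in order (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(sequences, k):
    """Build successor and predecessor maps of a node-centric de Bruijn graph."""
    succ, pred = {}, {}
    for seq in sequences:
        ks = kmers(seq, k)
        for km in ks:
            # Register every k-mer as a node, even if it has no edges yet.
            succ.setdefault(km, set())
            pred.setdefault(km, set())
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def junction_table(succ, pred):
    """Hash table keyed by junction k-mers (assumed: in- or out-degree > 1).

    Each value lists the successors of that junction, in sorted order.
    """
    return {
        km: sorted(succ[km])
        for km in succ
        if len(succ[km]) > 1 or len(pred[km]) > 1
    }


def map_read(read, k, succ, junctions):
    """Thread a read through the graph by exact k-mer matching.

    Returns (path, crossed), where path is the list of k-mer nodes the read
    follows and crossed lists the junction k-mers on that path, or None if
    the read leaves the graph.
    """
    ks = kmers(read, k)
    if not ks or ks[0] not in succ:
        return None
    for a, b in zip(ks, ks[1:]):
        if b not in succ[a]:
            return None
    crossed = [km for km in ks if km in junctions]
    return ks, crossed


# Two toy sequences that share a prefix and diverge after "GGC".
succ, pred = build_graph(["ATGGCGT", "ATGGCTA"], k=3)
junctions = junction_table(succ, pred)
result = map_read("TGGCTA", 3, succ, junctions)
```

In this toy input the only junction is `GGC`, and the read `TGGCTA` follows the second branch through it. The dictionary lookup is what makes the junction check constant-time per k-mer; how TAG actually uses its table to anchor and extend reads is not specified in this abstract.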
TAG was implemented in C++ and has been tested extensively on the Linux platform. It is available for download as open source at http://omics.informatics.indiana.edu/TAG. Contact: yye@indiana.edu.