CAMAMED: a pipeline for composition-aware mapping-based analysis of metagenomic data.

Author information

Norouzi-Beirami Mohammad H, Marashi Sayed-Amir, Banaei-Moghaddam Ali M, Kavousi Kaveh

Affiliations

Laboratory of Complex Biological systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 1417614335, Iran.

Department of Biotechnology, College of Science, University of Tehran, Tehran 1417614411, Iran.

Publication information

NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa107. doi: 10.1093/nargab/lqaa107. eCollection 2021 Mar.

Abstract

Metagenomics is the study of genomic DNA recovered from a microbial community. Both assembly-based and mapping-based methods have been used to analyze metagenomic data. When appropriate gene catalogs are available, mapping-based methods are preferred over assembly-based approaches, especially for analyzing the data at the functional level. In this study, we introduce CAMAMED, a composition-aware, mapping-based pipeline for metagenomic data analysis. The pipeline can analyze metagenomic samples at both the taxonomic and functional profiling levels. It maps metagenome sequences to non-redundant gene catalogs to obtain the gene frequencies in the samples. Because metagenomic data are highly compositional, the pipeline applies the cumulative sum scaling method at both the taxon and gene levels for compositional data analysis. Additionally, by mapping the genes to the KEGG database, annotations related to each gene can be extracted at different functional levels, such as KEGG ortholog groups, enzyme commission numbers and reactions. Furthermore, the pipeline enables the user to identify potential biomarkers in case-control metagenomic samples by investigating functional differences. The source code for this software is available from https://github.com/mhnb/camamed. Ready-to-use Docker images are also available at https://hub.docker.com.
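
The cumulative sum scaling step named in the abstract can be illustrated with a short sketch. This is not CAMAMED's own code: the function name `css_normalize`, the fixed quantile of 0.5 and the scaling constant of 1000 are assumptions made for illustration, and published implementations typically choose the quantile from the data rather than fixing it. For each sample, the counts at or below the chosen quantile of that sample's non-zero counts are summed, and every count in the sample is divided by that sum.

```python
import numpy as np


def css_normalize(counts, quantile=0.5, scale=1000.0):
    """Simplified cumulative sum scaling (illustrative sketch only).

    counts: 2-D array-like, rows = features (genes or taxa), columns = samples.
    quantile, scale: hypothetical fixed defaults chosen for this example.
    """
    counts = np.asarray(counts, dtype=float)
    normalized = np.zeros_like(counts)
    for j in range(counts.shape[1]):
        col = counts[:, j]
        positive = col[col > 0]
        if positive.size == 0:
            continue  # a sample with no counts stays all zeros
        # Quantile of the non-zero counts in this sample.
        threshold = np.quantile(positive, quantile)
        # Scaling factor: sum of the counts at or below that quantile.
        factor = positive[positive <= threshold].sum()
        normalized[:, j] = col / factor * scale
    return normalized


# Two samples with the same composition but a tenfold difference in depth.
raw = [[1, 10],
       [2, 20],
       [3, 30],
       [0, 0]]
profile = css_normalize(raw)
```

In this toy input the two samples differ only in sequencing depth, so their normalized columns coincide. Because the scaling factor is built from the lower part of each sample's count distribution, a few highly abundant features have less influence on it than they would on a total-sum normalization.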

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa71/7787360/79fefb12eacc/lqaa107fig1.jpg
