Phylogeny-based classification of microbial communities.

Affiliations

Department of Computer Science and Engineering and Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521, USA, and School of Information Science and Technology, Tsinghua University, Beijing 100084, China.

Publication information

Bioinformatics. 2014 Feb 15;30(4):449-56. doi: 10.1093/bioinformatics/btt700. Epub 2013 Dec 24.

Abstract

MOTIVATION

Next-generation sequencing coupled with metagenomics has led to the rapid growth of sequence databases and enabled a new branch of microbiology called comparative metagenomics. Comparative metagenomic analysis studies compositional patterns within and between different environments, providing deep insight into the structure and function of complex microbial communities. It is a fast-growing field that requires novel supervised learning techniques to address the challenges associated with metagenomic data, e.g. sensitivity to the choice of sequence similarity cutoff used to define operational taxonomic units (OTUs), and the high dimensionality and sparsity of the data. On the other hand, the natural properties of microbial community data may provide useful information about the structure of the data. For example, the similarity between species encoded by a phylogenetic tree captures the relationships between OTUs and may be useful for analyzing complex microbial datasets in which the diversity patterns comprise features at multiple taxonomic levels. Although learning algorithms in the literature have addressed some of these challenges, none of the available methods takes advantage of the inherent properties of metagenomic data.

RESULTS

We propose a novel supervised classification method for microbial community samples, in which each sample is represented as a set of OTU frequencies. The method takes advantage of the natural structure in microbial community data encoded by a phylogenetic tree, which allows it to exploit environment-specific compositional patterns that may contain features at multiple granularity levels. It is based on the multinomial logistic regression model with a tree-guided penalty function. Additionally, we propose a new simulation framework for generating 16S ribosomal RNA gene read counts that may be useful in comparative metagenomics research. Our experimental results on simulated and real data show that the phylogenetic information used in our method improves classification accuracy.
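
To make the idea of a tree-guided penalty concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact form of the penalty, so the sketch assumes a tree-guided group lasso: every node of the phylogenetic tree defines a group containing the OTUs beneath it, and the penalty is a weighted sum of the L2 norms of the coefficients in each group. The toy tree, the unit node weights, and the names `softmax`, `tree_penalty`, `objective`, `groups`, `weights` and `lam` are all hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def tree_penalty(W, groups, weights):
    """Assumed penalty: sum over tree nodes of weight * L2 norm of the
    coefficient rows belonging to the OTUs under that node."""
    return sum(w * np.linalg.norm(W[idx, :]) for idx, w in zip(groups, weights))

def objective(W, X, y, groups, weights, lam):
    """Mean multinomial negative log-likelihood plus the tree-guided penalty.

    W: (n_otus, n_classes) coefficients
    X: (n_samples, n_otus) OTU frequencies
    y: (n_samples,) integer class labels
    """
    n = X.shape[0]
    P = softmax(X @ W)
    nll = -np.log(P[np.arange(n), y]).mean()
    return nll + lam * tree_penalty(W, groups, weights)

# Hypothetical toy tree over 4 OTUs: ((0, 1), (2, 3)).
# One group per node: four leaves, two internal nodes, and the root.
groups = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
weights = [1.0] * len(groups)  # assumed unit weights for every node

# Two samples described by OTU frequencies, three environment classes.
X = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
y = np.array([0, 2])
W = np.zeros((4, 3))
print(objective(W, X, y, groups, weights, lam=0.1))
```

Because a clade's group norm covers every OTU below it, a penalty of this kind can shrink an entire clade toward zero at once while still allowing individual OTUs to be selected, which is one way of obtaining features at multiple granularity levels.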

AVAILABILITY AND IMPLEMENTATION

http://www.cs.ucr.edu/~tanaseio/metaphyl.htm.
