A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes.

Authors

Pookhao Naruekamol, Sohn Michael B, Li Qike, Jenkins Isaac, Du Ruofei, Jiang Hongmei, An Lingling

Affiliations

Department of Agricultural & Biosystems Engineering, Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 85721 and Department of Statistics, Northwestern University, Evanston, IL 60208, USA.

Publication information

Bioinformatics. 2015 Jan 15;31(2):158-65. doi: 10.1093/bioinformatics/btu635. Epub 2014 Sep 24.

Abstract

MOTIVATION

With the advance of new sequencing technologies that produce massive amounts of short-read data, metagenomics is growing rapidly, especially in environmental biology and medical science. Metagenomic data are not only high dimensional, with a large number of features and a limited number of samples, but also complex, with many zeros and skewed distributions. Efficient computational and statistical tools are needed to deal with these characteristics of metagenomic sequencing data. In metagenomic studies, one main objective is to assess whether and how multiple microbial communities differ under various environmental conditions.

RESULTS

We propose a two-stage statistical procedure for selecting informative features and identifying differentially abundant features between two or more groups of microbial communities. In the functional analysis of metagenomes, the features may be pathways, subsystems, functional roles and so on. In the first stage of the proposed procedure, informative features are selected using the elastic net, which reduces the dimension of the metagenomic data. In the second stage, differentially abundant features are detected using generalized linear models with a negative binomial distribution. Compared with other available methods, the proposed approach performs better in most of the comprehensive simulation studies. The new method is also applied to two real metagenomic datasets related to human health. Our findings are consistent with those in previous reports.
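The two stages described above can be illustrated with a small sketch. It is not the authors' R implementation, and the abstract does not specify how either stage is set up, so every choice here is an assumption made for illustration: the toy counts, the ±1 group coding, the log1p transform, a squared-error elastic net (rather than a logistic or multinomial one), the penalty values `lam` and `alpha`, a fixed negative binomial `size` (inverse dispersion) and equal library sizes (no offset). Under those last two assumptions the fitted group means of a negative binomial GLM with a group factor are the sample means, which keeps the likelihood-ratio test short.

```python
import math

def soft_threshold(z, t):
    """Soft-thresholding operator that produces exact zeros (lasso part)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
    return [(v - m) / sd if sd > 0 else 0.0 for v in col]

def elastic_net(cols, y, lam=0.5, alpha=0.5, sweeps=200):
    """Stage 1 sketch: coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2)."""
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1 - alpha)
            if denom == 0:
                continue  # constant feature under a pure lasso penalty
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta

def nb_loglik(ys, mu, size):
    """Negative binomial log-likelihood with mean mu and fixed size."""
    if mu == 0:
        return 0.0  # a zero fitted mean implies all counts are zero
    return sum(math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
               + size * math.log(size / (size + mu))
               + y * math.log(mu / (size + mu)) for y in ys)

def nb_lrt_pvalue(g1, g2, size=10.0):
    """Stage 2 sketch: likelihood-ratio test of equal group means (1 df)."""
    mean = lambda v: sum(v) / len(v)
    ll_alt = nb_loglik(g1, mean(g1), size) + nb_loglik(g2, mean(g2), size)
    ll_null = nb_loglik(g1 + g2, mean(g1 + g2), size)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail

# Hypothetical toy data: rows = features, columns = samples (3 per group).
counts = [
    [5, 8, 6, 50, 60, 55],     # more abundant in group 2
    [10, 12, 11, 10, 12, 11],  # same in both groups
    [0, 3, 0, 0, 3, 0],        # sparse, same in both groups
    [40, 35, 45, 9, 7, 8],     # less abundant in group 2
]
groups = [-1, -1, -1, 1, 1, 1]  # balanced, so the response is already centered

# Stage 1: keep features with a nonzero elastic-net coefficient.
cols = [standardize([math.log1p(c) for c in row]) for row in counts]
beta = elastic_net(cols, groups)
selected = [j for j, b in enumerate(beta) if b != 0.0]

# Stage 2: test only the selected features on the raw counts.
pvalues = {j: nb_lrt_pvalue(counts[j][:3], counts[j][3:]) for j in selected}
print(selected, pvalues)
```

In practice the penalty parameters would be chosen by cross-validation and the dispersion would be estimated from the data rather than fixed. The sketch is meant only to show how the first stage shrinks the feature set before the count model is applied to the features that remain.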

AVAILABILITY

R code and two example datasets are available at http://cals.arizona.edu/~anling/software.htm.

SUPPLEMENTARY INFORMATION

A supplementary file is available at Bioinformatics online.
