
MOTIVATION

With advances in sequencing technologies that produce massive amounts of short-read data, metagenomics is growing rapidly, especially in environmental biology and medical science. Metagenomic data are not only high dimensional, with a large number of features and a limited number of samples, but also complex, with many zeros and skewed distributions. Efficient computational and statistical tools are needed to handle these characteristics of metagenomic sequencing data. In metagenomic studies, one main objective is to assess whether and how multiple microbial communities differ under various environmental conditions.

RESULTS

We propose a two-stage statistical procedure for selecting informative features and identifying differentially abundant features between two or more groups of microbial communities. In the functional analysis of metagenomes, the features may be pathways, subsystems, functional roles and so on. In the first stage, informative features are selected using the elastic net, which reduces the dimension of the metagenomic data. In the second stage, differentially abundant features are detected using generalized linear models with a negative binomial distribution. Compared with other available methods, the proposed approach performs better in most of the comprehensive simulation studies. The new method is also applied to two real metagenomic datasets related to human health, and the findings are consistent with those in previous reports.
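
The two stages can be illustrated with a minimal Python sketch. This is not the authors' R implementation, and it makes several simplifying assumptions that are not in the abstract: only two groups are handled, the elastic net is fitted by coordinate descent with squared-error loss on 0/1 group labels, the negative binomial size parameter is a crude moment estimate rather than a maximum-likelihood fit, and library-size normalization is ignored. All function names, the penalty settings `lam` and `alpha`, and the toy counts are hypothetical.

```python
import math

def soft_threshold(z, t):
    """Shrink z toward zero by t (the L1 part of the elastic net penalty)."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def elastic_net_select(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Stage 1 sketch: coordinate-descent elastic net on standardized features.

    X is a list of samples (rows), y a list of 0/1 group labels.
    Returns the indices of features with a nonzero coefficient.
    """
    n, p = len(X), len(X[0])
    y_bar = sum(y) / n
    resid = [yi - y_bar for yi in y]  # residual while all coefficients are 0
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Covariance of feature j with the partial residual
            # (standardized columns have unit variance).
            rho = sum(a * b for a, b in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
    return [j for j, b in enumerate(beta) if b != 0.0]

def nb_loglik(counts, means, k):
    """Negative binomial log-likelihood with size k (dispersion 1/k)."""
    ll = 0.0
    for y, mu in zip(counts, means):
        ll += math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        ll += k * math.log(k / (k + mu))
        if y > 0:
            ll += y * math.log(mu / (k + mu))
    return ll

def nb_two_group_pvalue(counts, groups):
    """Stage 2 sketch: likelihood-ratio test for a group effect (two groups)."""
    n = len(counts)
    overall = sum(counts) / n
    if overall == 0:
        return 1.0
    gmean = {}
    for g in set(groups):
        vals = [c for c, gi in zip(counts, groups) if gi == g]
        gmean[g] = sum(vals) / len(vals)
    # With a log link and a group indicator, fitted means are the group means.
    fitted = [gmean[g] for g in groups]
    # Assumption: moment estimate of k from Var = mu + mu^2 / k.
    s2 = sum((c - f) ** 2 for c, f in zip(counts, fitted)) / (n - len(gmean))
    k = overall ** 2 / (s2 - overall) if s2 > overall else 1e6
    stat = 2.0 * (nb_loglik(counts, fitted, k)
                  - nb_loglik(counts, [overall] * n, k))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def two_stage(X, groups):
    """Select features with stage 1, then test each selected feature."""
    selected = elastic_net_select(X, groups)
    return {j: nb_two_group_pvalue([row[j] for row in X], groups)
            for j in selected}

# Hypothetical toy counts: 8 samples (rows) by 3 features (columns).
toy_X = [[2, 5, 0], [3, 7, 1], [1, 6, 0], [2, 4, 2],
         [20, 5, 1], [25, 7, 0], [18, 6, 2], [22, 4, 0]]
toy_groups = [0, 0, 0, 0, 1, 1, 1, 1]
result = two_stage(toy_X, toy_groups)
```

In this toy input only the first feature differs clearly between the two groups, so it is the one that stage 1 retains and stage 2 then tests. A real analysis would more plausibly rely on established packages for both the penalized regression and the negative binomial model.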

AVAILABILITY

R code and two example datasets are available at http://cals.arizona.edu/~anling/software.htm.

SUPPLEMENTARY INFORMATION

A supplementary file is available at Bioinformatics online.

A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes.

AUTHOR INFORMATION

Pookhao Naruekamol, Sohn Michael B, Li Qike, Jenkins Isaac, Du Ruofei, Jiang Hongmei, An Lingling

AFFILIATIONS

Department of Agricultural & Biosystems Engineering, Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ 85721 and Department of Statistics, Northwestern University, Evanston, IL 60208, USA.

PUBLICATION INFORMATION

Bioinformatics. 2015 Jan 15;31(2):158-65. doi: 10.1093/bioinformatics/btu635. Epub 2014 Sep 24.
