An informative approach on differential abundance analysis for time-course metagenomic sequencing data.

Authors

Luo Dan, Ziebell Sara, An Lingling

Affiliations

Department of Epidemiology and Biostatistics, College of Public Health.

Interdisciplinary Program in Statistics.

Publication information

Bioinformatics. 2017 May 1;33(9):1286-1292. doi: 10.1093/bioinformatics/btw828.

DOI:10.1093/bioinformatics/btw828

PMID:28057680

Abstract

MOTIVATION

The advent of high-throughput next-generation sequencing technology has greatly advanced the field of metagenomics, in which previously unattainable information about microbial communities can now be discovered. Detecting differentially abundant features (e.g. species or genes) plays a critical role in revealing the contributors (e.g. pathogens) to the biological or medical status of microbial samples. However, currently available statistical methods lack power to detect features that are differentially abundant between biological or medical conditions, particularly for time-series metagenomic sequencing data. We propose a novel procedure, metaDprof, built upon a spline-based method that assumes heterogeneous error, to detect differentially abundant features in metagenomic samples by comparing biological/medical conditions across time. It has two stages: (i) global detection of features and (ii) detection of the time intervals over which the significant features differ. The detection procedures in both stages rest on sound statistical foundations.
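The two-stage structure described above can be illustrated with a toy example. The sketch below is not metaDprof and does not fit splines: for brevity it swaps in a per-time-point Welch t-statistic, which lets each group keep its own variance as a loose stand-in for the heterogeneous-error assumption. The function names (`welch_t`, `stage1_global`, `stage2_intervals`), the sum-of-squares global statistic, the threshold and cutoff values, and the abundance data are all hypothetical choices made for illustration, with no calibration or multiple-testing control.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical helpers illustrating a two-stage screen; NOT the metaDprof method.

def welch_t(a, b):
    """Welch t-statistic at one time point; each group keeps its own
    variance (a loose stand-in for heterogeneous error)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def stage1_global(t_stats, threshold):
    """Stage (i) sketch: flag the feature if the sum of squared
    per-time-point statistics exceeds an assumed threshold."""
    return sum(t * t for t in t_stats) > threshold

def stage2_intervals(times, t_stats, cutoff):
    """Stage (ii) sketch: return (start, end) pairs for maximal runs of
    consecutive time points where |t| exceeds the cutoff."""
    intervals, start, last = [], None, None
    for time, t in zip(times, t_stats):
        if abs(t) > cutoff:
            if start is None:
                start = time  # a run of differences begins
            last = time
        elif start is not None:
            intervals.append((start, last))  # the run has ended
            start = None
    if start is not None:
        intervals.append((start, last))  # run extends to the final time point
    return intervals

# Made-up abundances of one feature: three replicates per group per time point.
times = [1, 2, 3, 4, 5]
group_a = {1: [10, 12, 11], 2: [11, 10, 12], 3: [30, 33, 31], 4: [35, 32, 34], 5: [12, 11, 10]}
group_b = {1: [11, 10, 12], 2: [10, 12, 11], 3: [12, 10, 11], 4: [11, 13, 12], 5: [11, 12, 10]}

t_stats = [welch_t(group_a[t], group_b[t]) for t in times]
if stage1_global(t_stats, threshold=20.0):
    print(stage2_intervals(times, t_stats, cutoff=3.0))  # prints [(3, 4)]
```

Because both stages here use ad hoc thresholds, the printed interval only shows how a second stage can localize the starting and ending time points of a difference once a first stage has flagged the feature.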

RESULTS

Compared with existing methods, metaDprof shows the best performance in comprehensive simulation studies. It accurately detects not only the features related to the biological condition or disease status of the samples but also the starting and ending time points at which the differences arise. The method is also applied to a real metagenomic dataset, and the results offer an interesting perspective on the relationship between the mouse gut microbiota and diet type.

AVAILABILITY AND IMPLEMENTATION

R code and an example dataset are available at https://cals.arizona.edu/~anling/sbg/software.htm.

CONTACT

anling@email.arizona.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
