Luo Dan, Ziebell Sara, An Lingling
Department of Epidemiology and Biostatistics, College of Public Health.
Interdisciplinary Program in Statistics.
Bioinformatics. 2017 May 1;33(9):1286-1292. doi: 10.1093/bioinformatics/btw828.
The advent of high-throughput next-generation sequencing technology has greatly advanced the field of metagenomics, in which previously unattainable information about microbial communities can be discovered. Detecting differentially abundant features (e.g. species or genes) plays a critical role in revealing the contributors (e.g. pathogens) to the biological or medical status of microbial samples. However, currently available statistical methods lack power to detect features that are differentially abundant between biological or medical conditions, particularly for time-series metagenomic sequencing data. We propose a novel procedure, metaDprof, built on a spline-based method that assumes heterogeneous error, to detect differentially abundant features in metagenomic samples by comparing biological/medical conditions across time. It has two stages: (i) global detection of features and (ii) time-interval detection for significant features. The detection procedures in both stages have sound statistical support.
In comprehensive simulation studies, metaDprof shows the best performance compared with existing methods. It not only accurately detects features related to the biological condition or disease status of samples but also accurately identifies the time points at which the differences start and end. When the method is applied to a real metagenomic dataset, the results offer an interesting angle on the relationship between the mouse gut microbiota and diet type.
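To illustrate what stage (ii), detecting the start and end time points of a difference, involves, here is a minimal sketch. It is not metaDprof's actual procedure. It assumes an estimated difference curve between two conditions (for example, from splines fitted per condition) and its pointwise standard errors are already available at sorted time points. It then reports maximal runs of time points where a pointwise band `diff ± z*se` excludes zero. The function name `significant_intervals` and the fixed `z = 1.96` threshold are hypothetical choices for illustration. A plain pointwise band does not control error across all time points the way a formal procedure would.

```python
from typing import List, Tuple


def significant_intervals(
    times: List[float],
    diff: List[float],
    se: List[float],
    z: float = 1.96,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: return (start, end) time pairs of maximal runs
    where the pointwise band diff +/- z*se lies entirely above or entirely
    below zero. A change of sign starts a new interval."""
    if not (len(times) == len(diff) == len(se)):
        raise ValueError("times, diff and se must have the same length")

    intervals: List[Tuple[float, float]] = []
    cur_sign = 0   # sign of the run in progress (0 = no run)
    start = None   # time at which the current run began
    prev = None    # previous time point visited

    for t, d, s in zip(times, diff, se):
        # Classify this time point by where its band sits relative to zero.
        if d - z * s > 0:
            sign = 1
        elif d + z * s < 0:
            sign = -1
        else:
            sign = 0
        if sign != cur_sign:
            if cur_sign != 0:
                # The previous run ended at the last time point visited.
                intervals.append((start, prev))
            start = t if sign != 0 else None
            cur_sign = sign
        prev = t

    if cur_sign != 0:
        # Close a run that extends to the final time point.
        intervals.append((start, prev))
    return intervals


times = [0, 1, 2, 3, 4, 5, 6]
diff = [0.1, 0.2, 1.5, 2.0, 1.8, 0.3, -2.0]
se = [0.5] * 7
print(significant_intervals(times, diff, se))  # [(2, 4), (6, 6)]
```

Splitting runs on a sign change keeps an interval where one condition is more abundant separate from an adjacent interval where the other condition is.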
R code and an example dataset are available at https://cals.arizona.edu/~anling/sbg/software.htm.
Supplementary data are available at Bioinformatics online.