Sohn Michael B, Du Ruofei, An Lingling
Interdisciplinary Program in Statistics and Department of Agricultural and Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA.
Bioinformatics. 2015 Jul 15;31(14):2269-75. doi: 10.1093/bioinformatics/btv165. Epub 2015 Mar 19.
The analysis of differential abundance for features (e.g. species or genes) can improve our understanding of microbial communities and their behavior. However, it can also mislead us about the characteristics of those communities if the abundances or counts of features on different scales are not properly normalized within and between communities before the analysis. Normalization methods used in differential analysis typically adjust counts on different scales to a common scale using the total sum, mean or median of representative features across all samples. These methods often yield undesirable results when the difference in total counts of differentially abundant features (DAFs) across conditions is large.
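The normalization problem described above can be illustrated with a minimal sketch. The counts below are invented for illustration and the helper `total_sum_scale` is a hypothetical name, not part of any package. Only feature 0 truly changes between the two conditions, yet after total-sum scaling every other feature appears to decrease.

```python
def total_sum_scale(counts):
    """Divide each count by the sample total so the values sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts for four features in two conditions (illustrative only).
# Only feature 0 truly differs: it is 10-fold higher in condition B.
cond_a = [100, 100, 100, 100]
cond_b = [1000, 100, 100, 100]

prop_a = total_sum_scale(cond_a)
prop_b = total_sum_scale(cond_b)

# Apparent fold change (B relative to A) after total-sum scaling.
fold_changes = [b / a for a, b in zip(prop_a, prop_b)]
for i, fc in enumerate(fold_changes):
    print(f"feature {i}: apparent fold change = {fc:.2f}")
```

In this toy case the true 10-fold increase of feature 0 is understated, and the three unchanged features look depleted, because the large DAF inflates the total used as the scaling factor.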
We develop a novel method, Ratio Approach for Identifying Differential Abundance (RAIDA), which uses the ratios between features in a modified zero-inflated lognormal model. RAIDA removes possible problems associated with counts on different scales within and between conditions. As a result, its performance is not affected by the size of the difference in total abundances of DAFs across conditions. In comprehensive simulation studies, our method is consistently powerful, and in some situations RAIDA greatly outperforms existing methods. We also apply RAIDA to real datasets of type II diabetes and find results consistent with previous reports.
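The idea of working with ratios between features can be sketched as follows. This is not the RAIDA algorithm: the abstract does not say how the divisor is chosen or how the zero-inflated lognormal model is modified, so the sketch assumes a fixed, known reference feature and fits a plain (unmodified) zero-inflated lognormal. The function names and all counts are hypothetical.

```python
import math

def ratios_to_reference(counts, ref_index):
    """Divide every other feature's count by the reference feature's count."""
    ref = counts[ref_index]
    if ref == 0:
        raise ValueError("reference feature must have a nonzero count")
    return [c / ref for i, c in enumerate(counts) if i != ref_index]

def fit_zero_inflated_lognormal(values):
    """Return (eta, mu, sigma) for a plain zero-inflated lognormal.

    eta is the fraction of zeros; mu and sigma are the mean and
    (maximum-likelihood) standard deviation of the log of the nonzero values.
    """
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        raise ValueError("need at least one nonzero value")
    eta = 1 - len(nonzero) / len(values)
    logs = [math.log(v) for v in nonzero]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return eta, mu, sigma

# Hypothetical samples; feature 3 is assumed to be a non-differential reference.
shallow = [100, 100, 100, 100]   # baseline sample
deep = [200, 200, 200, 200]      # same community, twice the sequencing depth
shifted = [1000, 100, 100, 100]  # feature 0 truly 10-fold higher

r_shallow = ratios_to_reference(shallow, ref_index=3)
r_deep = ratios_to_reference(deep, ref_index=3)
r_shifted = ratios_to_reference(shifted, ref_index=3)

# Fit the plain model to one feature's ratios across four hypothetical samples.
eta, mu, sigma = fit_zero_inflated_lognormal([0.0, 1.0, math.e, math.e ** 2])
```

Under these assumptions the ratios are unchanged by sequencing depth, and a large change in one feature leaves the ratios of the unchanged features untouched, which is the property the abstract attributes to the ratio approach.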
An R package for RAIDA can be accessed from http://cals.arizona.edu/%7Eanling/sbg/software.htm.