Culman S W, Gauch H G, Blackwood C B, Thies J E
Department of Crop and Soil Sciences, Cornell University, Ithaca, NY, United States.
J Microbiol Methods. 2008 Sep;75(1):55-63. doi: 10.1016/j.mimet.2008.04.011. Epub 2008 May 16.
The analysis of T-RFLP data has developed considerably over the last decade, but there remains a lack of consensus about which statistical analyses offer the best means for finding trends in these data. In this study, we empirically tested and theoretically compared the more common ordination methods in the literature on ten diverse T-RFLP datasets derived from soil microbial communities: principal component analysis (PCA), nonmetric multidimensional scaling (NMS) with Sørensen, Jaccard and Euclidean distance measures, correspondence analysis (CA), detrended correspondence analysis (DCA) and a technique new to T-RFLP data analysis, the Additive Main Effects and Multiplicative Interaction (AMMI) model. Our objectives were i) to determine the distribution of variation in T-RFLP datasets using analysis of variance (ANOVA), ii) to determine the more robust and informative multivariate ordination methods for analyzing T-RFLP data, and iii) to compare the methods based on theoretical considerations. For the ten datasets examined in this study, ANOVA revealed that the variation from Environment main effects was always small, variation from T-RF main effects was large, and variation from T-RF × Environment (T×E) interactions was intermediate. Larger variation due to T×E indicated larger differences in microbial communities between environments/treatments and thus demonstrated the utility of ANOVA to provide an objective assessment of community dissimilarity. The comparison of statistical methods typically yielded similar empirical results. AMMI, T-RF-centered PCA, and DCA were the most robust methods in terms of producing ordinations that consistently reached a consensus with other methods. In datasets with high sample heterogeneity, NMS analyses with Sørensen and Jaccard distance were the most sensitive for recovery of complex gradients.
The theoretical comparison showed that some methods hold distinct advantages for T-RFLP analysis, such as estimations of variation captured, realistic or minimal assumptions about the data, reduced weight placed on rare T-RFs, and uniqueness of solutions. Our results lead us to recommend that method selection be guided by T-RFLP dataset complexity and the outlined theoretical criteria. Finally, we recommend using binary or relativized peak height data with soil-based T-RFLP data for ordination-based exploratory microbial analyses.
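The ANOVA partition and the AMMI ordination described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes an unreplicated matrix with T-RFs as rows and environments (samples) as columns, and the function names `partition_variation` and `ammi_scores`, as well as the toy peak heights, are hypothetical. The additive part removes the grand mean and the two main effects; the multiplicative part is a singular value decomposition of the remaining T×E interaction.

```python
import numpy as np

def partition_variation(Y):
    """Split the total sum of squares of a T-RF x environment matrix
    into T-RF main effects, Environment main effects, and TxE interaction.
    Assumes one observation per cell (no replication)."""
    Y = np.asarray(Y, dtype=float)
    n_trf, n_env = Y.shape
    grand = Y.mean()
    trf_eff = Y.mean(axis=1, keepdims=True) - grand   # row (T-RF) deviations
    env_eff = Y.mean(axis=0, keepdims=True) - grand   # column (environment) deviations
    interaction = Y - grand - trf_eff - env_eff       # double-centered residual
    ss = {
        "T-RF": n_env * np.sum(trf_eff ** 2),
        "Environment": n_trf * np.sum(env_eff ** 2),
        "TxE": np.sum(interaction ** 2),
    }
    total = sum(ss.values())
    shares = {name: value / total for name, value in ss.items()}
    return shares, interaction

def ammi_scores(interaction, n_axes=2):
    """Ordinate the interaction matrix by SVD (the multiplicative part of AMMI).
    Scores are scaled by the square root of each singular value.
    Assumes the interaction is not identically zero."""
    U, s, Vt = np.linalg.svd(interaction, full_matrices=False)
    scale = np.sqrt(s[:n_axes])
    trf_scores = U[:, :n_axes] * scale
    env_scores = Vt[:n_axes].T * scale
    explained = (s ** 2 / np.sum(s ** 2))[:n_axes]    # share of TxE per axis
    return trf_scores, env_scores, explained

# Hypothetical toy data: 4 T-RFs (rows) x 3 environments (columns).
Y = np.array([[10.0, 0.0, 5.0],
              [3.0, 3.0, 3.0],
              [0.0, 8.0, 1.0],
              [2.0, 2.0, 6.0]])

Y_rel = Y / Y.sum(axis=0)                 # relativize peak heights within each sample
shares, interaction = partition_variation(Y_rel)
trf_scores, env_scores, explained = ammi_scores(interaction)
print(shares)
print(explained)
```

In this sketch, relativizing each sample to a total of 1 gives every sample the same mean, so the Environment share is zero by construction; all remaining variation is split between T-RF main effects and the T×E interaction.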