Xia Li C, Steele Joshua A, Cram Jacob A, Cardon Zoe G, Simmons Sheri L, Vallino Joseph J, Fuhrman Jed A, Sun Fengzhu
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA.
BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S15. doi: 10.1186/1752-0509-5-S2-S15. Epub 2011 Dec 14.
The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that ordinary correlation analysis cannot identify. However, LSA as originally developed does not consider time series data with replicates, which hinders the full exploitation of the available information. With replicates, it is possible to estimate the variability of the local similarity (LS) score and to obtain its confidence interval.
We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed that eLSA can capture subinterval and time-delayed associations. We implemented the eLSA technique in an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction. We applied the eLSA technique to microbial community and gene expression datasets, where it identified unique time-dependent associations.
The extended LSA technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data, beyond those that ordinary correlation analysis can detect. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software streamlines the analysis and is freely available from the eLSA homepage at http://meta.usc.edu/softs/lsa.
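To make the idea of a local, time-delayed association concrete, the following is a minimal Python sketch and not the eLSA package's code or API. It assumes a dynamic-programming formulation of the local similarity score (the best-scoring contiguous run of aligned time points within a maximum delay), uses a plain z-score as a stand-in for the package's normalization, summarizes replicates by their mean, and estimates a confidence interval with a percentile bootstrap over replicates. All function names, parameters, and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev


def zscore(series):
    """Standardize a series (a stand-in for the normalization step)."""
    m, s = mean(series), pstdev(series)
    if s == 0:
        return [0.0] * len(series)
    return [(v - m) / s for v in series]


def local_similarity(x, y, max_delay):
    """Return (signed LS score, delay of y relative to x).

    Scans every diagonal with |j - i| <= max_delay and keeps the
    best-scoring contiguous run, for positive and negative association.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    best_score, best_delay = 0.0, 0
    for sign in (1, -1):
        prev = [0.0] * (n + 1)  # run scores ending in the previous row
        for i in range(1, n + 1):
            cur = [0.0] * (n + 1)
            for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
                run = prev[j - 1] + sign * x[i - 1] * y[j - 1]
                if run > 0:  # otherwise the run restarts at zero
                    cur[j] = run
                    if run > abs(best_score):
                        best_score, best_delay = sign * run, j - i
            prev = cur
    return best_score / n, best_delay


def replicate_mean(reps):
    """Collapse replicates (one list per time point) to their mean."""
    return [mean(r) for r in reps]


def bootstrap_ci(x_reps, y_reps, max_delay, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the LS score, resampling replicates."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        xb = [mean(rng.choices(r, k=len(r))) for r in x_reps]
        yb = [mean(rng.choices(r, k=len(r))) for r in y_reps]
        scores.append(local_similarity(zscore(xb), zscore(yb), max_delay)[0])
    scores.sort()
    lo_idx = max(0, int(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return scores[lo_idx], scores[hi_idx]


# Toy data (hypothetical): y repeats x one time step later, three replicates each.
base_x = [0, 1, 3, 1, 0, -1, -3, -1, 0, 1]
base_y = [0] + base_x[:-1]
x_reps = [[v - 0.1, v, v + 0.1] for v in base_x]
y_reps = [[v - 0.1, v, v + 0.1] for v in base_y]

score, delay = local_similarity(
    zscore(replicate_mean(x_reps)), zscore(replicate_mean(y_reps)), max_delay=2
)
ci_low, ci_high = bootstrap_ci(x_reps, y_reps, max_delay=2)
print(f"LS score={score:.3f}, delay={delay}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

In this sketch a positive delay means the second series trails the first. Keeping the sign of the score distinguishes positive from negative associations, and restricting the scan to a band of diagonals is what limits the search to a chosen maximum delay.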