Su Xiaoquan, Hu Jianqiang, Huang Shi, Ning Kang
Shandong Key Laboratory of Energy Genetics, CAS Key Laboratory of Biofuels and BioEnergy Genome Center, Computational Biology Group of Single Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences. Qingdao 266101, People's Republic of China.
[1] Shandong Key Laboratory of Energy Genetics, CAS Key Laboratory of Biofuels and BioEnergy Genome Center, Computational Biology Group of Single Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences. Qingdao 266101, People's Republic of China [2] University of Chinese Academy of Sciences, Beijing 100049, China.
Sci Rep. 2014 Sep 17;4:6393. doi: 10.1038/srep06393.
Research on microbial communities could affect a wide range of applications in "bio"-related disciplines. Large-scale analysis has become a clear trend in microbial community studies, so it is increasingly important to mine data from large numbers of samples efficiently and in depth to uncover biological principles. However, because microbial communities come from different sources and have different structures, comparison and data mining across large numbers of samples are difficult. In this work, we propose a data model for large-scale comparison of microbial community samples, the "Multi-Dimensional View" data model (the MDV model), which should include at least three aspects: the sample profile (S), the taxa profile (T) and the meta-data profile (V). We also propose a method for rapid data analysis based on the MDV model and apply it to case studies with samples from various environmental conditions. The results show that, although sampling environments usually define the key variables, the analysis can detect biomarkers and even subtle variables from large numbers of samples, which might be used to discover novel principles that drive the development of communities. These results validate the efficiency and effectiveness of the data analysis method based on the MDV model.
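As a rough illustration of the three aspects named in the abstract, the sketch below holds sample identifiers (S), taxon names (T) and per-sample meta-data (V) in one container. The class name `MDV`, its fields, the toy data and the use of Bray-Curtis similarity are all assumptions made for illustration; the abstract does not specify the underlying data layout or the similarity measure used by the authors.

```python
from dataclasses import dataclass


@dataclass
class MDV:
    """Hypothetical container for the S, T and V profiles (not the authors' code)."""
    samples: list[str]                    # S: sample identifiers
    taxa: list[str]                       # T: taxon names
    abundance: list[list[float]]          # rows follow `samples`, columns follow `taxa`
    metadata: dict[str, dict[str, str]]   # V: meta-data keyed by sample identifier

    def similarity(self, a: str, b: str) -> float:
        """Bray-Curtis similarity between two samples (an assumed metric)."""
        x = self.abundance[self.samples.index(a)]
        y = self.abundance[self.samples.index(b)]
        total = sum(x) + sum(y)
        if total == 0:
            return 1.0  # two empty profiles are treated as identical
        shared = sum(min(p, q) for p, q in zip(x, y))
        return 2.0 * shared / total

    def mean_abundance_by(self, key: str) -> dict[str, list[float]]:
        """Average each taxon's abundance within groups defined by a meta-data key."""
        groups: dict[str, list[list[float]]] = {}
        for name, row in zip(self.samples, self.abundance):
            groups.setdefault(self.metadata[name][key], []).append(row)
        return {
            group: [sum(col) / len(rows) for col in zip(*rows)]
            for group, rows in groups.items()
        }


# Toy data, invented purely for illustration.
model = MDV(
    samples=["gut1", "gut2", "soil1"],
    taxa=["Bacteroides", "Streptomyces"],
    abundance=[[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]],
    metadata={
        "gut1": {"environment": "gut"},
        "gut2": {"environment": "gut"},
        "soil1": {"environment": "soil"},
    },
)

print(round(model.similarity("gut1", "gut2"), 3))
print(round(model.similarity("gut1", "soil1"), 3))
print(model.mean_abundance_by("environment"))
```

In this toy setting, comparing group means across a meta-data key is one simple way to look for taxa that differ between environments; it stands in for, and should not be read as, the biomarker detection described in the abstract.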