Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, USA.
PLoS Comput Biol. 2011 Dec;7(12):e1002228. doi: 10.1371/journal.pcbi.1002228. Epub 2011 Dec 1.
The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize the results of comparative genomics analyses. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput data types from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, that enables exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and the outputs of other data analysis procedures, such as results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We demonstrate the system on two bacteria, Escherichia coli and Salmonella Typhimurium, by exploring conserved biclusters involved in nitrogen metabolism. This exploration uncovers a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation.