EDISA: extracting biclusters from multiple time-series of gene expression profiles.

Author information

Supper Jochen, Strauch Martin, Wanke Dierk, Harter Klaus, Zell Andreas

Affiliation

Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany.

Publication information

BMC Bioinformatics. 2007 Sep 12;8:334. doi: 10.1186/1471-2105-8-334.

Abstract

BACKGROUND

Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets.
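As a rough illustration of the gene-condition-time layout described above, the following sketch joins several time-series datasets into one nested structure. The function name `stack_time_series`, the dictionary-based input format, and the choice to keep only genes measured under every condition are assumptions made for this example; the abstract does not specify how the authors assemble their datasets.

```python
def stack_time_series(datasets):
    """Join per-condition time series into a gene x condition x time cube.

    `datasets` maps a condition name to {gene name: [value at t0, t1, ...]}.
    Assumption for this sketch: only genes present under every condition are
    kept, and all series must share the same number of time points.
    """
    if not datasets:
        raise ValueError("at least one time-series dataset is required")
    conditions = sorted(datasets)
    # Genes measured under all conditions.
    genes = sorted(set.intersection(*(set(datasets[c]) for c in conditions)))
    lengths = {len(datasets[c][g]) for c in conditions for g in genes}
    if len(lengths) > 1:
        raise ValueError("all time series must have the same length")
    # cube[i][j][t] = expression of gene i under condition j at time point t.
    cube = [[list(datasets[c][g]) for c in conditions] for g in genes]
    return genes, conditions, cube
```

Each gene then carries one time profile per condition, which is what separates this layout from the two-dimensional gene-by-sample matrix that standard clustering and biclustering methods expect.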

RESULTS

In this work, we present the EDISA (Extended Dimension Iterative Signature Algorithm), a novel probabilistic clustering approach for 3D gene-condition-time datasets. Based on mathematical definitions of gene expression modules, the EDISA samples initial modules from the dataset, which are then refined by removing genes and conditions until they comply with the module definition. A subsequent extension step ensures gene and condition maximality. We applied the algorithm to a synthetic dataset and were able to successfully recover the implanted modules over a range of background noise intensities. Analysis of microarray datasets has led us to define three biologically relevant module types: 1) We found modules with independent response profiles to be the most prevalent ones. These modules comprise genes which are co-regulated under several conditions, yet with a different response pattern under each condition. 2) Coherent modules with similar responses under all conditions occurred frequently, too, and were often contained within modules of the first type. 3) A third module type, which covers a response specific to a single condition, was also detected, but rarely. All of these modules are essentially different types of biclusters.
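The sample, refine, and extend steps described above can be sketched as follows. This is not the EDISA implementation. The abstract does not give the module definition, so the sketch assumes a simple stand-in: a gene belongs to a module if, under each of the module's conditions, its time profile has a Pearson correlation of at least `tau` with the module's mean profile for that condition. For brevity, only genes are removed and added, and the condition set is held fixed. All function names and the parameter `tau` are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def mean_profile(cube, genes, cond):
    """Average time profile of `genes` under one condition."""
    n_time = len(cube[genes[0]][cond])
    return [mean(cube[g][cond][t] for g in genes) for t in range(n_time)]

def gene_score(cube, g, genes, conds):
    """Worst correlation of gene g with the module mean over all module conditions."""
    return min(pearson(cube[g][c], mean_profile(cube, genes, c)) for c in conds)

def refine(cube, genes, conds, tau=0.8):
    """Remove the worst-scoring gene until every remaining gene reaches tau."""
    genes = list(genes)
    while len(genes) > 1:
        scores = {g: gene_score(cube, g, genes, conds) for g in genes}
        worst = min(genes, key=scores.get)
        if scores[worst] >= tau:
            break  # all genes comply with the assumed module definition
        genes.remove(worst)
    return genes

def extend(cube, genes, conds, tau=0.8):
    """Add every outside gene that already meets the threshold."""
    added = [g for g in range(len(cube))
             if g not in genes and gene_score(cube, g, genes, conds) >= tau]
    return sorted(genes + added)

def find_module(cube, conds, seed_size, rng, tau=0.8):
    """Sample a random seed of genes, then refine and extend it."""
    seed = rng.sample(range(len(cube)), seed_size)
    return extend(cube, refine(cube, seed, conds, tau), conds, tau)
```

Because the correlation is computed separately per condition, a module found this way may show a different response pattern under each condition, loosely matching the first module type in the list above. A call such as `find_module(cube, [0, 1], 3, random.Random(0))` draws one random seed; in practice many seeds would be sampled and the resulting modules compared.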

CONCLUSION

We successfully applied the EDISA to different 3D datasets. While previous studies were mostly aimed at detecting coherent modules only, our results show that coherent responses are often part of a more general module type with independent response profiles under different conditions. Our approach thus allows for a more comprehensive view of the gene expression response. After subsequent analysis of the resulting modules, the EDISA helped to shed light on the global organization of transcriptional control. An implementation of the algorithm is available at http://www-ra.informatik.uni-tuebingen.de/software/IAGEN/.


