Kléma Jiří, Blachon Sylvain, Soulet Arnaud, Crémilleux Bruno, Gandrillon Olivier
GREYC, CNRS UMR 6072, Université de Caen, Campus Côte de Nacre, F-14032 Caen Cédex, France.
In Silico Biol. 2008;8(2):157-75.
Current analyses of co-expressed genes are often based on global approaches such as clustering or bi-clustering. An alternative is to employ local methods and search for patterns, that is, sets of genes displaying specific expression properties in a set of situations. This type of analysis has two main bottlenecks: computational cost and an overwhelming number of candidate patterns that can hardly be exploited further. A timely application of background knowledge available in literature databases, biological ontologies and other sources can help to focus on the most plausible patterns only. The paper proposes, implements and tests a flexible constraint-based framework that enables the effective mining and representation of meaningful over-expression patterns, which capture intrinsic associations among genes and biological situations. The framework can be applied to a wide spectrum of genomic data simultaneously, and we demonstrate that it allows new biological hypotheses with clinical implications to be generated.