Hansen Matthew, Everett Logan, Singh Larry, Hannenhalli Sridhar
Department of Genetics, Penn Center for Bioinformatics, University of Pennsylvania, Pennsylvania, USA.
Algorithms Mol Biol. 2010 Jan 4;5:4. doi: 10.1186/1748-7188-5-4.
Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue types. Thus, co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples in which the regulatory interaction holds may be marked by the presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtleties of regulatory interactions are overlooked when one computes a single overall expression correlation.
Here we present a novel mixture modeling approach in which a TF-Gene pair is presumed to be significantly correlated (with unknown coefficient) in an (unknown) subset of expression samples. The parameters of the model are estimated using a Maximum Likelihood approach. The estimated partition of expression samples is then mined to identify genes potentially modulating the TF-Gene interaction. We have validated our approach on synthetic data and on four biological cases in cow, yeast, and human.
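To make the idea concrete, the following is a minimal sketch of one way such a mixture could be fitted. It is not the authors' implementation. It assumes a two-component model on standardized expression values: a bivariate normal with unknown correlation `rho` for the samples where the interaction is active, and an uncorrelated bivariate normal for the remaining samples. It further assumes that Maximum Likelihood estimation is done by EM with a grid search over `rho`. The function names (`fit_correlation_mixture`, `score_modulators`) and the modulator score (Pearson correlation between a candidate gene's expression and the per-sample membership probabilities) are hypothetical choices made for illustration.

```python
import numpy as np

def log_bvn(x, y, rho):
    """Log density of a zero-mean, unit-variance bivariate normal with correlation rho."""
    one_m = 1.0 - rho ** 2
    quad = (x ** 2 - 2.0 * rho * x * y + y ** 2) / (2.0 * one_m)
    return -np.log(2.0 * np.pi) - 0.5 * np.log(one_m) - quad

def fit_correlation_mixture(tf, gene, n_iter=200, tol=1e-8):
    """EM fit of: pi * BVN(rho) + (1 - pi) * BVN(0) on standardized data.

    Returns (pi, rho, resp), where resp[i] is the estimated probability
    that sample i belongs to the correlated component.
    """
    tf = np.asarray(tf, dtype=float)
    gene = np.asarray(gene, dtype=float)
    x = (tf - tf.mean()) / tf.std()
    y = (gene - gene.mean()) / gene.std()

    grid = np.linspace(-0.99, 0.99, 199)                 # candidate rho values
    log_f1_grid = log_bvn(x[None, :], y[None, :], grid[:, None])  # shape (G, n)
    log_f0 = log_bvn(x, y, 0.0)                          # uncorrelated component

    # Start from a strong correlation with the sign of the overall correlation.
    pi = 0.5
    rho = 0.8 if np.mean(x * y) >= 0 else -0.8
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: membership probabilities for the correlated component.
        a = np.log(pi) + log_bvn(x, y, rho)
        b = np.log1p(-pi) + log_f0
        log_mix = np.logaddexp(a, b)
        resp = np.exp(a - log_mix)
        ll = log_mix.sum()
        # M-step: update mixing weight, then pick rho maximizing the
        # responsibility-weighted log-likelihood over the grid.
        pi = float(np.clip(resp.mean(), 1e-6, 1.0 - 1e-6))
        rho = float(grid[np.argmax(log_f1_grid @ resp)])
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, rho, resp

def score_modulators(resp, candidates):
    """Pearson correlation of each candidate gene (row) with the memberships."""
    candidates = np.asarray(candidates, dtype=float)
    r = resp - resp.mean()
    c = candidates - candidates.mean(axis=1, keepdims=True)
    return (c @ r) / (np.linalg.norm(c, axis=1) * np.linalg.norm(r))

# Synthetic illustration: the TF-Gene correlation holds only in the first half.
rng = np.random.default_rng(0)
n = 1000
active = np.arange(n) < n // 2
tf_expr = rng.standard_normal(n)
noise = rng.standard_normal(n)
gene_expr = np.where(active, 0.9 * tf_expr + np.sqrt(1 - 0.81) * noise, noise)

pi_hat, rho_hat, resp = fit_correlation_mixture(tf_expr, gene_expr)
print("estimated mixing weight:", pi_hat)
print("estimated correlation in active subset:", rho_hat)
```

Candidate modulators could then be ranked by passing a genes-by-samples expression matrix to `score_modulators(resp, candidates)`. Genes whose expression tracks the membership probabilities (positively or negatively) would be the ones to examine further under this sketch's assumptions.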
While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.