Barash Yoseph, Friedman Nir
School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel.
J Comput Biol. 2002;9(2):169-91. doi: 10.1089/10665270252935403.
The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. Such a model represents the joint distribution of transcription factor binding sites and of expression levels of a gene in a unified probabilistic model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real-life data and analyze the biological insights it provides. Finally, we demonstrate the applicability of the method to other data analysis problems in gene expression data.
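To make the idea of a joint model over binding sites and expression levels concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not specify the model form or the search procedure. The sketch assumes a mixture model in which a hidden cluster label generates binary binding-site indicators (Bernoulli) and continuous expression levels (Gaussian), independently given the cluster. It is fitted with plain EM and a simple rank-based initialisation, not with the Bayesian-score search introduced in the paper. All names (`fit_em`, `posteriors`, `log_bern`, `log_gauss`) and all smoothing constants are hypothetical.

```python
import math


def log_gauss(x, mu, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_bern(b, p):
    """Log probability of a binary indicator b under Bernoulli(p)."""
    return math.log(p if b else 1.0 - p)


def posteriors(gene, weights, site_p, expr_mu, expr_var):
    """Return P(cluster | gene) for gene = (site_indicators, expression_levels)."""
    sites, expr = gene
    logs = []
    for k in range(len(weights)):
        ll = math.log(weights[k])
        ll += sum(log_bern(b, site_p[k][j]) for j, b in enumerate(sites))
        ll += sum(log_gauss(x, expr_mu[k][e], expr_var[k][e])
                  for e, x in enumerate(expr))
        logs.append(ll)
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fit_em(genes, n_clusters, n_iter=20):
    """Fit the assumed mixture with EM; returns parameters and responsibilities."""
    n = len(genes)
    n_sites, n_exps = len(genes[0][0]), len(genes[0][1])
    # Assumed initialisation: rank genes by total expression and split the
    # ranking into n_clusters contiguous chunks (hard assignments).
    order = sorted(range(n), key=lambda i: sum(genes[i][1]))
    resp = [[0.0] * n_clusters for _ in genes]
    for rank, i in enumerate(order):
        resp[i][rank * n_clusters // n] = 1.0
    for _ in range(n_iter):
        # M-step: re-estimate parameters from the soft assignments.
        weights, site_p, expr_mu, expr_var = [], [], [], []
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)
            weights.append((nk + 1.0) / (n + n_clusters))
            # Add-one smoothing keeps each probability strictly inside (0, 1).
            site_p.append([
                (sum(r[k] * g[0][j] for r, g in zip(resp, genes)) + 1.0)
                / (nk + 2.0)
                for j in range(n_sites)])
            mu = [sum(r[k] * g[1][e] for r, g in zip(resp, genes)) / (nk + 1e-9)
                  for e in range(n_exps)]
            # A small variance floor (0.01) avoids degenerate Gaussians.
            var = [sum(r[k] * (g[1][e] - mu[e]) ** 2
                       for r, g in zip(resp, genes)) / (nk + 1e-9) + 1e-2
                   for e in range(n_exps)]
            expr_mu.append(mu)
            expr_var.append(var)
        # E-step: recompute the cluster posteriors for every gene.
        resp = [posteriors(g, weights, site_p, expr_mu, expr_var) for g in genes]
    return weights, site_p, expr_mu, expr_var, resp
```

Under these assumptions, the fitted `site_p[k]` indicates which binding sites are enriched in cluster `k`, and `expr_mu[k]` gives that cluster's mean expression profile across experiments. This loosely mirrors the abstract's goal of detecting which binding sites and experiments best characterize a cluster.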