Department of Information and Computer Science, Aalto University School of Science and Technology, Helsinki, Finland.
Proc Natl Acad Sci U S A. 2010 Apr 27;107(17):7793-8. doi: 10.1073/pnas.0914285107. Epub 2010 Apr 12.
We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist and Mef2, which control mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach performs comparably to or better than ranking based on mutant differential expression scores. We also show how integrating complementary wild-type spatial expression data can further improve target ranking performance.
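The likelihood-based ranking described above can be illustrated with a small sketch, built on several assumptions that are not taken from the paper. It assumes a linear model dg/dt = B + S·f(t) - D·g(t) for a target g driven by the TF profile f, a unit-variance squared-exponential Gaussian-process prior on f, Gaussian observation noise, and a steady-state start g(0) = B/D with the basal level pinned to the first target measurement. The convolution integral is approximated on a uniform grid rather than through a closed-form kernel. Because the map from f to g is linear, TF and target observations are jointly Gaussian, so f is integrated out exactly. Each candidate is scored by log p(y_target | y_TF), maximized over a small grid of (D, S) values as a crude stand-in for proper likelihood optimization. All function names, parameter values, and the toy TF pulse and profiles are hypothetical.

```python
import numpy as np


def rbf(a, b, lengthscale):
    """Squared-exponential GP covariance with unit variance (assumed prior on f)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def response_matrix(t_obs, grid, decay, sensitivity):
    """Linear map from f(grid) to g(t_obs) - B/D for dg/dt = B + S f(t) - D g(t).

    Assumes g(0) = B/D, so g(t) = B/D + S * int_0^t exp(-D (t - u)) f(u) du,
    approximated by a left Riemann sum on a uniform grid.
    """
    du = grid[1] - grid[0]
    lag = t_obs[:, None] - grid[None, :]
    weights = sensitivity * du * np.exp(-decay * np.clip(lag, 0.0, None))
    return np.where(lag > 0, weights, 0.0)  # causal: only past TF levels count


def gauss_logpdf(y, mean, cov):
    """Log density of y under N(mean, cov), computed via a Cholesky factor."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y - mean)
    return (-0.5 * alpha @ alpha - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))


def conditional_loglik(t_obs, y_tf, y_target, decay, sensitivity,
                       lengthscale=1.5, noise=0.1, n_grid=201):
    """log p(y_target | y_tf) with the latent TF profile f integrated out."""
    grid = np.linspace(0.0, t_obs[-1], n_grid)
    L = response_matrix(t_obs, grid, decay, sensitivity)
    k_ff = rbf(t_obs, t_obs, lengthscale)                # Cov(f, f)
    k_fg = rbf(t_obs, grid, lengthscale) @ L.T           # Cov(f, g)
    k_gg = L @ rbf(grid, grid, lengthscale) @ L.T        # Cov(g, g)
    n = len(t_obs)
    cov = np.block([[k_ff, k_fg], [k_fg.T, k_gg]]) + noise ** 2 * np.eye(2 * n)
    # Hypothetical choice: basal level B/D fixed to the first target observation.
    mean = np.concatenate([np.zeros(n), np.full(n, y_target[0])])
    joint = gauss_logpdf(np.concatenate([y_tf, y_target]), mean, cov)
    marginal = gauss_logpdf(y_tf, np.zeros(n), k_ff + noise ** 2 * np.eye(n))
    return joint - marginal


def score_target(t_obs, y_tf, y_target,
                 decays=(0.2, 0.5, 1.0, 2.0), sensitivities=(0.25, 0.5, 1.0, 2.0)):
    """Crude grid-search maximum of the conditional log-likelihood over (D, S)."""
    return max(conditional_loglik(t_obs, y_tf, y_target, d, s)
               for d in decays for s in sensitivities)


# Toy data: a short series (11 time points) with a synthetic TF pulse.
t = np.linspace(0.0, 10.0, 11)
grid = np.linspace(0.0, 10.0, 201)
y_tf = np.exp(-0.5 * (t - 3.0) ** 2)
tf_on_grid = np.exp(-0.5 * (grid - 3.0) ** 2)
y_true = 1.0 + response_matrix(t, grid, 0.5, 1.0) @ tf_on_grid  # driven by the TF
y_decoy = 1.0 + 0.8 * np.sin(t)                                 # unrelated profile

scores = {"target": score_target(t, y_tf, y_true),
          "decoy": score_target(t, y_tf, y_decoy)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Scoring with the conditional likelihood rather than the joint one keeps the TF term, which is shared by every candidate, from affecting the ranking. A full treatment would instead optimize B, D, S and the GP hyperparameters for each gene.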