Division of Biostatistics, Department of Clinical Sciences, University of Texas Southwestern Medical Center at Dallas, Dallas, TX, USA.
Stat Med. 2010 Feb 20;29(4):489-503. doi: 10.1002/sim.3815.
Genome-wide protein-DNA binding data, DNA sequence data and gene expression data are complementary means of deciphering global and local transcriptional regulatory circuits. Combining these different types of data can not only improve statistical power but also provide a more comprehensive picture of gene regulation. In this paper, we propose a novel statistical model that augments protein-DNA binding data with gene expression and DNA sequence data when these are available. We specify a hierarchical Bayes model and use Markov chain Monte Carlo simulation to draw inferences. Both simulation studies and an analysis of an experimental data set show that the proposed joint modeling method can significantly improve the specificity and sensitivity of target-gene identification compared with conventional approaches that rely on a single data source.
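The abstract does not state the model's likelihoods or priors, so what follows is only a toy sketch of the general idea behind joint modeling with MCMC, not the authors' model. A latent indicator Z_i marks whether gene i is a target, each data source contributes its own likelihood term given Z_i, and a Gibbs sampler (one kind of MCMC) alternates between updating the indicators and the target proportion pi. The Gaussian likelihoods with unit variance, the fixed class means mu and nu, the Beta(1, 1) prior and the function name gibbs_joint are all hypothetical assumptions; DNA sequence data are left out for brevity.

```python
import math
import random


def log_normal(v, mean, sd=1.0):
    """Log density of N(mean, sd^2) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (v - mean) ** 2 / (2 * sd * sd)


def sigmoid(d):
    """Numerically stable logistic function 1 / (1 + exp(-d))."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def gibbs_joint(binding, expression, mu=(0.0, 2.0), nu=(0.0, 1.5),
                n_iter=2000, burn_in=500, seed=0):
    """Estimate P(Z_i = 1 | data) for each gene under a toy two-class model.

    Hypothetical model (an assumption, not the paper's specification):
      Z_i ~ Bernoulli(pi),  pi ~ Beta(1, 1)
      binding_i    | Z_i = k ~ N(mu[k], 1)
      expression_i | Z_i = k ~ N(nu[k], 1)
    The two sources are conditionally independent given Z_i.
    Pass expression=None to analyse the binding data alone.
    """
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = random.Random(seed)
    n = len(binding)
    pi = 0.5
    z = [0] * n
    hits = [0] * n  # number of post-burn-in draws with Z_i = 1
    for it in range(n_iter):
        # Step 1: draw each Z_i from its full conditional given pi and the data.
        for i in range(n):
            log1 = math.log(pi) + log_normal(binding[i], mu[1])
            log0 = math.log(1.0 - pi) + log_normal(binding[i], mu[0])
            if expression is not None:
                # Conditional independence: the expression evidence simply adds.
                log1 += log_normal(expression[i], nu[1])
                log0 += log_normal(expression[i], nu[0])
            z[i] = 1 if rng.random() < sigmoid(log1 - log0) else 0
        # Step 2: draw pi from its Beta full conditional given the indicators.
        k = sum(z)
        pi = rng.betavariate(1 + k, 1 + n - k)
        pi = min(max(pi, 1e-12), 1.0 - 1e-12)  # keep log(pi) finite
        if it >= burn_in:
            for i in range(n):
                hits[i] += z[i]
    kept = n_iter - burn_in
    return [h / kept for h in hits]
```

Calling gibbs_joint(binding, None) gives the single-source analysis, so the two sets of posterior probabilities can be compared on the same simulated data. Treating the sources as conditionally independent given Z_i is what lets their log-likelihood ratios add; a fuller hierarchical model would also place priors on mu and nu instead of fixing them.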