Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Nucleic Acids Res. 2010 Apr;38(7):2154-67. doi: 10.1093/nar/gkp1180. Epub 2010 Jan 6.
Coupling chromatin immunoprecipitation (ChIP) with recently developed massively parallel sequencing technologies has enabled genome-wide detection of protein-DNA interactions with unprecedented sensitivity and specificity. This new technology, ChIP-Seq, presents opportunities for in-depth analysis of transcription regulation. In this study, we explore the value of using ChIP-Seq data to better detect and refine transcription factor binding sites (TFBS). We introduce a novel computational algorithm named Hybrid Motif Sampler (HMS), specifically designed for TFBS motif discovery in ChIP-Seq data. We propose a Bayesian model that incorporates sequencing depth information to aid motif identification. Our model also allows intra-motif dependency to describe more accurately the underlying motif pattern. Our algorithm combines stochastic sampling and deterministic 'greedy' search steps into a novel hybrid iterative scheme. This combination accelerates the computation process. Simulation studies demonstrate favorable performance of HMS compared to other existing methods. When applying HMS to real ChIP-Seq datasets, we find that (i) the accuracy of existing TFBS motif patterns can be significantly improved; and (ii) there is significant intra-motif dependency inside all the TFBS motifs we tested; modeling these dependencies further improves the accuracy of these TFBS motif patterns. These findings may offer new biological insights into the mechanisms of transcription factor regulation.
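The abstract describes a scheme that alternates stochastic sampling with deterministic greedy moves and lets sequencing depth inform where a site is likely to lie. The Python sketch below illustrates that general idea only; it is not the HMS implementation. It assumes one site per sequence, an independent-column motif model (no intra-motif dependency), a uniform background, and a hypothetical position prior proportional to mean read depth. All function names and parameters (`coverage_prior`, `hybrid_search`, `n_sample`, `max_greedy`, and so on) are invented for illustration.

```python
import math
import random

BASES = "ACGT"

def coverage_prior(depth, width, eps=1e-3):
    """Assumed prior: weight each start by the mean read depth over its window."""
    return [sum(depth[s:s + width]) / width + eps
            for s in range(len(depth) - width + 1)]

def build_pwm(seqs, starts, width, skip=None, pseudo=0.5):
    """Column-wise base frequencies from current sites, leaving out sequence `skip`."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][BASES.index(seq[s + j])] += 1
    return [[c / sum(col) for c in col] for col in counts]

def site_weights(seq, pwm, prior, background=0.25):
    """Unnormalised weight of each start: position prior x motif/background ratio."""
    width = len(pwm)
    weights = []
    for s in range(len(seq) - width + 1):
        log_ratio = sum(math.log(pwm[j][BASES.index(seq[s + j])] / background)
                        for j in range(width))
        weights.append(prior[s] * math.exp(log_ratio))
    return weights

def hybrid_search(seqs, priors, width, n_rounds=10, n_sample=5,
                  max_greedy=20, seed=0):
    """Alternate Gibbs-style sampling sweeps with greedy argmax sweeps."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_rounds):
        # Stochastic step: resample each site from its conditional weights.
        for _ in range(n_sample):
            for i, seq in enumerate(seqs):
                pwm = build_pwm(seqs, starts, width, skip=i)
                w = site_weights(seq, pwm, priors[i])
                starts[i] = rng.choices(range(len(w)), weights=w)[0]
        # Deterministic step: move each site to its best-scoring start
        # until nothing changes (capped, since convergence is not guaranteed).
        for _ in range(max_greedy):
            changed = False
            for i, seq in enumerate(seqs):
                pwm = build_pwm(seqs, starts, width, skip=i)
                w = site_weights(seq, pwm, priors[i])
                best = max(range(len(w)), key=w.__getitem__)
                if best != starts[i]:
                    starts[i], changed = best, True
            if not changed:
                break
    return starts, build_pwm(seqs, starts, width)
```

To try it, build one prior per sequence with `coverage_prior(depth, width)` from a per-base read-depth list and pass the sequences and priors to `hybrid_search`. In this sketch the greedy sweeps serve to settle quickly on a local optimum after the sampling sweeps have explored alternatives, which is one plausible reading of why the abstract says the combination accelerates computation.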