Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.
Nucleic Acids Res. 2013 Jan;41(Database issue):D195-202. doi: 10.1093/nar/gks1089. Epub 2012 Nov 21.
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model per TF is more convenient for practical applications. We show that integrating TFBS data from various types of experiments into a single model typically improves model quality, probably owing to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/), containing carefully hand-curated TFBS models constructed by integrating binding sequences obtained by both low- and high-throughput methods. To construct the position weight matrices that represent these TFBS models, we used the ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, of which 172 models are based on more than one data source.
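For readers unfamiliar with position weight matrices, the following minimal Python sketch shows how a log-odds matrix can be derived from a set of already aligned binding sites and then used to scan a sequence. It is not the ChIPMunk algorithm and does not reproduce any of its four modes. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount of 1, the uniform background, and the toy sites are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM (one dict per position) from aligned sites.

    Assumes upper-case A/C/G/T sites of equal length; the uniform
    background and the pseudocount value are illustrative choices.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        column = {}
        for base in ALPHABET:
            freq = (counts.get(base, 0) + pseudocount) / total
            column[base] = math.log(freq / background[base])
        pwm.append(column)
    return pwm

def score(pwm, word):
    """Sum the per-position log-odds weights for a word of PWM width."""
    return sum(column[base] for column, base in zip(pwm, word))

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical toy alignment, not taken from HOCOMOCO.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
toy_pwm = build_pwm(toy_sites)
hit_score, hit_offset = best_hit(toy_pwm, "CCCCTGACTCACCCC")
```

The pseudocount keeps unobserved bases from producing a weight of negative infinity, which matters most for models built from only a few sites.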