Institute for Systems Biology, Seattle, Washington, United States of America.
PLoS One. 2012;7(8):e42779. doi: 10.1371/journal.pone.0042779. Epub 2012 Aug 27.
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of represented species with PWMs by more than an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties.
The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.
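The core idea, transferring PWMs between transcription factors whose DNA binding domains are identical, can be illustrated with a minimal sketch. This is not the published implementation: the function name `transfer_pwms`, the identifiers, and the toy sequences are all hypothetical, and the sketch assumes the DNA binding domain sequences have already been extracted from each protein by some upstream domain-detection step.

```python
from collections import defaultdict


def transfer_pwms(dbd_by_tf, pwms_by_tf):
    """Share PWM IDs among TFs whose DNA binding domain (DBD) sequences match exactly.

    dbd_by_tf:  {tf_id: tuple of DBD amino-acid sequences, in order}
    pwms_by_tf: {tf_id: set of PWM IDs already annotated for that TF}
    Returns {tf_id: set of PWM IDs}, including IDs inherited from matching TFs.
    TFs whose group has no annotated PWM are left out of the result.
    """
    # Group TFs by their full, ordered set of DBD sequences (assumption:
    # "identical" means every domain matches exactly, in the same order).
    groups = defaultdict(list)
    for tf, domains in dbd_by_tf.items():
        groups[tuple(domains)].append(tf)

    result = {}
    for members in groups.values():
        # Pool every PWM known for any member of the group.
        pooled = set()
        for tf in members:
            pooled |= pwms_by_tf.get(tf, set())
        # Assign the pooled PWMs to all members, annotated or not.
        if pooled:
            for tf in members:
                result[tf] = set(pooled)
    return result


# Toy data: identifiers and sequences are made up for illustration only.
dbd_by_tf = {
    "human_TF1": ("RRGRQTYTRYQTLELEKEF",),
    "mouse_TF1": ("RRGRQTYTRYQTLELEKEF",),
    "yeast_TF9": ("KKPRTAFSSEQLARLKREF",),
}
pwms_by_tf = {"human_TF1": {"PWM_A"}}

mapping = transfer_pwms(dbd_by_tf, pwms_by_tf)
print(mapping)
```

In this toy example the mouse factor inherits `PWM_A` from the human factor because their domain sequences are identical, while the yeast factor, whose domain differs, receives nothing. Requiring an exact match on every domain is a deliberately conservative choice here; how factors with multiple DNA binding domains should be handled is one of the difficulties the abstract itself points out.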