Laboratory of Molecular Modeling and Design, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian, Liaoning, China.
PLoS One. 2013;8(1):e52460. doi: 10.1371/journal.pone.0052460. Epub 2013 Jan 8.
Scanning through genomes for potential transcription factor binding sites (TFBSs) is becoming increasingly important in this post-genomic era. The position weight matrix (PWM) is the standard representation of TFBSs utilized when scanning through sequences for potential binding sites. However, many transcription factor (TF) motifs are short and highly degenerate, and methods utilizing PWMs to scan for sites are plagued by false positives. Furthermore, many important TFs do not have well-characterized PWMs, making identification of potential binding sites even more difficult. One approach to the identification of sites for these TFs has been to use the 3D structure of the TF to predict the DNA structure around the TF and then to generate a PWM from the predicted 3D complex structure. However, this approach is dependent on the similarity of the predicted structure to the native structure. We introduce here a novel approach to identify TFBSs utilizing structure information that can be applied to TFs without characterized PWMs, as long as a 3D complex structure (TF/DNA) exists. This approach utilizes an energy function that is uniquely trained on each structure. Our approach leads to increased prediction accuracy and robustness compared with those using a more general energy function. The software is freely available upon request.
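For orientation, conventional PWM scanning, the baseline the abstract contrasts against, can be sketched as below. This is a minimal illustration and not the structure-based energy function introduced in the paper. The aligned sites, the pseudocount, the uniform background frequency, the score threshold, and the function names `build_pwm` and `scan` are all assumptions made for the example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above the threshold.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical aligned binding sites, invented for this sketch.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = build_pwm(sites)
hits = scan("GGCTATAATGCC", pwm, threshold=5.0)
```

With short, degenerate motifs, lowering the threshold quickly admits many spurious windows, which is the false-positive problem the abstract describes.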