Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan.
PLoS One. 2012;7(2):e30446. doi: 10.1371/journal.pone.0030446. Epub 2012 Feb 1.
DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind specific sequences in the genome and thereby initiate many important biological functions. Accurate prediction of these target sequences, often represented by position weight matrices (PWMs), is an important step toward understanding many biological processes. Recent studies have shown that knowledge-based potential functions can be applied to protein-DNA co-crystallized structures to generate PWMs that are fairly consistent with experimental data. However, this success has not been extended to DNA-binding proteins that lack co-crystallized structures. This study investigates whether the DNA sequences bound by a DNA-binding protein can be predicted from the protein's unbound structure (the structure of its unbound state). Given an unbound query protein and a template complex, the proposed method first uses structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is used to predict PWMs characterizing the sequences to which the query protein can bind. The method is evaluated on seven DNA-binding proteins that have structures of both the DNA-bound and unbound forms for prediction, as well as annotated PWMs for validation. Because this work is the first attempt to predict the target sequences of DNA-binding proteins from their unbound structures, three types of structural variation that presumably influence prediction accuracy were examined and discussed. The analyses indicate that the conformational change of the protein upon binding DNA is the key factor.
This study sheds light on the challenge of predicting the target DNA sequences of a protein that lacks co-crystallized structures, and it encourages further work on structure alignment-based approaches, in addition to docking- and homology modeling-based approaches, for generating synthetic complexes.
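The abstract does not say how the scores from the knowledge-based potential are turned into a PWM. As a minimal sketch of one common way to do this, the Python below assumes that each DNA position has already been assigned an interaction energy for each of the four bases, and converts those energies to base probabilities by Boltzmann weighting. The function name `energies_to_pwm`, the `kT` scaling parameter, and the example energies are hypothetical illustrations, not the authors' implementation or data.

```python
import math

BASES = "ACGT"

def energies_to_pwm(energies, kT=1.0):
    """Convert per-position base energies into a probability-style PWM.

    energies: list of dicts mapping each base to an energy score,
              where a lower energy means a more favorable contact.
    kT:       assumed scaling factor (hypothetical, not from the paper).
    """
    pwm = []
    for column in energies:
        # Shift by the column minimum so the exponentials stay well scaled.
        e_min = min(column[b] for b in BASES)
        weights = {b: math.exp(-(column[b] - e_min) / kT) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm

# Made-up energies for a two-position site, for illustration only.
example_energies = [
    {"A": -1.2, "C": 0.3, "G": 0.1, "T": 0.4},  # position favoring A
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},   # position with no preference
]
example_pwm = energies_to_pwm(example_energies)
```

Under this assumption, a position where all four bases score equally yields a uniform column, while a position with one strongly favorable base yields a column dominated by that base. A larger `kT` flattens the matrix and a smaller one sharpens it.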