Logic of Genomic Systems Laboratory, Spanish National Biotechnology Centre, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain.
PLoS Comput Biol. 2010 Nov 11;6(11):e1000989. doi: 10.1371/journal.pcbi.1000989.
The specific binding of regulatory proteins to DNA sequences exhibits no clear patterns of association between amino acids (AAs) and nucleotides (NTs). This complexity of protein-DNA interactions raises the question of whether a simple set of wide-coverage recognition rules can ever be identified. Here, we analyzed this issue using the extensive LacI family of transcription factors (TFs). We searched for recognition patterns by introducing a new approach to phylogenetic footprinting, based on the pervasive presence of local regulation in prokaryotic transcriptional networks. We identified a set of specificity correlations, determined by two AAs of the TFs and two NTs in the binding sites, that is conserved throughout a dominant subgroup within the family regardless of evolutionary distance and that acts as a relatively consistent recognition code. The proposed rules are confirmed by data from previous experimental studies and by events of convergent evolution in the phylogenetic tree. The presence of a code emphasizes the stable structural context of the LacI family and defines a precise blueprint for reprogramming TF specificity, with many practical applications.