Luscombe Nicholas M, Thornton Janet M
Biomolecular Structures and Modelling Unit, Department of Biochemistry and Molecular Biology, University College, London, UK.
J Mol Biol. 2002 Jul 26;320(5):991-1009. doi: 10.1016/s0022-2836(02)00571-5.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove, where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often, though not always, follow a universal code of amino acid-base recognition, and that the effects of amino acid mutations can be most easily understood for these interactions.
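The three-way classification and the comparison of conservation between residue groups can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the abstract: conservation is scored here as one minus normalised Shannon entropy per alignment column (the abstract does not state which measure was used), the function names `column_conservation`, `mean_conservation` and `classify_family` are hypothetical, and the four-sequence alignment and the target sequences are invented toy data, not results from the study.

```python
from collections import Counter
from math import log2
from typing import List, Optional, Sequence


def column_conservation(column: str) -> float:
    """Score one alignment column as 1 - normalised Shannon entropy.

    1.0 means the column is invariant; 0.0 means maximally variable.
    This scoring scheme is an assumption made for illustration.
    """
    total = len(column)
    counts = Counter(column)
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    # At most 20 amino acid types, and no more types than sequences.
    max_entropy = log2(min(total, 20))
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0


def mean_conservation(alignment: Sequence[str], positions: Sequence[int]) -> float:
    """Average the column score over the given 0-based alignment positions."""
    columns = ["".join(seq[i] for seq in alignment) for i in positions]
    return sum(column_conservation(c) for c in columns) / len(columns)


def classify_family(targets: List[Optional[str]]) -> str:
    """Assign a family to one of the three classes named in the abstract.

    `targets` holds one DNA target sequence per family member, or None
    when a member binds independently of DNA sequence.
    """
    if all(t is None for t in targets):
        return "non-specific"
    if len(set(targets)) == 1:
        return "highly specific"
    return "multi-specific"


# Invented toy alignment: positions 0-1 stand in for backbone contacts,
# position 2 for a base contact that differs in every family member.
toy_alignment = ["KRTAG", "KRSAG", "KRQAG", "KRNAG"]
backbone_score = mean_conservation(toy_alignment, [0, 1])
base_score = mean_conservation(toy_alignment, [2])
toy_class = classify_family(["TAATGG", "TAATCC", "TAATTG", "TAATTA"])
print(backbone_score, base_score, toy_class)
```

In this toy case the backbone-contacting columns are invariant while the base-contacting column varies, which is the pattern the abstract describes for multi-specific families. A real analysis would need a proper multiple sequence alignment and contact assignments taken from solved protein-DNA structures.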