Klump H
Institut für Physikalische Chemie, Universität Freiburg, F.R.G.
Biosystems. 1987;21(1):33-49. doi: 10.1016/0303-2647(87)90005-0.
The list of published restriction endonucleases and their substrates provides an excellent database for evaluating the evolution and codification of the key elements of specific recognition sites on DNA. In this paper the discussion is limited to palindromic tetramer, pentamer, and hexamer sequences. The basic assumption is that each base pair within these sequences must be recognized by directionally unique bidentate hydrogen bonds, either within the plane of the base pair or by bridging the appropriate H-bond donor/acceptor groups of neighbouring bases on the same strand. Sequence specificity is thus mediated by twelve H-bonds for a hexameric site (eight for a tetrameric site), originating from the protein recognition modules. The most abundant sequences show a pronounced preference for GC base pairs, which serves the need for maximal thermodynamic stability of the double-helical substrates; beyond this, it can be shown that the stacking of consecutive bases within the recognition sequences plays a major role in shaping the particular DNA/protein interface. Finally, it will be demonstrated that the full set of sequences discussed here can readily be derived by stepwise expansion of the vocabulary of three simple tetrameric sequences: inserting a single base pair into the centre of a minimal sequence creates all the published pentameric restriction sites, and inserting or adding two GC base pairs in a palindromic way creates the known multiplicity of hexameric sites.
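The stepwise derivation of pentameric and hexameric sites from a tetramer, together with the H-bond count, can be illustrated with a short Python sketch. It rests on assumptions that the abstract does not spell out. A site counts as "palindromic" when it equals its own reverse complement. N stands for an unspecified central base pair. Each specified base pair costs two H-bonds, following the bidentate assumption. "Inserting/adding two GC base pairs" is read as either flanking the tetramer with a G:C pair or placing GC/CG at its centre. The seed GATC is an arbitrary illustration because the abstract does not name its three tetrameric sequences, and all function names are hypothetical.

```python
"""Sketch (assumptions as stated above): palindrome test, assumed H-bond
budget, and stepwise expansion of a tetrameric site into longer sites."""

# N denotes an unspecified base pair and is treated as self-complementary.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq: str) -> bool:
    """A site is palindromic if both strands read the same 5'->3'."""
    return seq == reverse_complement(seq)


def hbond_count(seq: str, per_pair: int = 2) -> int:
    """Assumed budget: one bidentate contact (2 H-bonds) per specified
    base pair; an unspecified N contributes none."""
    return per_pair * sum(1 for base in seq if base != "N")


def pentamers(tetramer: str) -> list[str]:
    """Insert one base pair at the centre and keep palindromic results.
    Only N survives, because no single base is its own complement."""
    mid = len(tetramer) // 2
    candidates = [tetramer[:mid] + base + tetramer[mid:] for base in "ACGTN"]
    return [s for s in candidates if is_palindromic(s)]


def hexamers(tetramer: str) -> list[str]:
    """Add two G:C base pairs palindromically: flanking the tetramer
    (G...C or C...G) or at its centre (GC or CG). Hypothetical reading."""
    mid = len(tetramer) // 2
    candidates = {
        "G" + tetramer + "C",
        "C" + tetramer + "G",
        tetramer[:mid] + "GC" + tetramer[mid:],
        tetramer[:mid] + "CG" + tetramer[mid:],
    }
    return sorted(s for s in candidates if is_palindromic(s))


if __name__ == "__main__":
    seed = "GATC"  # illustrative seed only, not necessarily one of the three
    print(is_palindromic(seed), hbond_count(seed))  # True 8
    print(pentamers(seed))  # ['GANTC']
    print(hexamers(seed))  # ['CGATCG', 'GACGTC', 'GAGCTC', 'GGATCC']
    print(hbond_count("GGATCC"))  # 12
```

Two of the outputs are familiar sites: GANTC is recognized by HinfI and GGATCC by BamHI. For a palindromic seed the filter in `hexamers` never rejects a candidate. It only matters if a non-palindromic sequence is passed in.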