McClelland M
Department of Biochemistry and Molecular Biology, University of Chicago, IL 60637.
Nucleic Acids Res. 1988 Mar 25;16(5):2283-94. doi: 10.1093/nar/16.5.2283.
I show that the recognition sequences of Type II restriction systems are correlated with the G + C content of the host bacterial DNA. Almost all restriction systems with G + C rich tetranucleotide recognition sequences are found in species with A + T rich genomes, whereas G + C rich hexanucleotide and octanucleotide recognition sequences are found almost exclusively in species with G + C rich genomes. Most hexanucleotide recognition sequences found in species with A + T rich genomes are A + T rich. This distribution eliminates a substantial proportion of the potential variance in the frequency of restriction recognition sequences in the host genomes. As a consequence, almost all restriction recognition sequences, including those eight base pairs in length (Not I and Sfi I), are predicted to occur with a frequency ranging from once every 300 to once every 5,000 base pairs in the host genome. Since the G + C content of bacteriophage DNA and of the host genome are also correlated, the data presented are evidence that most Type II "restriction systems" are indeed involved in phage restriction.
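The dependence of site frequency on genome composition can be illustrated with a minimal sketch. It assumes the simplest possible model, in which bases occur independently and G = C and A = T within a genome; the abstract does not state how the predicted frequencies were calculated, so this is not necessarily the method used in the paper. The function name `expected_spacing`, the example G + C fractions (0.30 and 0.70), and the choice of Hpa II (CCGG) and Dra I (TTTAAA) as examples are illustrative assumptions; only Not I and Sfi I are named in the abstract.

```python
def expected_spacing(site: str, gc_content: float) -> float:
    """Mean distance in bp between occurrences of `site` on one strand.

    Assumption: bases are independent, with P(G) = P(C) = gc_content / 2
    and P(A) = P(T) = (1 - gc_content) / 2. 'N' matches any base.
    """
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must lie strictly between 0 and 1")
    p_gc = gc_content / 2.0
    p_at = (1.0 - gc_content) / 2.0
    p_site = 1.0
    for base in site.upper():
        if base in "GC":
            p_site *= p_gc
        elif base in "AT":
            p_site *= p_at
        elif base == "N":
            continue  # unspecified position: probability 1
        else:
            raise ValueError(f"unsupported base symbol: {base!r}")
    return 1.0 / p_site


if __name__ == "__main__":
    # Illustrative sites and genome compositions (hypothetical pairings).
    examples = [
        ("Hpa II", "CCGG", 0.30),           # G + C rich 4-mer, A + T rich genome
        ("Dra I", "TTTAAA", 0.30),          # A + T rich 6-mer, A + T rich genome
        ("Not I", "GCGGCCGC", 0.70),        # G + C rich 8-mer, G + C rich genome
        ("Sfi I", "GGCCNNNNNGGCC", 0.70),   # G + C rich 8-mer with 5-bp spacer
        ("Not I", "GCGGCCGC", 0.30),        # same 8-mer in an A + T rich genome
    ]
    for name, site, gc in examples:
        spacing = expected_spacing(site, gc)
        print(f"{name:7s} {site:14s} G+C={gc:.2f}  ~1 site per {spacing:,.0f} bp")
```

Under this simplified model, a G + C rich octanucleotide is far rarer in an A + T rich genome than in a G + C rich one, which is the kind of variance the observed distribution of recognition sequences would remove.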