Konopka A K, Smythers G W, Owens J, Maizel J V
National Cancer Institute, Frederick, Maryland 21701-1013.
Gene Anal Tech. 1987 Jul-Aug;4(4):63-74. doi: 10.1016/0735-0651(87)90020-3.
Computer-assisted sequence analysis was applied to detect the most apparent nonrandom sequence motifs in eukaryotic introns. We describe in detail a method, which we call distance analysis, and which we applied in an extensive study of 405 eukaryotic intron sequences. We observed very strong two-base periodicities for almost all tetranucleotides that are tandem repeats of nonhomopolymeric dinucleotides (the exceptions were GCGC and CGCG). Using a fixed-point alignment method, we also observed that these periodic sequence motifs belong to large clusters of dinucleotides repeated in tandem up to 15-35 times, corresponding to cluster lengths of 30-70 bases. We did not observe two-base periodicity of tetranucleotides in collections of either 262 spliced eukaryotic exons or 107 bacterial genes. Instead, these sequences displayed strong three-base periodicity of some other tetranucleotides. These findings suggest that introns and exons have distinct sequence properties that can be used for mapping purposes.
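The abstract does not spell out how distance analysis is computed. The sketch below shows one plausible reading, under stated assumptions: find every (possibly overlapping) occurrence of a tetranucleotide, histogram the distances between all pairs of occurrences within a window, and measure what fraction of those distances are multiples of a candidate period (2 for the intron signal, 3 for the exon and bacterial-gene signal). All function names, the 50-base window, and the "fraction of distances divisible by the period" score are hypothetical choices for illustration, not the authors' definitions. The tandem-run counter is a simple stand-in for measuring cluster length; it is not the paper's fixed-point alignment method.

```python
from collections import Counter
from itertools import product


def occurrences(seq, motif):
    """Start positions of all (possibly overlapping) occurrences of motif in seq."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def distance_histogram(seqs, motif, max_dist=50):
    """Hypothetical distance analysis: count distances between all pairs of
    occurrences of motif (up to max_dist bases apart), pooled over seqs."""
    hist = Counter()
    for seq in seqs:
        pos = occurrences(seq.upper(), motif)
        for idx, a in enumerate(pos):
            for b in pos[idx + 1:]:
                d = b - a
                if d > max_dist:  # positions are sorted, so later ones are farther
                    break
                hist[d] += 1
    return hist


def periodicity_score(hist, period):
    """Assumed score: fraction of recorded distances that are multiples of period."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(c for d, c in hist.items() if d % period == 0) / total


def tandem_dinucleotide_tetramers():
    """The 12 tetranucleotides XYXY built from nonhomopolymeric dinucleotides (X != Y)."""
    return [x + y + x + y for x, y in product("ACGT", repeat=2) if x != y]


def longest_tandem_run(seq, dinuc):
    """Largest number of consecutive tandem copies of dinuc, checking both frames."""
    best = 0
    for start in range(2):
        run = 0
        for i in range(start, len(seq) - 1, 2):
            if seq[i:i + 2] == dinuc:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


if __name__ == "__main__":
    # Toy sequence: a (CA)20 cluster, i.e. 40 bases, flanked by other bases.
    seq = "GGA" + "CA" * 20 + "TTG"
    hist = distance_histogram([seq], "CACA")
    print(len(occurrences(seq, "CACA")))  # 19
    print(periodicity_score(hist, 2))  # 1.0
    print(longest_tandem_run(seq, "CA"))  # 20
```

In the toy cluster every pairwise distance between CACA occurrences is even, so the two-base score is 1.0; real intron data would be compared against a background expectation (for example, shuffled sequences) rather than read off a single sequence.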