Karlin S, Macken C
Department of Mathematics, Stanford University, CA 94305.
Nucleic Acids Res. 1991 Aug 11;19(15):4241-6. doi: 10.1093/nar/19.15.4241.
A statistical method based on r-fragments, sums of distances between (r + 1) consecutive restriction enzyme sites, is introduced for detecting nonrandomness in the distribution of markers in sequence data. The technique is applicable whenever large numbers of markers are available and will detect clumping, excessive dispersion or too much evenness of spacing of the markers. It is particularly adapted to varying the scale on which inhomogeneities can be detected, from nearest neighbor interactions to more distant interactions. The r-fragment procedure is applied primarily to the Kohara et al. (1) physical map of E. coli. Other applications to DAM methylation sites in E. coli and NotI sites in human chromosome 21 are presented. Restriction sites for the eight enzymes used in (1) appear to be randomly distributed, although at widely differing densities. These conclusions are substantially in agreement with the analysis of Churchill et al. (3). Extreme variability in the density of the eight restriction enzyme sites cannot be explained by variability in mono-, di- or trinucleotide frequencies.
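As a rough illustration of the r-fragment idea, the sketch below computes the length spanned by every run of (r + 1) consecutive sites and compares the smallest and largest r-fragments against sites placed uniformly at random. The abstract does not state the paper's test statistics or their reference distributions, so the Monte Carlo comparison and the names `r_fragments` and `extreme_r_fragment_pvalues` are assumptions made for illustration only, not the authors' procedure.

```python
import random


def r_fragments(sites, r):
    """Lengths spanned by each run of (r + 1) consecutive sites.

    Each value is the sum of r adjacent inter-site distances.
    """
    s = sorted(sites)
    return [s[i + r] - s[i] for i in range(len(s) - r)]


def extreme_r_fragment_pvalues(sites, length, r, trials=2000, seed=0):
    """Monte Carlo check of the extreme r-fragments (an assumed stand-in
    for the paper's analytic treatment).

    Simulates the same number of sites placed uniformly on [0, length]
    and counts how often the simulated minimum (maximum) r-fragment is
    at least as extreme as the observed one. Requires len(sites) > r.
    """
    rng = random.Random(seed)
    observed = r_fragments(sites, r)
    obs_min, obs_max = min(observed), max(observed)
    n = len(sites)
    le_min = ge_max = 0
    for _ in range(trials):
        sim = r_fragments([rng.uniform(0, length) for _ in range(n)], r)
        if min(sim) <= obs_min:
            le_min += 1
        if max(sim) >= obs_max:
            ge_max += 1
    return {
        "min": obs_min,
        "max": obs_max,
        # Small value: the tightest cluster is tighter than chance suggests.
        "p_clump": (le_min + 1) / (trials + 1),
        # Small value: the widest stretch is wider than chance suggests.
        "p_gap": (ge_max + 1) / (trials + 1),
    }
```

Varying `r` changes the scale being probed: `r = 1` looks at nearest-neighbor spacings, while larger `r` looks at longer stretches of sites. Values of `p_clump` or `p_gap` close to 1 would instead hint at spacing that is more even than expected under random placement.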