Murakami Hiroo, Aburatani Sachiyo, Horimoto Katsuhisa
Laboratory of Biostatistics, Institute of Medical Science, University of Tokyo, Shirokane-dai 4-6-1, Minato-ku, Tokyo 108-8639, Japan.
Genome Inform. 2005;16(1):13-21.
Various types of repeat sequences are abundant in genomic sequences, and they are associated with biological phenomena at distinct levels. In particular, comparative analyses of whole-genome-sized sequence data have revealed that repeat sequences cause segmental duplications, which are a type of chromosomal structural arrangement. In this study, we analyzed the relationships between segmental duplications and repeat sequences in human chromosome 7. For this purpose, three methods for detecting repeat sequences were applied to the genomic sequence of human chromosome 7: RepeatMasker for the dispersed repeats, TRF for the tandem repeats, and STEPSTONE for the inter-spread repeats. When the detected repeat sequences were plotted against their locations on the chromosome, all three types of repeats were found to be concentrated around the regions of segmental duplications, as a macroscopic feature of their distributions. Furthermore, the latter two types of repeats (tandem and inter-spread) were classified by period, and the difference in the distribution of the detected repeats between the segmental duplication regions and the other regions was tested statistically. As a result, the period distributions of both repeat types were biased at a significance level below 5% by the χ² test, and the bias was attributed, at the 5% significance level by the normalized residual test, to repeats with long periods, about 130 bp and more than 400 bp. The mechanism of segmental duplications is discussed based on the present results.
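The statistical step described in the abstract (a χ² test on repeat counts per period class, followed by a residual test to locate the classes responsible for the bias) can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: that the counts form a contingency table of period class by region (segmental duplication versus other), that the "normalized residual test" corresponds to adjusted standardized residuals compared against a two-sided 5% normal threshold of 1.96, and that the period classes and counts below are purely hypothetical values chosen for illustration. The function names are likewise invented here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = 1.0
        total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite sum over x^(r-1/2) / (1*3*...*(2r-1)).
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(x / 2)) + 2 * density * total


def chi_square_with_residuals(table):
    """Chi-square test of independence plus adjusted standardized residuals.

    table: list of rows, one per period class, each row holding
    [count in segmental duplication regions, count in other regions].
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
            # Variance factor of the adjusted (Haberman-style) residual.
            var = (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            res_row.append((obs - expected) / math.sqrt(expected * var))
        residuals.append(res_row)
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df), residuals


# Hypothetical period classes and counts, NOT taken from the paper.
classes = ["<100 bp", "100-199 bp", "200-399 bp", ">=400 bp"]
counts = [[120, 900], [60, 250], [30, 210], [45, 140]]

stat, df, p_value, residuals = chi_square_with_residuals(counts)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4g}")
for name, res in zip(classes, residuals):
    flag = "biased" if abs(res[0]) > 1.96 else "not significant"
    print(f"{name}: residual in duplication regions = {res[0]:+.2f} ({flag})")
```

With real data, each row would hold the counts of TRF or STEPSTONE repeats in one period class, and the classes whose residuals exceed the threshold would be the ones reported as contributing to the bias.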