Rosandić M, Paar V, Basar I
Department of Internal Medicine, University Hospital Rebro, University of Zagreb, Kispatićeva 12, Zagreb, Croatia.
J Theor Biol. 2003 Mar 7;221(1):29-37. doi: 10.1006/jtbi.2003.3165.
A new key-string segmentation algorithm for identifying alpha satellite DNAs and higher-order repeat (HOR) units is introduced and exemplified. Starting with an initial key string, we determine the dominant key string and the HOR. The key-string algorithm was used to scan the recent GenBank entry AC017075.8 (193 277 bp), a human alpha satellite DNA sequence from the centromeric region of chromosome 7. The sequence was computationally segmented into one HOR domain (super-repeat domain) and two non-HOR domains. The dominant key string GTTTCT provided segmentation in terms of alpha monomers. The HOR is tandemly repeated in 54 copies in the super-repeat (HOR) domain. Five insertions and three deletions in the HOR structure associated with the dominant key string were identified. A consensus HOR was constructed. Divergence of individual HOR copies from the consensus averages 0.7%, while divergence between the 16 monomer variants within each HOR averages 20%. In the front and back domains, 199 monomer variants were identified that are not organized into HORs and diverge by 20-40%.
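The core idea of key-string segmentation, cutting a sequence at every occurrence of a short key string and inspecting the resulting fragment lengths, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `key_string_positions`, `segment_by_key_string`, and `fragment_lengths` are hypothetical, the toy sequence is invented for illustration, and the abstract does not say how the dominant key string is selected, so that step is left out.

```python
def key_string_positions(sequence: str, key: str) -> list[int]:
    """Return the start index of every occurrence of `key` in `sequence`."""
    positions = []
    start = sequence.find(key)
    while start != -1:
        positions.append(start)
        start = sequence.find(key, start + 1)
    return positions


def segment_by_key_string(sequence: str, key: str) -> list[str]:
    """Cut `sequence` in front of each occurrence of `key`.

    Every fragment except a possible leading one starts with `key`.
    Joining the fragments reproduces the original sequence.
    """
    positions = key_string_positions(sequence, key)
    if not positions:
        return [sequence]
    bounds = positions + [len(sequence)]
    fragments = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
    if positions[0] > 0:
        # Keep whatever precedes the first key-string occurrence.
        fragments.insert(0, sequence[:positions[0]])
    return fragments


def fragment_lengths(sequence: str, key: str) -> list[int]:
    """Lengths of the fragments produced by `segment_by_key_string`."""
    return [len(fragment) for fragment in segment_by_key_string(sequence, key)]


if __name__ == "__main__":
    # Toy data (an assumption, not taken from AC017075.8): three identical
    # 171-base "monomers" that each begin with the key string GTTTCT.
    toy_monomer = "GTTTCT" + "A" * 165
    toy_sequence = "CC" + toy_monomer * 3
    print(fragment_lengths(toy_sequence, "GTTTCT"))  # [2, 171, 171, 171]
```

In this toy case the repeated fragment length of 171 reflects the period of the repeat. On real data, fragment lengths would vary wherever a key string is gained or lost through mutation, which is presumably why the method distinguishes an initial key string from the dominant one.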