Barry A E, Howman E V, Cancilla M R, Saffery R, Choo K H
The Murdoch Institute, Royal Children's Hospital, Flemington Road, Parkville 3052, Australia.
Hum Mol Genet. 1999 Feb;8(2):217-27. doi: 10.1093/hmg/8.2.217.
We previously described the cloning of an 80 kb DNA corresponding to the core protein-binding domain of a human chromosome 10-derived neocentromere. Here we report the complete sequence of this DNA (designated NC DNA) and its detailed structural analysis. The sequence is devoid of human centromeric alpha-satellite DNA and the pericentric beta- and gamma-satellites, the ATRS and 48 bp repeat DNA. One copy of a sequence that is related to the CENPB box motif is present, and a number of copies of other pericentric sequences including pJalpha and classical satellites I and III are present but both their relative sparsity and non-tandem organization suggest that each sequence, on its own, is unlikely to mimic any role the sequence may have in the normal centromere. The DNA-binding motifs of the architectural and regulatory proteins HMGI and topoII have a normal abundance and random distribution, implying that these sequences are not key functional elements. The total A + T content of the sequence is not notably different from that of the human genome, but an abundance of AT-rich islands and a biased distribution of these islands within the NC sequence are clearly discernible and may be functionally significant. Substantial amounts of transposable elements and low copy number tandem repeats, including several that are highly AT- and purine-rich, are also present and may act as functional elements. One of the AT-rich tandem repeats (AT28) may form interesting structures and is described in detail. The defined features show only a loose resemblance to the structures of known centromeres, highlighting the possibility that, rather than a conserved primary sequence, it is the overall composition and distribution patterns of various unknown functional elements, or any 'ordinary' DNA under appropriate epigenetic influences, that determine centromere formation and function.
This is the first detailed analysis of a neocentromere DNA and provides a basis for comparison against future sequences.