Carrel Laura, Park Chungoo, Tyekucheva Svitlana, Dunn John, Chiaromonte Francesca, Makova Kateryna D
Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America.
PLoS Genet. 2006 Sep 29;2(9):e151. doi: 10.1371/journal.pgen.0020151. Epub 2006 Aug 3.
What genomic landmarks render most genes silent while leaving others expressed on the inactive X chromosome in mammalian females? To date, signals determining expression status of genes on the inactive X remain enigmatic despite the availability of complete genomic sequences. Long interspersed repeats (L1s), particularly abundant on the X, are hypothesized to spread the inactivation signal and are enriched in the vicinity of inactive genes. However, both L1s and inactive genes are also more prevalent in ancient evolutionary strata. Did L1s accumulate there because of their role in inactivation or simply because they spent more time on the rarely recombining X? Here we utilize an experimentally derived inactivation profile of the entire human X chromosome to uncover sequences important for its inactivation, and to predict expression status of individual genes. Focusing on Xp22, where both inactive and active genes reside within evolutionarily young strata, we compare neighborhoods of genes with different inactivation states to identify enriched oligomers. Occurrences of such oligomers are then used as features to train a linear discriminant analysis classifier. Remarkably, expression status is correctly predicted for 84% and 91% of active and inactive genes, respectively, on the entire X, suggesting that oligomers enriched in Xp22 capture most of the genomic signal determining inactivation. To our surprise, the majority of oligomers associated with inactivated genes fall within L1 elements, even though L1 frequency in Xp22 is low. Moreover, these oligomers are enriched in parts of L1 sequences that are usually underrepresented in the genome. Thus, our results strongly support the role of L1s in X inactivation, yet indicate that a chromatin microenvironment composed of multiple genomic sequence elements determines expression status of X chromosome genes.
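The feature-construction step described in the abstract (counting oligomers in gene neighborhoods, finding those enriched near inactive versus active genes, and using their occurrences as classifier features) can be illustrated with a minimal sketch. The abstract does not state the oligomer length, the neighborhood size, or the enrichment statistic, so the value of `k`, the pseudocount-smoothed log2 frequency ratio, the function names, and the toy sequences below are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter
from math import log2


def count_oligomers(seq, k):
    """Count overlapping k-mers in a DNA sequence, skipping any with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def enrichment(inactive_seqs, active_seqs, k, pseudocount=1.0):
    """Score each k-mer by the log2 ratio of its smoothed frequency in
    inactive-gene neighborhoods to that in active-gene neighborhoods.
    (Hypothetical statistic; the abstract does not specify one.)"""
    inactive, active = Counter(), Counter()
    for s in inactive_seqs:
        inactive.update(count_oligomers(s, k))
    for s in active_seqs:
        active.update(count_oligomers(s, k))
    kmers = set(inactive) | set(active)
    total_i = sum(inactive.values()) + pseudocount * len(kmers)
    total_a = sum(active.values()) + pseudocount * len(kmers)
    return {
        kmer: log2(((inactive[kmer] + pseudocount) / total_i)
                   / ((active[kmer] + pseudocount) / total_a))
        for kmer in kmers
    }


def feature_vector(seq, selected_kmers, k):
    """Occurrence counts of the selected k-mers in one gene neighborhood."""
    counts = count_oligomers(seq, k)
    return [counts[kmer] for kmer in selected_kmers]


# Toy neighborhoods (made up, not real X-chromosome sequence).
inactive_toy = ["AAAATTTTAAAA", "AAAACCAAAA"]
active_toy = ["GCGCGCGCGC", "GCGCATGCGC"]
scores = enrichment(inactive_toy, active_toy, k=4)
# Keep the three k-mers most enriched near the toy "inactive" genes.
selected = sorted(scores, key=scores.get, reverse=True)[:3]
features = [feature_vector(s, selected, k=4) for s in inactive_toy + active_toy]
```

Feature vectors built this way, paired with each gene's inactivation status, are the kind of input a linear discriminant analysis classifier would be trained on, for example `LinearDiscriminantAnalysis` from scikit-learn.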