Allen Elena, Horvath Steve, Tong Frances, Kraft Peter, Spiteri Elizabeth, Riggs Arthur D, Marahrens York
Department of Human Genetics, University of California, Los Angeles, CA 90095, USA.
Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9940-5. doi: 10.1073/pnas.1737401100. Epub 2003 Aug 8.
Genes subject to monoallelic expression are expressed from only one of the two alleles, either selected at random (random monoallelic genes) or in a parent-of-origin-specific manner (imprinted genes). Because high densities of long interspersed nuclear element (LINE)-1 transposon sequence have been implicated in X-inactivation, we asked whether monoallelically expressed autosomal genes are also flanked by high densities of LINE-1 sequence. A statistical analysis of repeat content in the regions surrounding monoallelically and biallelically expressed genes revealed that, compared with biallelically expressed genes, random monoallelic genes were flanked by significantly higher densities of LINE-1 sequence, evolutionarily more recent and less truncated LINE-1 elements, fewer CpG islands, and fewer base pairs of short interspersed nuclear element (SINE) sequence. Random monoallelic and imprinted genes were pooled and subjected to a clustering analysis, which found two clusters on the basis of the aforementioned sequence characteristics. Interestingly, these clusters did not follow the random monoallelic vs. imprinted classifications. We infer that chromosomal sequence context plays a role in monoallelic gene expression and may involve the recognition of long repeats or other features. The sequence characteristics that distinguished the high-LINE-1 category were used to identify more than 1,000 additional genes from the human and mouse genomes as candidate genes for monoallelic expression.
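As a rough illustration of the kind of measurement the abstract refers to, the following Python sketch computes the fraction of a gene's surrounding region that is covered by annotated repeat sequence (for example LINE-1). It is not the authors' method. The interval representation, the function names `merge_intervals` and `repeat_density`, the choice to include the gene body in the window, and the 100 kb default flank are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: half-open [start, end) coordinates in base pairs.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_density(gene: Interval, repeats: Iterable[Interval],
                   flank: int = 100_000) -> float:
    """Fraction of the gene plus its flanks covered by the repeat intervals.

    `flank` is an assumed window size added on each side of the gene;
    the window is clipped at coordinate 0.
    """
    win_start = max(0, gene[0] - flank)
    win_end = gene[1] + flank
    covered = 0
    for start, end in merge_intervals(repeats):
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            covered += overlap
    return covered / (win_end - win_start)


# Toy example: a 1 kb gene with 1 kb flanks and three annotated repeats.
density = repeat_density((1000, 2000), [(0, 300), (200, 500), (2900, 3500)], flank=1000)
print(density)
```

Computing such a density for each gene in the monoallelic and biallelic sets would give two samples that could then be compared with a standard two-sample test. The abstract does not say which test or window size the study used.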