Zheng Deyou, Zhang Zhaolei, Harrison Paul M, Karro John, Carriero Nick, Gerstein Mark
Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA.
J Mol Biol. 2005 May 27;349(1):27-45. doi: 10.1016/j.jmb.2005.02.072. Epub 2005 Apr 2.
Pseudogenes are inheritable genetic elements formally defined by two properties: their similarity to functioning genes and their presumed lack of activity. However, their precise characterization, particularly with respect to the latter quality, has proven elusive. An opportunity to explore this issue arises from the recent emergence of tiling-microarray data showing that intergenic regions (containing pseudogenes) are transcribed to a great degree. Here we focus on the transcriptional activity of pseudogenes on human chromosome 22. First, we integrated several sets of annotation to define a unified list of 525 pseudogenes on the chromosome. To characterize these further, we developed a comprehensive list of genomic features based on conservation in related organisms, expression evidence, and the presence of upstream regulatory sites. Of the 525 unified pseudogenes we could confidently classify 154 as processed and 49 as duplicated. Using data from tiling microarrays, especially from recent high-resolution oligonucleotide arrays, we found some evidence that up to a fifth of the 525 pseudogenes are potentially transcribed. Expressed sequence tag (EST) comparison further validated a number of these, and overall we found 17 pseudogenes with strong support for transcription. In particular, one of the pseudogenes with both EST and microarray evidence for transcription turned out to be a duplicated pseudogene in the cat eye syndrome critical region. Although we could not identify a meaningful number of transcription factor-binding sites (based on chromatin immunoprecipitation-chip data) near pseudogenes, we did find that approximately 12% of the pseudogenes had upstream CpG islands. Finally, analysis of corresponding syntenic regions in the mouse, rat and chimp genomes indicates, as previously suggested, that pseudogenes are less conserved than genes, but more preserved than the intergenic background (all annotation is available from http://www.pseudogene.org).
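To make the reported proportions concrete, the following minimal sketch works through the arithmetic implied by the counts stated in the abstract. Only the numbers 525, 154, 49, 17, "a fifth" and "approximately 12%" come from the abstract; the variable names and the helper `share` are illustrative assumptions, and the derived figures are simple arithmetic, not results reported by the authors.

```python
# Counts stated in the abstract for human chromosome 22.
TOTAL = 525
counts = {
    "processed": 154,
    "duplicated": 49,
    "strong transcription support": 17,
}


def share(n, total=TOTAL):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Percentage of the unified list that each reported count represents.
shares = {label: share(n) for label, n in counts.items()}

# Pseudogenes not confidently classified as processed or duplicated.
unclassified = TOTAL - counts["processed"] - counts["duplicated"]

# "Up to a fifth" of 525, taken as an upper bound (derived, not reported).
upper_bound_transcribed = TOTAL // 5

# "Approximately 12%" of 525 with upstream CpG islands (derived estimate).
approx_cpg = round(0.12 * TOTAL)

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL} ({pct}%)")
print(f"not confidently classified: {unclassified}")
print(f"potentially transcribed (upper bound): {upper_bound_transcribed}")
print(f"with upstream CpG islands (approx.): {approx_cpg}")
```

Read this way, the 17 strongly supported pseudogenes are a small subset of the roughly one hundred that show at least some microarray evidence of transcription.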