CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
Genomics Proteomics Bioinformatics. 2012 Feb;10(1):11-22. doi: 10.1016/S1672-0229(11)60029-6.
Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures, as we have observed in unicellular organisms. In this study, we defined and examined a set of 646 ubiquitous expression-invariable genes (EIGs) that are present in germline cells, based on RNA-sequencing data from multiple high-throughput transcriptomic datasets. Using single nucleotide polymorphism (SNP) data, we demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome. Gene expression and mutation showed a significant positive correlation: highly expressed genes accumulate more mutations than lowly expressed genes. Furthermore, we found four major types of transcript-centric mutations in human genomes (C→T, A→G, C→G, and G→T) and identified a negative gradient of sequence variation running from the 5' end to the 3' end of the transcription units (TUs). The periodic occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces for gene and genome evolution, along with the creation of new genes, gene/genome duplication, and horizontal gene transfer.