Conley Andrew B, Piriyapongsa Jittima, Jordan I King
School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, GA 30306, USA.
Bioinformatics. 2008 Jul 15;24(14):1563-7. doi: 10.1093/bioinformatics/btn243. Epub 2008 Jun 5.
Endogenous retrovirus (ERV) elements have been shown to contribute promoter sequences that can initiate transcription of adjacent human genes. However, the extent to which retroviral sequences initiate transcription within the human genome is currently unknown. We analyzed genome sequence and high-throughput expression data to systematically evaluate the presence of retroviral promoters in the human genome.
We report the existence of 51,197 ERV-derived promoter sequences that initiate transcription within the human genome, including 1,743 cases where transcription is initiated from ERV sequences located in gene-proximal promoter or 5' untranslated regions (UTRs). A total of 114 of the ERV-derived transcription start sites can be demonstrated to drive transcription of 97 human genes, producing chimeric transcripts that are initiated within ERV long terminal repeat (LTR) sequences and read through into known gene sequences. ERV promoters drive tissue-specific and lineage-specific patterns of gene expression and contribute to expression divergence between paralogs. These data illustrate the potential of retroviral sequences to regulate human transcription on a large scale, consistent with a substantial effect of ERVs on the function and evolution of the human genome.
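The abstract does not describe how ERV-derived transcription start sites were identified, so the following is only a minimal sketch of one plausible step: counting transcription start sites (TSSs) that fall inside annotated ERV intervals. It is not the authors' pipeline. All function names are hypothetical, coordinates are assumed to be 0-based and half-open, and the toy intervals are invented for illustration rather than taken from any real annotation.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_index(erv_annotations):
    """Group (chrom, start, end) ERV records by chromosome and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in erv_annotations:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        # Keep the sorted start coordinates separately for binary search.
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def tss_in_erv(index, chrom, pos):
    """Return True if position `pos` on `chrom` lies inside a merged ERV interval."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    # Rightmost interval whose start is <= pos, if any.
    i = bisect_right(starts, pos) - 1
    return i >= 0 and merged[i][0] <= pos < merged[i][1]


def count_erv_tss(erv_annotations, tss_positions):
    """Count TSSs, given as (chrom, pos), that fall inside any ERV interval."""
    index = build_index(erv_annotations)
    return sum(tss_in_erv(index, chrom, pos) for chrom, pos in tss_positions)


# Toy data, invented purely to exercise the functions above.
toy_ervs = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
toy_tss = [("chr1", 120), ("chr1", 299), ("chr1", 300), ("chr2", 10), ("chr3", 5)]
erv_tss_count = count_erv_tss(toy_ervs, toy_tss)
```

Merging the intervals first keeps the binary search correct when ERV annotations overlap. A real analysis would also need strand information and expression evidence for each TSS, which this sketch leaves out.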