Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions.

Authors

Denoeud France, Kapranov Philipp, Ucla Catherine, Frankish Adam, Castelo Robert, Drenkow Jorg, Lagarde Julien, Alioto Tyler, Manzano Caroline, Chrast Jacqueline, Dike Sujit, Wyss Carine, Henrichsen Charlotte N, Holroyd Nancy, Dickson Mark C, Taylor Ruth, Hance Zahra, Foissac Sylvain, Myers Richard M, Rogers Jane, Hubbard Tim, Harrow Jennifer, Guigó Roderic, Gingeras Thomas R, Antonarakis Stylianos E, Reymond Alexandre

Affiliation

Grup de Recerca en Informática Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.

Publication information

Genome Res. 2007 Jun;17(6):746-59. doi: 10.1101/gr.5660607.

DOI: 10.1101/gr.5660607

PMID: 17567994

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1891335/

Abstract

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.
