

The DART classification of unannotated transcription within the ENCODE regions: associating transcription with known and novel loci.

Author information

Rozowsky Joel S, Newburger Daniel, Sayward Fred, Wu Jiaqian, Jordan Greg, Korbel Jan O, Nagalakshmi Ugrappa, Yang Jin, Zheng Deyou, Guigó Roderic, Gingeras Thomas R, Weissman Sherman, Miller Perry, Snyder Michael, Gerstein Mark B

Affiliation

Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA.

Publication information

Genome Res. 2007 Jun;17(6):732-45. doi: 10.1101/gr.5696007.

Abstract

For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
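The first two classification steps described in the abstract (filtering, then association with proximal exons by expression correlation) can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields, the use of GC fraction as a stand-in for "unusual sequence composition", and the thresholds `max_dist`, `min_r`, and `gc_range` are all assumptions chosen for illustration. The cross-hybridization filter and the final clustering into novel loci are not modeled.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def gc_fraction(seq):
    """Fraction of G/C bases; a simple proxy for sequence composition."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def classify_tars(tars, exons, max_dist=10_000, min_r=0.8, gc_range=(0.2, 0.8)):
    """Label each TAR as 'filtered', 'gene:<name>', or 'unclassified'.

    All thresholds are hypothetical, not values from the paper.
    """
    labels = {}
    for tar in tars:
        # Step 1: drop TARs whose composition falls outside the assumed range.
        if not gc_range[0] <= gc_fraction(tar["seq"]) <= gc_range[1]:
            labels[tar["id"]] = "filtered"
            continue
        # Step 2: look for a nearby exon with a correlated expression profile.
        best = None
        for exon in exons:
            if exon["chrom"] != tar["chrom"]:
                continue
            if abs(exon["pos"] - tar["pos"]) > max_dist:
                continue
            r = pearson(tar["expr"], exon["expr"])
            if r >= min_r and (best is None or r > best[0]):
                best = (r, exon["gene"])
        labels[tar["id"]] = f"gene:{best[1]}" if best else "unclassified"
    return labels

# Toy data (invented for the example, not taken from DART).
tars = [
    {"id": "t1", "chrom": "chr1", "pos": 1000, "seq": "ACGTACGT", "expr": [1, 2, 3, 4]},
    {"id": "t2", "chrom": "chr1", "pos": 5000, "seq": "AAAAAAAA", "expr": [1, 2, 3, 4]},
    {"id": "t3", "chrom": "chr1", "pos": 900000, "seq": "ACGTGGCA", "expr": [4, 1, 3, 2]},
]
exons = [{"gene": "G1", "chrom": "chr1", "pos": 3000, "expr": [2, 4, 6, 8]}]

labels = classify_tars(tars, exons)
print(labels)
```

In this toy run, the TARs left as "unclassified" are the ones that would feed the clustering step, which groups them by similar expression and phylogenetic profiles.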

