Features of Arabidopsis genes and genome discovered using full-length cDNAs.

Authors

Alexandrov Nickolai N, Troukhan Maxim E, Brover Vyacheslav V, Tatarinova Tatiana, Flavell Richard B, Feldmann Kenneth A

Affiliation

Ceres Inc., 1535 Rancho Conejo Blvd., Thousand Oaks, CA 91320, USA.

Publication

Plant Mol Biol. 2006 Jan;60(1):69-85. doi: 10.1007/s11103-005-2564-9.

Abstract

Arabidopsis is currently the reference genome for higher plants. A new, more detailed statistical analysis of Arabidopsis gene structure is presented, including intron and exon lengths, intergenic distances, features of promoters, and variant 5'-ends of mRNAs transcribed from the same transcription unit. We also provide a statistical characterization of Arabidopsis transcripts in terms of their size, UTR lengths, 3'-end cleavage sites, splicing variants, and coding potential. These analyses were facilitated by scrutiny of our collection of sequenced full-length cDNAs and a much larger collection of 5'-ESTs, together with another set of full-length cDNAs from Salk/Stanford/Plant Gene Expression Center/RIKEN. Examples of alternative splicing are observed for transcripts from 7% of the genes, and many of these genes display multiple spliced isoforms. Most splicing variants lie in non-coding regions of the transcripts. Non-canonical splice sites constitute less than 1% of all splice sites. Genes with fewer than four introns display reduced average mRNA levels. Putative alternative transcription start sites were observed in 30% of highly expressed genes and in more than 50% of the genes with low expression. Transcription start sites correlate remarkably well with a CG skew peak in the DNA sequences. The intergenic distances vary considerably, with those between genes transcribed towards one another being significantly shorter. New transcripts that are missing from the current TIGR genome annotation, together with non-coding ESTs, including those antisense to known genes, are derived and cataloged in the Supplementary Material. They identify 148 new loci in the Arabidopsis genome. The conclusions drawn provide a better understanding of the Arabidopsis genome and how the gene transcripts are processed. The results also allow better predictions to be made for as yet poorly defined genes and provide a reference for comparisons with other plant genomes whose complete sequences are currently being determined. Some comparisons with rice are included in this paper.
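
The abstract does not define the CG skew that it reports peaking near transcription start sites. As a rough illustration only, the sketch below assumes the commonly used definition, (C − G) / (C + G), computed over a sliding window. The function name `cg_skew`, the window size, and the step are hypothetical choices made for this example, not the authors' parameters.

```python
def cg_skew(seq, window=100, step=1):
    """Sliding-window CG skew, assumed here to be (C - G) / (C + G).

    Returns one value per window; a window with no C or G yields 0.0.
    The window and step defaults are illustrative, not taken from the paper.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        c = chunk.count("C")
        g = chunk.count("G")
        total = c + g
        skews.append((c - g) / total if total else 0.0)
    return skews


if __name__ == "__main__":
    # Toy sequence: a C-rich stretch followed by a G-rich stretch.
    toy = "CCCCACCCTA" + "GGGTGGGAGG"
    for offset, value in enumerate(cg_skew(toy, window=10, step=5)):
        print(offset * 5, round(value, 3))
```

Under this assumed definition, a skew peak would appear as a local maximum of these values along the sequence, which could then be compared with annotated transcription start positions.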
