

EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome.

Author information

Jain Monica, Shrager Jeff, Harris Elizabeth H, Halbrook Renee, Grossman Arthur R, Hauser Charles, Vallon Olivier

Affiliation

The Carnegie Institution, Department of Plant Biology, Stanford, CA 94305, USA.

Publication information

Nucleic Acids Res. 2007;35(6):2074-83. doi: 10.1093/nar/gkm081. Epub 2007 Mar 13.

Abstract

Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12,063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15,857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.
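The abstract does not spell out how the genome guides clustering, so the following is only a minimal sketch of the general idea and not the authors' protocol. It assumes each EST has already been aligned to the draft genome, and it groups ESTs whose genomic spans overlap on the same scaffold and strand. The `Alignment` record, its field names, and the `cluster_by_genome` function are hypothetical. ESTs with no genomic match, which the published protocol also uses, are not handled here.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record of one EST aligned to the draft genome."""
    est_id: str    # EST identifier
    scaffold: str  # genomic scaffold the EST aligns to
    strand: str    # "+" or "-"
    start: int     # 0-based inclusive start on the scaffold
    end: int       # exclusive end on the scaffold


def cluster_by_genome(alignments):
    """Group ESTs whose genomic spans overlap on the same scaffold and strand.

    Illustrative assumption only: a real pipeline would also weigh splice
    structure, alignment quality, and ESTs that fall in sequence gaps.
    """
    # Bucket alignments by (scaffold, strand) so that overlapping
    # transcription units on opposite strands stay in separate clusters.
    by_locus = defaultdict(list)
    for aln in alignments:
        by_locus[(aln.scaffold, aln.strand)].append(aln)

    clusters = []
    for _, alns in sorted(by_locus.items()):
        alns.sort(key=lambda a: (a.start, a.end))
        current = []        # EST ids in the cluster being built
        current_end = None  # rightmost genomic coordinate seen so far
        for aln in alns:
            # A gap between this EST and the running span closes the cluster.
            if current and aln.start >= current_end:
                clusters.append(current)
                current = []
                current_end = None
            current.append(aln.est_id)
            current_end = aln.end if current_end is None else max(current_end, aln.end)
        if current:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    example = [
        Alignment("est1", "scaffold_1", "+", 0, 100),
        Alignment("est2", "scaffold_1", "+", 50, 150),
        Alignment("est3", "scaffold_1", "+", 200, 300),
        Alignment("est4", "scaffold_1", "-", 60, 120),
    ]
    for cluster in cluster_by_genome(example):
        print(cluster)
```

In this sketch, est1 and est2 fall into one cluster because their spans overlap, est3 forms its own cluster, and est4 is kept apart because it lies on the opposite strand. Sequences within each cluster would then be passed to an assembler to produce contigs.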


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6feb/1874618/ed01e29b9c27/gkm081f1.jpg
