Jain Monica, Shrager Jeff, Harris Elizabeth H, Halbrook Renee, Grossman Arthur R, Hauser Charles, Vallon Olivier
The Carnegie Institution, Department of Plant Biology, Stanford, CA 94305, USA.
Nucleic Acids Res. 2007;35(6):2074-83. doi: 10.1093/nar/gkm081. Epub 2007 Mar 13.
Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12,063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15,857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.