Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources/Institute of BioScience and Technology, College of Agriculture, Hainan University, Haikou, 570228, People's Republic of China.
Plant Mol Biol. 2011 Oct;77(3):299-308. doi: 10.1007/s11103-011-9811-z. Epub 2011 Aug 3.
Hevea brasiliensis, the only commercial source of natural rubber, is an extremely economically important crop. To facilitate biological, biochemical and molecular research on rubber biosynthesis, we report the use of next-generation massively parallel sequencing and de novo transcriptome assembly to gain a comprehensive overview of the H. brasiliensis transcriptome. Sequencing generated more than 12 million reads with an average length of 90 nt. In total, 48,768 unigenes (mean size = 436 bp, median size = 328 bp) were assembled de novo. Of the 13,807 H. brasiliensis cDNA sequences deposited in GenBank at the National Center for Biotechnology Information (NCBI) as of Feb 2011, 11,746 sequences (84.5%) could be matched to the assembled unigenes by nucleotide BLAST. The assembled sequences were annotated with gene descriptions, Gene Ontology (GO) terms and Clusters of Orthologous Groups (COG) terms. In all, 37,432 unigenes were successfully annotated, of which 24,545 (65.5%) aligned to Ricinus communis proteins. Furthermore, the annotated unigenes were functionally classified according to the GO, COG and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our data provide the most comprehensive sequence resource available for the study of rubber trees and demonstrate the effective use of Illumina sequencing and de novo transcriptome assembly in a species lacking genomic information.