State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China.
J Integr Plant Biol. 2013 Jul;55(7):576-85. doi: 10.1111/jipb.12066.
We assembled a total of 297,239 Gossypium hirsutum (Gh, a tetraploid cotton, AADD) expressed sequence tag (EST) sequences available in the National Center for Biotechnology Information database, with reference to the recently published G. raimondii (Gr, a diploid cotton, DD) genome, and obtained 49,125 UniGenes. The average length of the UniGenes increased from 804 and 791 bp in two previous EST assemblies to 1,019 bp in the current analysis. The number of putative cotton UniGenes with lengths of 3 kb or more increased from 25 and 34 in those assemblies to 1,223. As a result, thousands of originally independent G. hirsutum ESTs were aligned to produce large contigs encoding transcripts with very long open reading frames, indicating that the G. raimondii genome sequence provided remarkable advantages for assembling the tetraploid cotton transcriptome. Significantly different distribution patterns within several GO terms, including transcription factor activity, were observed between D- and A-derived assemblies. Transcriptome analysis showed that, in a tetraploid cotton cell, 29,547 UniGenes were possibly derived from the D subgenome, while another 19,578 may come from the A subgenome. Finally, some of the in silico data were confirmed by reverse transcription polymerase chain reaction experiments, which showed changes in transcript levels for several gene families known to play key roles in cotton fiber development. We believe that our work provides a useful platform for functional and evolutionary genomic studies in cotton.
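As a quick consistency check on the counts reported above, the following minimal Python sketch confirms that the D- and A-derived UniGenes sum to the reported total and expresses each as a share of it. The variable and function names are illustrative only and are not taken from the study; the numbers are those quoted in the abstract.

```python
# Counts as quoted in the abstract; names are hypothetical, chosen for this sketch.
TOTAL_UNIGENES = 49_125
D_DERIVED = 29_547   # UniGenes possibly derived from the D subgenome
A_DERIVED = 19_578   # UniGenes that may come from the A subgenome


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The two subgenome groups should partition the full UniGene set.
partition_ok = (D_DERIVED + A_DERIVED == TOTAL_UNIGENES)

d_share = share(D_DERIVED, TOTAL_UNIGENES)
a_share = share(A_DERIVED, TOTAL_UNIGENES)

print(f"partition consistent: {partition_ok}")
print(f"D-derived: {d_share:.1f}%  A-derived: {a_share:.1f}%")
```

Under these figures, roughly three fifths of the UniGenes are assigned to the D subgenome and the remainder to the A subgenome.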