Udall Joshua A, Swanson Jordan M, Haller Karl, Rapp Ryan A, Sparks Michael E, Hatfield Jamie, Yu Yeisoo, Wu Yingru, Dowd Caitriona, Arpat Aladdin B, Sickler Brad A, Wilkins Thea A, Guo Jin Ying, Chen Xiao Ya, Scheffler Jodi, Taliercio Earl, Turley Ricky, McFadden Helen, Payton Paxton, Klueva Natalya, Allen Randell, Zhang Deshui, Haigler Candace, Wilkerson Curtis, Suo Jinfeng, Schulze Stefan R, Pierce Margaret L, Essenberg Margaret, Kim Hyeran, Llewellyn Danny J, Dennis Elizabeth S, Kudrna David, Wing Rod, Paterson Andrew H, Soderlund Cari, Wendel Jonathan F
Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011, USA.
Genome Res. 2006 Mar;16(3):441-50. doi: 10.1101/gr.4602906. Epub 2006 Feb 14.
Approximately 185,000 Gossypium EST sequences comprising >94,800,000 nucleotides were amassed from 30 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including drought stress and pathogen challenges. These libraries were derived from allopolyploid cotton (Gossypium hirsutum; A(T) and D(T) genomes) as well as its two diploid progenitors, Gossypium arboreum (A genome) and Gossypium raimondii (D genome). ESTs were assembled using the Program for Assembling and Viewing ESTs (PAVE), resulting in 22,030 contigs and 29,077 singletons (51,107 unigenes). Further comparisons among the singletons and contigs led to recognition of 33,665 exemplar sequences that represent a nonredundant set of putative Gossypium genes containing partial or full-length coding regions and usually one or two UTRs. The assembly, along with its UniProt BLASTX hits, GO annotations, and Pfam analysis results, is freely accessible as a public resource for cotton genomics. Because ESTs from diploid and allotetraploid Gossypium were combined in a single assembly, we were in many cases able to bioinformatically distinguish duplicated genes in allotetraploid cotton and assign them to either the A or D genome. The assembly and associated information provide a framework for future investigation of cotton functional and evolutionary genomics.
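The idea behind assigning duplicated (homeologous) genes in allotetraploid cotton to the A or D genome can be illustrated with a diagnostic-site comparison. The sketch below is a hypothetical illustration only, not the procedure used in PAVE or in this study. It assumes that an allotetraploid EST and the corresponding G. arboreum (A) and G. raimondii (D) diploid sequences have already been aligned to equal length. It then counts, at columns where the two diploid sequences differ, whether the EST base matches the A or the D progenitor. The function names, the `min_sites` and `min_fraction` thresholds, and the labels `A_T`/`D_T` are assumptions made for this example.

```python
from typing import List, Tuple


def diagnostic_sites(a_seq: str, d_seq: str) -> List[int]:
    """Return alignment columns where the A- and D-genome diploid sequences
    differ and neither has a gap ('-') or an ambiguous base ('N')."""
    if len(a_seq) != len(d_seq):
        raise ValueError("diploid sequences must be pre-aligned to equal length")
    sites = []
    for i, (a, d) in enumerate(zip(a_seq.upper(), d_seq.upper())):
        if a != d and a not in "-N" and d not in "-N":
            sites.append(i)
    return sites


def assign_subgenome(
    est: str,
    a_seq: str,
    d_seq: str,
    min_sites: int = 2,        # hypothetical: minimum informative sites required
    min_fraction: float = 0.8,  # hypothetical: required share of matching sites
) -> Tuple[str, int, int]:
    """Hypothetical sketch: label an aligned allotetraploid EST as 'A_T', 'D_T',
    or 'ambiguous' by counting matches to each diploid progenitor at
    diagnostic sites. Returns (label, a_hits, d_hits)."""
    if len(est) != len(a_seq):
        raise ValueError("EST must be aligned to the diploid sequences")
    est_up, a_up, d_up = est.upper(), a_seq.upper(), d_seq.upper()
    a_hits = d_hits = 0
    for i in diagnostic_sites(a_seq, d_seq):
        base = est_up[i]
        if base == a_up[i]:
            a_hits += 1
        elif base == d_up[i]:
            d_hits += 1
        # gaps, Ns, or a third base are uninformative and are skipped
    informative = a_hits + d_hits
    if informative < min_sites:
        return ("ambiguous", a_hits, d_hits)
    if a_hits / informative >= min_fraction:
        return ("A_T", a_hits, d_hits)
    if d_hits / informative >= min_fraction:
        return ("D_T", a_hits, d_hits)
    return ("ambiguous", a_hits, d_hits)


if __name__ == "__main__":
    a = "ATGCCGTAAC"  # toy A-genome (G. arboreum) sequence
    d = "ATGTCGTGAC"  # toy D-genome (G. raimondii) sequence
    print(diagnostic_sites(a, d))
    print(assign_subgenome("---TCGTGA-", a, d))
```

The fractional threshold is one design choice among several: homeologs in the allotetraploid may have diverged slightly from their diploid progenitors since polyploidization, so requiring a clear majority, rather than a perfect match, tolerates a few new mutations while still flagging mixed signals as ambiguous.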