College of Agriculture & Related Sciences, Delaware State University, Dover, DE 19901, USA.
BMC Plant Biol. 2011 Oct 11;11:135. doi: 10.1186/1471-2229-11-135.
Common bean (Phaseolus vulgaris) is the most important food legume in the world. Although the crop is an important source of dietary protein in both the developed and developing world, the genomic resources available for common bean are limited. Global transcriptome analysis helps to clarify gene expression, genetic variation, and gene structure annotation, among other features. However, the number and description of common bean sequences are very limited, which greatly inhibits genome and transcriptome research. Here we used 454 pyrosequencing to obtain a substantial transcriptome dataset for common bean.
We obtained 1,692,972 reads with an average read length of 207 nucleotides (nt). These reads were assembled into 59,295 unigenes, comprising 39,572 contigs and 19,723 singletons, in addition to 35,328 singletons shorter than 100 bp. Comparing the unigenes to the common bean ESTs deposited in GenBank, we found that 31,664 unigenes (53.40%) had no match in that dataset and can be considered new common bean transcripts. Functional annotation of the unigenes, carried out by Gene Ontology (GO) assignments from hits to Arabidopsis and soybean, indicated coverage of a broad range of GO categories. The common bean unigenes were also compared to the bean bacterial artificial chromosome (BAC) end sequences, and 21% of the unigenes (12,724), including 9,199 contigs and 3,256 singletons, matched 8,823 BAC-end sequences. In addition, a large number of simple sequence repeats (SSRs) and transcription factors were identified in this study.
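The proportions quoted above follow directly from the reported counts. The short Python sketch below recomputes them as a sanity check; the variable names and the `percent` helper are illustrative assumptions, not part of the study's pipeline, and only figures stated in the abstract are used.

```python
# Counts as reported in the abstract; nothing here is newly measured.
contigs = 39_572
singletons = 19_723            # singletons of 100 bp or longer
unigenes_reported = 59_295
novel_unigenes = 31_664        # no match among GenBank common bean ESTs
bac_matched_unigenes = 12_724  # unigenes matching BAC-end sequences


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Contigs plus singletons should reproduce the unigene total.
unigenes_total = contigs + singletons

# Share of unigenes with no EST match, and share matching BAC ends.
novel_pct = percent(novel_unigenes, unigenes_reported)
bac_pct = percent(bac_matched_unigenes, unigenes_reported)

print(unigenes_total == unigenes_reported)
print(novel_pct)
print(bac_pct)
```

Under these inputs the contig and singleton counts add up to the reported unigene total, and the two shares agree with the 53.40% and approximately 21% quoted in the text.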
This work provides the first large-scale identification of the common bean transcriptome derived by 454 pyrosequencing. This research has resulted in a 150% increase in the number of Phaseolus vulgaris ESTs. The dataset obtained through this analysis will provide a platform for functional genomics in common bean and related legumes and will aid in the development of molecular markers that can be used for tagging genes of interest. Additionally, these sequences will provide a means for better annotation of the ongoing common bean whole-genome sequencing.