da Silva Francisco Goes, Iandolino Alberto, Al-Kayal Fadi, Bohlmann Marlene C, Cushman Mary Ann, Lim Hyunju, Ergul Ali, Figueroa Rubi, Kabuloglu Elif K, Osborne Craig, Rowe Joan, Tattersall Elizabeth, Leslie Anna, Xu Jane, Baek Jongmin, Cramer Grant R, Cushman John C, Cook Douglas R
Department of Plant Pathology, University of California, Davis, 95616, USA.
Plant Physiol. 2005 Oct;139(2):574-97. doi: 10.1104/pp.105.065748.
We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments.
A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation to previously characterized aspects of berry development and physiology. Comparison with published results for select genes, as well as correlation analysis between independent data sets, suggests that the inferred in silico patterns of expression are likely to be an accurate representation of transcript abundance for the conditions surveyed. Thus, the combined data set reveals the in silico expression patterns for hundreds of genes in V. vinifera, the majority of which have not been previously studied within this species.
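The correlation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not give the normalization, the believability index computation, or the clustering settings, so none of those are reproduced here. The contig names, EST counts, and library sizes below are hypothetical, and the sketch assumes that raw EST counts are first converted to per-library frequencies before transcripts are compared by Pearson correlation.

```python
from math import sqrt


def to_frequencies(counts, library_sizes):
    """Convert raw EST counts into per-library frequencies (count / library size)."""
    return [c / n for c, n in zip(counts, library_sizes)]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Pairwise Pearson correlations between all transcript profiles."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical data: four cDNA libraries and three contigs (assumed values).
library_sizes = [1000, 2000, 1500, 500]
est_counts = {
    "contig_A": [10, 40, 15, 0],
    "contig_B": [5, 20, 8, 0],
    "contig_C": [0, 2, 30, 10],
}

profiles = {
    name: to_frequencies(counts, library_sizes)
    for name, counts in est_counts.items()
}
corr = correlation_matrix(profiles)

for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

A matrix like `corr` is the kind of input a hierarchical clustering routine would take, for example after converting each correlation `r` to a distance such as `1 - r`; that conversion is a common convention, not something the abstract specifies.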