Pavy Nathalie, Laroche Jérôme, Bousquet Jean, Mackay John
ARBOREA and Centre de Recherche en Biologie Forestière, Université Laval, Pavillon Charles-Eugène-Marchand, Sainte-Foy, Que., G1K 7P4, Canada.
Plant Mol Biol. 2005 Jan;57(2):203-24. doi: 10.1007/s11103-004-6969-7.
A computational analysis of pine transcripts was conducted to contribute to the functional annotation of conifer sequences. A statistical analysis of expressed sequence tags (ESTs) belonging to the 7732 contigs in the TIGR Pinus Gene Index (PGI1.0) identified 260 differentially represented gene sequences across six cDNA libraries from loblolly pine secondary xylem. Cluster analysis of this subset of contigs resulted in five groups: four representing genes preferentially represented in one of the xylem samples (compression wood, planings, root xylem, latewood) and one containing mostly genes simultaneously present in compression and side wood libraries. To complement the sequence annotation, 27 cDNA clones representing selected transcripts were completely sequenced. Several genes were identified that could represent putative markers for xylem from different organs, at different stages of development. Several sequences encoding regulatory proteins were over-represented in root xylem as opposed to the other xylem samples. Some of them belonged to known families of plant transcription factors, but two genes were previously uncharacterized in plants. One transcript was homologous to the gene encoding the Smad4 interacting factor, a key co-activator in TGFbeta (transforming growth factor beta) signalling in animals. Thus, the digital analysis of pine ESTs highlighted a putative gene function that is of potentially broad interest but has yet to be investigated in plants. More generally, this study showed that the application of numerical approaches to EST databases should be helpful in establishing priorities among genes to consider for targeted functional studies. In this way, we illustrated the potential of extracting information from conifer sequences already accessible through well-structured public databases.