Department of Biochemistry and Molecular Biology, University of Calgary, 3330 Hospital Drive NW, Calgary, Alberta T2N 4N1, Canada.
J Biotechnol. 2013 Jul 10;166(3):122-34. doi: 10.1016/j.jbiotec.2013.04.004. Epub 2013 Apr 16.
Plants produce a vast array of specialized metabolites, many of which are used as pharmaceuticals, flavors, fragrances, and other high-value fine chemicals. However, most of these compounds occur in non-model plants for which genomic sequence information is not yet available. Producing large amounts of nucleotide sequence data with next-generation sequencing technologies is now relatively fast and cost-effective, especially with the latest Roche-454 and Illumina sequencers, which offer enhanced base-calling accuracy. To investigate specialized metabolite biosynthesis in non-model plants, we have established a data-mining framework, employing next-generation sequencing and computational algorithms, to construct and analyze the transcriptomes of 75 non-model plants that produce compounds of interest for biotechnological applications. After sequence assembly, an extensive annotation approach was applied to assign functional information to over 800,000 putative transcripts. The annotation is based on direct searches against public databases, including RefSeq and InterPro. Gene Ontology (GO) and Enzyme Commission (EC) annotations, along with associated Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway maps, were also collected. As a proof of concept, the selection of biosynthetic gene candidates associated with six specialized metabolic pathways is described. A web-based BLAST server has been established to allow public access to the assembled transcriptome databases for all 75 plant species of the PhytoMetaSyn Project (www.phytometasyn.ca).
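As an illustration of how results from a BLAST search against assembled transcriptomes might be narrowed to candidate genes, the following minimal Python sketch keeps the highest-scoring hit per query from standard 12-column BLAST tabular output. It is not the authors' pipeline: the function name `best_hits`, the E-value cutoff of 1e-10, and the identifiers `contig_1`, `ref_A`, and so on are assumptions made for the example.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-10):
    """Return the highest-bitscore hit per query among hits passing the cutoff.

    The cutoff value is an illustrative assumption, not a value from the paper.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (as produced by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical example input: two hits for contig_1, one weak hit for contig_2.
example = "\n".join([
    "\t".join(["contig_1", "ref_A", "85.0", "200", "30", "0",
               "1", "200", "1", "200", "1e-50", "250"]),
    "\t".join(["contig_1", "ref_B", "90.0", "200", "20", "0",
               "1", "200", "1", "200", "1e-60", "300"]),
    "\t".join(["contig_2", "ref_C", "40.0", "50", "30", "0",
               "1", "50", "1", "50", "0.5", "30"]),
])

for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

In practice, candidates selected this way would still need to be checked against the functional annotations (GO, EC, KEGG) before being assigned to a pathway.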