The Key Laboratory of Bioactive Substances and Resources Utilization of Chinese Herbal Medicine, Ministry of Education, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, PR China.
BMC Genomics. 2011 Dec 23;12 Suppl 5:S5. doi: 10.1186/1471-2164-12-S5-S5.
Panax notoginseng (Burk) F.H. Chen is an important medicinal plant of the Araliaceae family. Triterpene saponins are the bioactive constituents of P. notoginseng. However, little genomic information is available for this plant, and the details of triterpene saponin biosynthesis in Panax species are largely unknown.
Using 454 pyrosequencing technology, a one-quarter GS FLX Titanium run on P. notoginseng root yielded 188,185 reads with an average length of 410 bases. These reads were processed and assembled with the 454 GS De Novo Assembler software into 30,852 unique sequences. A total of 70.2% of the unique sequences were annotated by Basic Local Alignment Search Tool (BLAST) similarity searches against public sequence databases. Kyoto Encyclopedia of Genes and Genomes (KEGG) assignment identified 41 unique sequences in the 454-EST dataset, representing 11 genes involved in triterpene saponin backbone biosynthesis. In particular, the transcript encoding dammarenediol synthase (DS), the first committed enzyme in the biosynthetic pathway of the major triterpene saponins, is highly expressed in the root of four-year-old P. notoginseng. Notably, phylogenetic analysis of 174 cytochrome P450s and 242 glycosyltransferases identified the candidate cytochrome P450 genes (Pn02132 and Pn00158) and the candidate UDP-glycosyltransferase gene (Pn00082) most likely to be involved in the hydroxylation or glycosylation of aglycones during triterpene saponin biosynthesis. Putative transcription factors were detected in 906 unique sequences, including Myb, homeobox, WRKY, basic helix-loop-helix (bHLH), and other family proteins. Additionally, a total of 2,772 simple sequence repeats (SSRs) were identified in 2,361 unique sequences, with di-nucleotide motifs being the most abundant.
This study is the first to present a large-scale EST dataset for P. notoginseng root acquired by next-generation sequencing (NGS) technology. Candidate genes involved in triterpene saponin biosynthesis, including putative CYP450s and UGTs, were obtained. Additionally, the identified SSRs provide abundant genetic markers for molecular breeding and genetics applications in this species. These data will inform gene discovery, studies of transcriptional regulation, and marker-assisted selection in P. notoginseng. The dataset lays an important foundation for research aimed at ensuring adequate medicinal resources from this species.
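The SSR identification reported above can be illustrated with a minimal sketch. It is not the tool or the criteria used in the study, which the abstract does not state. It assumes that only perfect repeats are counted and uses hypothetical minimum repeat counts (six for di-nucleotide motifs, five for tri-nucleotide motifs, four for tetra- to hexa-nucleotide motifs). The names `MIN_REPEATS`, `find_ssrs`, and `motif_length_counts` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# The abstract does not state the criteria used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least min_n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or
            # "ATAT", so a di-nucleotide run is not also reported as tetra-.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return hits


def motif_length_counts(sequences):
    """Count SSRs by motif length across a dict of {sequence_id: sequence}."""
    counts = Counter()
    for seq in sequences.values():
        for _start, motif, _copies in find_ssrs(seq):
            counts[len(motif)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences with made-up identifiers, not data from the study.
    toy = {
        "seq1": "GG" + "AT" * 6 + "CC",
        "seq2": "CC" + "AAG" * 5 + "TT",
        "seq3": "A" * 20,
    }
    print(motif_length_counts(toy))
```

Tallying hits by motif length, as `motif_length_counts` does, is the kind of summary that supports a statement such as di-nucleotide motifs being the most abundant class. Changing the thresholds changes the counts, so any comparison with published figures would need the original criteria.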