Agricultural Biotechnology Research Center, Academia Sinica, Taipei, 11529, Taiwan.
Plant Cell Physiol. 2011 Sep;52(9):1501-14. doi: 10.1093/pcp/pcr097. Epub 2011 Jul 19.
As one of the largest families of angiosperms, the Orchidaceae display great biodiversity resulting from adaptation to diverse habitats. Despite their unique and interesting biological features, genomic information on orchids is rather limited, which impedes advanced molecular research. Here we report a strategy to integrate sequence outputs of the moth orchid, Phalaenopsis aphrodite, from two high-throughput sequencing platforms, Roche 454 and Illumina/Solexa, in order to maximize assembly efficiency. Tissues collected for cDNA library preparation included a wide range of vegetative and reproductive tissues. We also designed an effective workflow for annotation and functional analysis. After assembly and trimming, 233,823 unique sequences were obtained. Among them, 42,590 contigs averaging 875 bp in length were annotated to protein-coding genes, of which 7,263 coding genes were found to be nearly full length. The sequence accuracy of the assembled contigs was validated to be as high as 99.9%. Genes with tissue-specific expression were also categorized by RNA-Seq profiling analysis. Gene products targeted to specific subcellular localizations were identified by their annotations. We conclude that, with proper assembly to combine the outputs of next-generation sequencing platforms, transcriptome information can be enriched for gene discovery, functional annotation and expression profiling of a non-model organism.