Visser Erik A, Wegrzyn Jill L, Steenkamp Emma T, Myburg Alexander A, Naidoo Sanushka
Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X20, Pretoria, 0028, South Africa.
Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06269, USA.
BMC Genomics. 2015 Dec 12;16:1057. doi: 10.1186/s12864-015-2277-7.
Pines are the most important tree species to the international forestry industry, covering 42 % of the global industrial forest plantation area. One of the most pressing threats to cultivation of some pine species is the pitch canker fungus, Fusarium circinatum, which can have devastating effects in both the field and the nursery. Investigation of the Pinus-F. circinatum host-pathogen interaction is crucial for the development of effective disease management strategies. As with many non-model organisms, investigation of host-pathogen interactions in pine species is hampered by limited genomic resources. This was partially alleviated by the release of the 22 Gbp Pinus taeda v1.01 genome sequence ( http://pinegenome.org/pinerefseq/ ) in 2014. Although the fragmented state of the genome may hamper comprehensive transcriptome analysis, the inherent redundancy of deep RNA sequencing with Illumina short reads can be leveraged to assemble transcripts in the absence of a completed reference sequence. These data can then be integrated with available genomic data to produce a comprehensive transcriptome resource. The aim of this study was to provide a foundation for gene expression analysis of disease response mechanisms in Pinus patula through transcriptome assembly.
Eighteen de novo and two reference-based assemblies were produced for P. patula shoot tissue. For this purpose, three transcriptome assemblers (Trinity, Velvet/OASES and SOAPdenovo-Trans) were used to maximise the diversity and completeness of the assembled transcripts. Redundancy in the combined assembly was reduced using the EvidentialGene pipeline. The resulting 52 Mb P. patula v1.0 shoot transcriptome consists of 52 112 unigenes, 60 % of which could be functionally annotated.
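The idea of pooling several assemblies and then reducing redundancy can be illustrated with a minimal sketch. This is not the EvidentialGene method, which is considerably more involved; the example only collapses exactly identical sequences and reports basic size statistics. The function names (`parse_fasta`, `collapse_exact_duplicates`, `n50`) and the toy sequences are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record id.
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def collapse_exact_duplicates(assemblies: List[Dict[str, str]]) -> Dict[str, str]:
    """Pool transcripts from several assemblies, keeping one copy of each
    identical sequence (a naive stand-in for real redundancy reduction)."""
    seen = set()
    pooled: Dict[str, str] = {}
    for index, assembly in enumerate(assemblies):
        for name, seq in assembly.items():
            if seq in seen:
                continue
            seen.add(seq)
            # Prefix with the assembly index so ids stay unique after pooling.
            pooled[f"asm{index}_{name}"] = seq
    return pooled


def n50(lengths: List[int]) -> int:
    """Return the length L such that sequences of length >= L
    contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy input: two hypothetical assemblies sharing one identical transcript.
assembly_a = parse_fasta([">t1", "ACGTACGT", ">t2", "ACGT"])
assembly_b = parse_fasta([">t1", "ACGTACGT", ">t3", "AC"])

pooled = collapse_exact_duplicates([assembly_a, assembly_b])
lengths = [len(seq) for seq in pooled.values()]

print("sequences:", len(pooled))
print("total bases:", sum(lengths))
print("N50:", n50(lengths))
```

Applied to the figures reported above, the same kind of summary gives a mean unigene length of roughly 1 kb (about 52 Mb spread over 52 112 unigenes), bearing in mind that the 52 Mb total is a rounded value.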
The assembled transcriptome will serve as a major genomic resource for future investigation of P. patula and represents the largest gene catalogue produced to date for this species. Furthermore, this assembly can help detect gene-based genetic markers for P. patula and the comparative assembly workflow could be applied to generate similar resources for other non-model species.