Department of Botany, University of Wyoming, Laramie, WY 82071, USA.
BMC Genomics. 2010 Mar 16;11:180. doi: 10.1186/1471-2164-11-180.
Massively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole-genome sequencing efforts are currently cost- and time-prohibitive. The large and complex genomes of pines (Pinus spp.) have hindered the development of genomic resources, despite the ecological and economic importance of the group. While most genomic studies have focused on a single species (P. taeda), genomic-level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta) is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe a sequencing study of expressed genes from P. contorta, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies.
We obtained 586,732 sequencing reads from a 454 GS XLR70 Titanium pyrosequencer (mean length: 306 base pairs). A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that retrotransposons are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) as potential molecular markers in our assembled and annotated sequences. High-quality PCR primers were designed for a substantial number of the SSR loci, and a large number of these were amplified successfully in initial screening.
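To illustrate what locating SSRs in assembled sequences involves, the following is a minimal sketch that scans a contig for perfect mono- to hexanucleotide repeats. It is not the authors' pipeline: the abstract does not name the SSR-finding tool or its settings, so the function names (`find_ssrs`, `is_primitive`) and the per-motif-length repeat thresholds in `DEFAULT_MIN_REPEATS` are hypothetical. The sketch also ignores imperfect or compound repeats and does not reconcile tracts detected at different motif lengths.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# Illustrative thresholds only; not the settings used in the study.
DEFAULT_MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    k = len(motif)
    for d in range(1, k):
        if k % d == 0 and motif[:d] * (k // d) == motif:
            return False
    return True


def find_ssrs(
    seq: str, min_repeats: Dict[int, int] = DEFAULT_MIN_REPEATS
) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs, sorted by 0-based start."""
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    for k, need in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip motifs with ambiguous bases (e.g. N) or non-primitive units,
            # so an 'AAAA' run is reported once as a mononucleotide repeat.
            if set(motif) <= set("ACGT") and is_primitive(motif):
                n = 1
                # Extend while the next k bases repeat the motif exactly.
                while seq[i + n * k:i + (n + 1) * k] == motif:
                    n += 1
                if n >= need:
                    hits.append((i, motif, n))
                    i += n * k  # jump past this repeat tract
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy contig with a dinucleotide, a mononucleotide and a trinucleotide SSR.
    contig = "GGT" + "AG" * 8 + "CTTC" + "A" * 12 + "GCC" + "GAT" * 5 + "C"
    for start, motif, n in find_ssrs(contig):
        print(start, motif, n)
```

On the toy contig this reports `(3, 'AG', 8)`, `(23, 'A', 12)` and `(38, 'GAT', 5)`. Real marker development would then filter these hits by flanking-sequence length and quality before primer design.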
This sequence collection represents a major genomic resource for P. contorta, and the large number of genetic markers characterized should contribute to future research in this and other pines. Our results illustrate the utility of next generation sequencing as a basis for marker development and population genomics in non-model species.