Institute of Environmental Sciences, Jagiellonian University, 30-387 Krakow, Poland.
BMC Genomics. 2010 Jun 21;11:390. doi: 10.1186/1471-2164-11-390.
BACKGROUND: Understanding the genetic basis of adaptive changes has been a major goal of evolutionary biology. In complex organisms without sequenced genomes, de novo transcriptome assembly using a longer-read sequencing technology, followed by expression profiling using short reads, is likely to provide comprehensive identification of adaptive variation at the expression level and of sequence polymorphisms in coding regions. We performed sequencing and de novo assembly of the bank vole heart transcriptome in lines selected for high metabolism and in unselected controls.
RESULTS: A single 454 Titanium run produced over one million reads, which were assembled into 63,581 contigs. Searches against the SwissProt protein database and the ENSEMBL collection of mouse transcripts detected similarity to 11,181 and 14,051 genes, respectively. As judged by the representation of genes from the heart-related Gene Ontology categories and of UniGenes detected in the mouse heart, our detection of the genes expressed in the heart was nearly complete (> 95% and almost 90%, respectively). On average, 38.7% of the transcript length was covered by our sequences, with notably higher coverage of coding regions (45.0%) than of untranslated regions (24.5% of 5'UTRs and 32.7% of 3'UTRs). Lower sequence conservation between mouse and bank vole in untranslated regions was found to be partially responsible for the poorer UTR representation. Our data might suggest widespread transcription from noncoding genomic regions, a finding not reported in previous studies of transcriptomes in non-model organisms. We also identified over 19 thousand putative single nucleotide polymorphisms (SNPs). A much higher fraction of the SNPs than expected by chance exhibited variant frequency differences between selection regimes.
CONCLUSION: The longer reads and higher sequence yield per run provided by the 454 Titanium technology, in comparison to earlier generations of pyrosequencing, proved beneficial for the quality of the assembly. An almost full representation of genes known to be expressed in the mouse heart was identified. Use of the extensive genomic resources available for the house mouse, a moderately divergent (20-40 million years) relative of the voles, enabled a comprehensive assessment of transcript completeness. The transcript sequences generated in the present study allowed the identification of candidate SNPs associated with the divergence of selection lines. They also constitute a valuable permanent resource and a foundation for RNAseq experiments aimed at detecting adaptive changes at the level of both gene expression and sequence variants, which would facilitate studies of the genetic basis of evolutionary divergence.
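The coverage figures reported above (the share of transcript length, coding sequence, and UTRs covered by assembled sequences) can be illustrated with a minimal sketch. This is not the authors' pipeline: it only shows one plausible way to compute covered fractions once contig alignments to a reference transcript are available as coordinate intervals. The transcript layout, the alignment intervals, and the function names `merge_intervals` and `covered_fraction` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Half-open interval [start, end) in reference-transcript coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(region: Interval, hits: List[Interval]) -> float:
    """Fraction of `region` overlapped by at least one alignment in `hits`."""
    r_start, r_end = region
    length = r_end - r_start
    if length <= 0:
        return 0.0
    covered = 0
    for start, end in merge_intervals(hits):
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            covered += overlap
    return covered / length


# Hypothetical 1000 bp reference transcript (assumed layout, not real data).
regions: Dict[str, Interval] = {
    "transcript": (0, 1000),
    "5'UTR": (0, 100),
    "CDS": (100, 800),
    "3'UTR": (800, 1000),
}
# Hypothetical contig alignments; the first two overlap and get merged.
hits: List[Interval] = [(50, 300), (250, 450), (700, 900)]

coverage = {name: covered_fraction(r, hits) for name, r in regions.items()}
for name, frac in coverage.items():
    print(f"{name}: {frac:.1%}")
```

Merging the intervals first keeps bases covered by several overlapping contigs from being counted more than once. Averaging such per-transcript fractions over all reference transcripts would give summary figures of the kind quoted in the abstract.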