Hale Matthew C, McCormick Cory R, Jackson James R, Dewoody J Andrew
Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN 47907, USA.
BMC Genomics. 2009 Apr 29;10:203. doi: 10.1186/1471-2164-10-203.
Next-generation sequencing technologies have been applied most often to model organisms or species closely related to a model. However, these methods have the potential to be valuable in many wild organisms, including those of conservation concern. We used Roche 454 pyrosequencing to characterize gene expression in polyploid lake sturgeon (Acipenser fulvescens) gonads.
Titration runs on a Roche 454 GS-FLX produced more than 47,000 sequencing reads. These reads represented 20,741 unique sequences that passed quality control (mean length = 186 bp). These were assembled into 1,831 contigs (mean contig depth = 4.1 sequences). Over 4,000 sequencing reads (approximately 19%) were assigned gene ontologies, mostly to protein, RNA, and ion binding. A total of 877 candidate SNPs were identified in more than 50 different genes. We employed an analytical approach from theoretical ecology (rarefaction) to evaluate depth of sequencing coverage relative to gene discovery. We also considered the relative merits of normalized versus native cDNA libraries when using next-generation sequencing platforms. Not surprisingly, fewer genes from the normalized libraries were rRNA subunits. Rarefaction suggests that normalization has little influence on the efficiency of gene discovery, at least when working with thousands of reads from a single tissue type.
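As an illustration of how rarefaction relates sequencing depth to gene discovery, the following is a minimal sketch in Python. It assumes the standard analytical rarefaction formula from ecology (the expected number of distinct classes in a random subsample drawn without replacement); the abstract does not state which software or formula variant the authors used. The function name `rarefy` and the read counts are hypothetical and are not data from the study.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of distinct genes observed in a random subsample
    of n reads, drawn without replacement from a library in which gene i
    is represented by counts[i] reads (analytical rarefaction)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of reads")
    denom = comb(total, n)
    # For each gene, 1 - P(gene is absent from the subsample).
    return sum(1 - comb(total - c, n) / denom for c in counts)


# Hypothetical toy library: four genes with uneven read counts.
toy_counts = [5, 3, 1, 1]
curve = [rarefy(toy_counts, n) for n in range(sum(toy_counts) + 1)]
for n, expected in enumerate(curve):
    print(n, round(expected, 3))
```

A curve that flattens as n approaches the total read count suggests that additional sequencing would uncover few new genes. Computing such curves separately for a normalized and a native library is one way to compare their efficiency of gene discovery at equal read depth.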
Our data indicate that titration runs on 454 sequencers can characterize thousands of expressed sequence tags which can be used to identify SNPs, gene ontologies, and levels of gene expression in species of conservation concern. We anticipate that rarefaction will be useful in evaluations of gene discovery and that next-generation sequencing technologies hold great potential for the study of other non-model organisms.