Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.
BMC Genomics. 2011 Jan 25;12:61. doi: 10.1186/1471-2164-12-61.
Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery.
We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. A total of 2,087,410 reads from both normalized and non-normalized cDNA were assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts were assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses.
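The plateau in gene discovery reported above can be illustrated with a simple saturation (rarefaction) curve. The Python sketch below is not the authors' pipeline. It assumes each read has already been assigned a gene identifier (for example from a best BLAST hit), and the names `discovery_curve`, `plateau_point`, `step`, and `min_gain`, as well as the toy data, are hypothetical and introduced only for illustration.

```python
import random


def discovery_curve(read_gene_ids, step, seed=0):
    """Shuffle the reads, then record how many distinct genes have been
    seen after every `step` reads (and once more at the final read)."""
    rng = random.Random(seed)
    order = list(read_gene_ids)
    rng.shuffle(order)
    seen = set()
    curve = []
    for i, gene in enumerate(order, start=1):
        if gene is not None:  # None marks a read with no gene assignment
            seen.add(gene)
        if i % step == 0 or i == len(order):
            curve.append((i, len(seen)))
    return curve


def plateau_point(curve, min_gain):
    """Return the first sampled depth after which one further step adds
    fewer than `min_gain` new genes, or None if that never happens."""
    for (n0, g0), (_n1, g1) in zip(curve, curve[1:]):
        if g1 - g0 < min_gain:
            return n0
    return None


# Toy data (an assumption, not taken from the study): 200 genes with
# skewed expression, sampled as 20,000 reads.
toy_rng = random.Random(42)
genes = [f"gene{i}" for i in range(200)]
weights = [1.0 / (i + 1) for i in range(200)]
toy_reads = toy_rng.choices(genes, weights=weights, k=20000)

curve = discovery_curve(toy_reads, step=2000)
for depth, n_genes in curve:
    print(depth, n_genes)
print("plateau at:", plateau_point(curve, min_gain=2))
```

A curve that flattens well before the full read count, as the abstract describes at ~1.5 million of 2,087,410 reads, suggests that additional sequencing of the same cDNA would recover few new genes. The `min_gain` threshold is an arbitrary choice in this sketch.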
Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms. [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven additional files are available.]