Comparative description of ten transcriptomes of newly sequenced invertebrates and efficiency estimation of genomic sampling in non-model taxa.

Author information

Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA.

Publication information

Front Zool. 2012 Nov 29;9(1):33. doi: 10.1186/1742-9994-9-33.

Abstract

INTRODUCTION

Traditionally, genomic or transcriptomic data have been restricted to a few model or emerging model organisms, and to a handful of species of medical and/or environmental importance. Next-generation sequencing techniques have the capability of yielding massive amounts of gene sequence data for virtually any species at a modest cost. Here we provide a comparative analysis of de novo assembled transcriptomic data for ten non-model species of previously understudied animal taxa.

RESULTS

cDNA libraries of ten species belonging to five animal phyla (2 Annelida [including Sipuncula], 2 Arthropoda, 2 Mollusca, 2 Nemertea, and 2 Porifera) were sequenced in different batches with an Illumina Genome Analyzer II (read length 100 or 150 bp), yielding approximately 25 to 52 million reads per species. Read thinning, trimming, and de novo assembly were performed under different parameters to optimize output. After optimization, between 67,423 and 207,559 contigs were obtained across the ten species. Of those, 9,069 to 25,681 contigs retrieved BLAST hits against the NCBI non-redundant database, and approximately 50% of these were assigned Gene Ontology terms covering all major categories, with similar percentages in all species. Local BLAST searches against our datasets, using selected genes from major signaling pathways and housekeeping genes, revealed high efficiency in gene recovery compared to available genomes of closely related species. Intriguingly, our transcriptomic datasets detected multiple paralogues in all phyla and in nearly all gene pathways, including housekeeping genes that are traditionally used in phylogenetic applications for their purported single-copy nature.
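As a rough illustration of how a local BLAST search can flag genes recovered in more than one contig, the sketch below parses standard 12-column tabular BLAST output (query, subject, ..., e-value in column 11) and groups distinct contigs by query gene. It is not the authors' pipeline: the function names, the e-value cutoff, and the example records are all hypothetical, and several contigs hitting one gene may reflect fragments or isoforms rather than true paralogues.

```python
from collections import defaultdict


def contigs_per_gene(blast_lines, max_evalue=1e-10):
    """Map each query gene to the set of distinct contigs it hits.

    Expects tab-separated lines in the standard 12-column tabular layout:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    The e-value cutoff is an illustrative assumption, not a value from the paper.
    """
    hits = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_gene, contig, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits[query_gene].add(contig)
    return dict(hits)


def multi_hit_genes(hits):
    """Keep only genes matched by more than one distinct contig."""
    return {gene: sorted(contigs) for gene, contigs in hits.items() if len(contigs) > 1}


# Hypothetical records, made up purely to exercise the functions.
example = [
    "EF1a\tcontig_12\t92.1\t400\t30\t1\t1\t400\t5\t404\t1e-150\t520",
    "EF1a\tcontig_87\t85.0\t380\t55\t2\t10\t389\t1\t380\t3e-90\t330",
    "EF1a\tcontig_12\t70.0\t120\t36\t0\t1\t120\t500\t619\t2e-40\t150",
    "Wnt1\tcontig_5\t88.0\t300\t36\t0\t1\t300\t1\t300\t1e-60\t240",
    "Wnt1\tcontig_9\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20",
]

candidates = multi_hit_genes(contigs_per_gene(example))
print(candidates)
```

Candidates flagged this way would still need alignment and gene-tree inspection before being called paralogues.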

CONCLUSIONS

We present the first study of comparative transcriptomics across multiple animal phyla (comparing two species per phylum in most cases), established the first Illumina-based transcriptomic datasets for sponge, nemertean, and sipunculan species, and generated a tractable catalogue of annotated genes (or gene fragments) and protein families for ten newly sequenced non-model organisms, some of commercial importance (i.e., Octopus vulgaris). These comprehensive sets of genes can be readily used for phylogenetic analysis, gene expression profiling, and developmental analysis, and can also serve as a powerful resource for gene discovery. Characterizing the transcriptomes of such a diverse array of animal species permitted the comparison of sequencing depth, functional annotation, and efficiency of genomic sampling using the same pipelines; all three proved similar across the species considered. In addition, the datasets revealed their potential as a resource for paralogue detection, a recurrent concern in various aspects of biological inquiry, including phylogenetics, molecular evolution, development, and cellular biochemistry.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74c2/3538665/4be4ad2d7a5b/1742-9994-9-33-1.jpg
