Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.
Database (Oxford). 2012 Nov 23;2012:bas048. doi: 10.1093/database/bas048. Print 2012.
The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to use whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthologs, coding regions, protein domains and Gene Ontology (GO) terms to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, by GO term annotation or with the Basic Local Alignment Search Tool (BLAST). User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs, and graphical representations of the locations of protein domains and of matches to similar sequences in the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information.
We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics. Database URL: asgard.rc.fas.harvard.edu.