Hammesfahr Björn, Odronitz Florian, Hellkamp Marcel, Kollmar Martin
Abteilung NMR basierte Strukturbiologie, Max-Planck-Institut für Biophysikalische Chemie, Am Fassberg 11, D-37077 Göttingen, Germany.
BMC Res Notes. 2011 Sep 9;4:338. doi: 10.1186/1756-0500-4-338.
With current next-generation sequencing methods, sequencing even the largest mammalian genomes has become a matter of days. It comes as no surprise that dozens of genome assemblies are now released per month. As the number of next-generation sequencing machines increases worldwide and new major sequencing projects are announced, the rate at which genome assemblies are released is expected to increase further. It is therefore increasingly important to have both an overview of, and detailed information about, the available sequenced genomes. The different sequencing and assembly methods have specific characteristics that must be known in order to evaluate the various genome assemblies before performing subsequent analyses.
diArk has been developed to provide fast and easy access to all sequenced eukaryotic genomes worldwide. Currently, diArk 2.0 contains information about more than 880 species and more than 2350 genome assembly files. It provides extensive metadata, such as sequencing and read-assembly methods, sequencing coverage, GC content, extended lists of alternative scientific names and common species names, and various kinds of statistics. To make the data intuitively accessible, the web interface makes extensive use of modern web techniques. A number of search modules and result views facilitate finding and assessing the data of interest. Subscribing to the RSS feed is the easiest way to stay up to date with the latest genome data.
Compared with databases such as GOLD, NCBI Genome, NHGRI, and ISC, diArk 2.0 is the most up-to-date database of sequenced eukaryotic genomes. It differs in that it stores only those projects for which genome assembly data or considerable amounts of cDNA data are available; projects in the planning stage or still being sequenced are not included. Users can easily search the provided data and directly access the genome assembly files of the sequenced genome of interest. diArk 2.0 is available at http://www.diark.org.