The European Molecular Biology Laboratory, The European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA.
Nucleic Acids Res. 2018 Jan 4;46(D1):D802-D808. doi: 10.1093/nar/gkx1011.
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these data using automated, scalable methods.