DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
Nucleic Acids Res. 2021 Jan 8;49(D1):D723-D733. doi: 10.1093/nar/gkaa983.
The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) is a manually curated, daily updated collection of genome projects and their metadata accumulated from around the world. The current version of the database includes over 1.17 million entries organized broadly into Studies (45 770), Organisms (387 382) or Biosamples (101 207), Sequencing Projects (355 364) and Analysis Projects (283 481). These four levels contain over 600 metadata fields, including 76 controlled vocabulary (CV) tables containing 3873 terms. GOLD provides an interactive web user interface for browsing and searching by a wide range of project and metadata fields. Users can enter details about their own projects in GOLD, which acts as a gatekeeper to ensure that metadata is accurately documented before sequence information is submitted to the Integrated Microbial Genomes (IMG) system for analysis. To maintain a reference dataset for use by members of the scientific community, GOLD also imports projects from public repositories such as GenBank and SRA. The current status of the database, along with recent updates and improvements, is described in this manuscript.