Hartl Chris, Zhuang Jiali, Tyler Aaron, Zhou Bing, Wong Emily, Merberg David, Farrell Brad, DeBoever Chris, Bryant Julie, Diogo Dorothée
Rancho BioSciences LLC, San Diego, California, USA.
Genetics and Systems Biology, Takeda Development Center Americas, Inc, San Diego, CA, 92121, USA.
Epigenetics Chromatin. 2024 Jul 16;17(1):21. doi: 10.1186/s13072-024-00545-7.
Cis-regulatory elements (CREs) play a pivotal role in gene expression regulation, allowing cells to serve diverse functions and respond to external stimuli. Understanding CREs is essential for personalized medicine and disease research, as an increasing number of genetic variants associated with phenotypes and diseases overlap with CREs. However, existing databases often focus on subsets of regulatory elements and present each identified instance of an element individually, which makes it difficult to obtain a comprehensive view. To address this gap, we have created CREdb, a comprehensive database with over 10 million human regulatory elements across 1,058 cell types and 315 tissues, harmonized from different data sources. We curated the cell types and tissues and aligned them to standard ontologies for efficient data query.
Data from 11 sources were curated and mapped to standard ontological terms. The final database contains 11,223,434 combined elements, which were merged into 5,666,240 consensus elements representing the combined ranges of the individual elements, informed by their overlap. Each consensus element contains curated metadata, including the number of elements supporting it and a hash linking to the source databases. The inferred activity of each consensus element in various cell-type and tissue contexts is also provided. Examples presented here show the potential utility of CREdb in annotating non-coding genetic variants and informing chromatin accessibility profiling analysis.
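The idea of collapsing overlapping element instances into consensus ranges can be illustrated with a minimal sketch. It assumes the simplest possible rule (any two elements on the same chromosome that overlap by at least one base are merged into their union), which is not necessarily the overlap criterion used to build CREdb. The `Element` and `Consensus` types, the `merge_elements` function, and the half-open coordinate convention are hypothetical names and choices made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    source: str  # name of the source database (hypothetical label)


class Consensus(NamedTuple):
    chrom: str
    start: int
    end: int
    n_support: int   # number of individual elements merged
    sources: tuple   # sorted, de-duplicated source labels


def _close(group, group_end):
    # The group is sorted by start, so the first element has the minimum start.
    sources = tuple(sorted({e.source for e in group}))
    return Consensus(group[0].chrom, group[0].start, group_end, len(group), sources)


def merge_elements(elements):
    """Merge overlapping elements per chromosome into union ranges."""
    by_chrom = defaultdict(list)
    for el in elements:
        by_chrom[el.chrom].append(el)

    merged = []
    for chrom in sorted(by_chrom):
        group, group_end = [], None
        for el in sorted(by_chrom[chrom], key=lambda e: (e.start, e.end)):
            if group and el.start < group_end:
                # Overlaps the running range: extend it.
                group.append(el)
                group_end = max(group_end, el.end)
            else:
                if group:
                    merged.append(_close(group, group_end))
                group, group_end = [el], el.end
        if group:
            merged.append(_close(group, group_end))
    return merged


example = [
    Element("chr1", 100, 200, "A"),
    Element("chr1", 150, 300, "B"),
    Element("chr1", 400, 500, "A"),
    Element("chr2", 100, 200, "C"),
]
consensus = merge_elements(example)
```

In this toy input, the two overlapping chr1 elements collapse into a single range supported by two instances, while the remaining elements stay separate. A real pipeline would likely require a minimum overlap fraction rather than a single shared base.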
We developed CREdb, a comprehensive database of CREs, to simplify the analysis of CREs by providing a unified framework for researchers. CREdb compiles consensus ranges for each element by integrating the information from all instances identified across various source databases. This unified database facilitates the functional annotation of non-coding genetic variants and complements chromatin accessibility profiling analysis. CREdb will serve as an important resource in expanding our knowledge of the epigenome and its role in human diseases.
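As a rough illustration of how non-overlapping consensus ranges could support variant annotation, the sketch below looks up a variant position with a binary search. The tuple layout `(chrom, start, end, label)`, the functions `build_index` and `annotate`, and the labels are all hypothetical; this is not the CREdb query interface, and it assumes consensus ranges on a chromosome do not overlap one another.

```python
from bisect import bisect_right


def build_index(consensus):
    """Group (chrom, start, end, label) records by chromosome, sorted by start."""
    index = {}
    for chrom, start, end, label in sorted(consensus):
        starts, records = index.setdefault(chrom, ([], []))
        starts.append(start)
        records.append((start, end, label))
    return index


def annotate(index, chrom, pos):
    """Return the label of the consensus range containing pos, or None.

    Coordinates are assumed 0-based and half-open: start <= pos < end.
    """
    if chrom not in index:
        return None
    starts, records = index[chrom]
    # Rightmost range whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    if i >= 0:
        start, end, label = records[i]
        if start <= pos < end:
            return label
    return None


index = build_index([
    ("chr1", 400, 500, "element_2"),
    ("chr1", 100, 300, "element_1"),
    ("chr2", 100, 200, "element_3"),
])
hit = annotate(index, "chr1", 250)
miss = annotate(index, "chr1", 350)
```

Because each variant needs only one binary search per lookup, this approach scales to millions of consensus ranges. Overlapping ranges would call for an interval tree instead.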