School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China.
School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing 163318, China.
Nucleic Acids Res. 2021 Jan 8;49(D1):D55-D64. doi: 10.1093/nar/gkaa943.
Accessible chromatin is a highly informative structural feature for identifying regulatory elements and provides extensive information about transcriptional activity and gene regulatory mechanisms. Human ATAC-seq datasets are accumulating rapidly, creating an urgent need to collect them comprehensively and process them effectively. We developed a comprehensive human chromatin accessibility database (ATACdb, http://www.licpathway.net/ATACdb) with two aims. The first is to provide a large, publicly available resource of human chromatin accessibility data. The second is to annotate and illustrate the potential roles of accessible regions in a tissue/cell type-specific manner. The current version of ATACdb documents a total of 52 078 883 regions from over 1400 ATAC-seq samples, which were manually curated from over 2200 chromatin accessibility samples in NCBI GEO/SRA. To make these datasets more accessible to the research community, ATACdb provides a quality assurance process comprising four quality control (QC) metrics. ATACdb also provides detailed (epi)genetic annotations of chromatin accessibility regions, including super-enhancers, typical enhancers, transcription factors (TFs), common single-nucleotide polymorphisms (SNPs), risk SNPs, eQTLs, LD SNPs, methylation, chromatin interactions and TADs. In particular, ATACdb provides accurate inference of TF footprints within chromatin accessibility regions. ATACdb is a powerful platform that provides the most comprehensive accessible chromatin data, QC, TF footprints and various other annotations.
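The abstract does not say how annotations such as SNPs are attached to accessible regions. A common approach is a coordinate overlap between region intervals and annotation features. The sketch below assumes that approach and is not ATACdb's actual pipeline. It uses SNPs as the example. The function name `annotate_regions_with_snps`, the tuple layouts, and the BED-style 0-based, half-open coordinates are all illustrative assumptions.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Assumption: a region is (chrom, start, end) in BED-style 0-based, half-open coordinates.
Region = Tuple[str, int, int]
# Assumption: a SNP is (chrom, pos, rsid) with a 0-based position.
Snp = Tuple[str, int, str]


def annotate_regions_with_snps(regions: List[Region], snps: List[Snp]) -> Dict[Region, List[str]]:
    """Hypothetical helper (not ATACdb's implementation): list the SNP IDs inside each region."""
    # Build per-chromosome parallel lists of SNP positions and IDs, sorted by position.
    positions: Dict[str, List[int]] = {}
    ids: Dict[str, List[str]] = {}
    for chrom, pos, rsid in sorted(snps, key=lambda s: (s[0], s[1])):
        positions.setdefault(chrom, []).append(pos)
        ids.setdefault(chrom, []).append(rsid)

    annotations: Dict[Region, List[str]] = {}
    for region in regions:
        chrom, start, end = region
        pos_list = positions.get(chrom, [])
        lo = bisect_left(pos_list, start)  # first SNP with pos >= start
        hi = bisect_left(pos_list, end)    # first SNP with pos >= end (end is exclusive)
        annotations[region] = ids.get(chrom, [])[lo:hi]
    return annotations
```

For example, calling the helper with the region `("chr1", 100, 200)` and SNPs at chr1 positions 150, 199 and 200 keeps the first two. The half-open end excludes position 200. Sorting once and using binary search keeps each region lookup logarithmic in the number of SNPs on its chromosome.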