Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome.

Authors

Li Ronnie Y, Huang Yanting, Zhao Zhiyue, Qin Zhaohui S

Affiliations

Graduate program in Neuroscience, Emory University, United States.

Department of Computer Science, Emory University, United States.

Publication information

Data Brief. 2022 Dec 14;46:108827. doi: 10.1016/j.dib.2022.108827. eCollection 2023 Feb.

DOI: 10.1016/j.dib.2022.108827
PMID: 36582986
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9792340/
Abstract

This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome at 100-bp resolution with full genome-wide coverage. The datasets were processed from raw read count data for five types of sequencing-based assays collected by the Encyclopedia of DNA Elements consortium (ENCODE, http://www.encodeproject.org): DNase-seq, histone and TF ChIP-seq, ATAC-seq, and Poly(A) RNA-seq. The assay data were processed into a total of 6,305 genome-wide profiles. To ensure the quality of the features, we filtered out assays with low read depth, inconsistent read counts, or poor data quality. Processed data were merged by averaging read counts across technical replicates to obtain signals in about 30 million predefined 100-bp bins that tile the entire genome. We provide an example of fetching read counts for disease-related risk variants from the GWAS Catalog. Additionally, we have created a tabix index that enables fast retrieval of read counts given coordinates in the human genome. The processed data can be found on Zenodo at https://zenodo.org/record/7015783. These data can be used as features for statistical and machine learning models to predict or infer a wide range of variables of biological interest. They can also be applied to generate novel insights into gene expression, chromatin accessibility, and epigenetic modifications across the human genome. Finally, the data processing pipeline is replicable: users can rerun it for their own purposes and apply it to data from other genome-wide profiling assays, expanding the amount of available data.
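
The binning and replicate averaging described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the 1-based inclusive coordinate convention, and the toy read counts are all assumptions made for illustration; the published files may use a different convention (for example, 0-based half-open intervals).

```python
from statistics import fmean

BIN_SIZE = 100  # bin width in base pairs, as stated in the abstract


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Return the 0-based index of the bin containing a 1-based position.

    Assumption: bins tile each chromosome starting at position 1.
    """
    if position < 1:
        raise ValueError("position must be a 1-based coordinate >= 1")
    return (position - 1) // bin_size


def bin_interval(index: int, bin_size: int = BIN_SIZE) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) of the bin at this index."""
    start = index * bin_size + 1
    return start, start + bin_size - 1


def average_replicates(replicates: list[list[float]]) -> list[float]:
    """Average per-bin read counts across replicates of one experiment."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    n_bins = len(replicates[0])
    if any(len(rep) != n_bins for rep in replicates):
        raise ValueError("all replicates must cover the same bins")
    return [fmean(counts) for counts in zip(*replicates)]


# Toy example: two replicates over three bins (made-up counts).
rep_a = [4.0, 0.0, 10.0]
rep_b = [6.0, 2.0, 14.0]
merged = average_replicates([rep_a, rep_b])

# Look up the merged signal for a hypothetical variant at position 250.
variant_bin = bin_index(250)
print(merged[variant_bin], bin_interval(variant_bin))
```

With fixed-width bins, a variant's signal is a direct index lookup with no interval search. For the full dataset, the tabix index mentioned in the abstract plays the same role on disk.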

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8197/9792340/291b8a6a73e3/gr1.jpg
