Quality assessment and refinement of chromatin accessibility data using a sequence-based predictive model.

Affiliations

Department of Pediatrics, Division of Nephrology, Boston Children's Hospital, Boston & Harvard Medical School, Boston, MA 02115.

Kidney Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA 02142.

Publication information

Proc Natl Acad Sci U S A. 2022 Dec 20;119(51):e2212810119. doi: 10.1073/pnas.2212810119. Epub 2022 Dec 12.

DOI: 10.1073/pnas.2212810119
PMID: 36508674
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9907136/
Abstract

Chromatin accessibility assays are central to the genome-wide identification of gene regulatory elements associated with transcriptional regulation. However, the data have highly variable quality arising from several biological and technical factors. To surmount this problem, we developed a sequence-based machine learning method to evaluate and refine chromatin accessibility data. Our framework, gapped k-mer SVM quality check (gkmQC), provides the quality metrics for a sample based on the prediction accuracy of the trained models. We tested 886 DNase-seq samples from the ENCODE/Roadmap projects to demonstrate that gkmQC can effectively identify "high-quality" (HQ) samples with low conventional quality scores owing to marginal read depths. Peaks identified in HQ samples are more accurately aligned at functional regulatory elements, show greater enrichment of regulatory elements harboring functional variants, and explain greater heritability of phenotypes from their relevant tissues. Moreover, gkmQC can optimize the peak-calling threshold to identify additional peaks, especially for rare cell types in single-cell chromatin accessibility data.
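The core idea in the abstract, scoring a sample by how well a sequence model trained on its peaks predicts held-out peaks, can be illustrated with a minimal sketch. This is not the gkmQC implementation: a simple centroid-difference classifier stands in for the gapped k-mer SVM, the data are simulated, and every name and parameter below (`quality_score`, `simulate_peaks`, the planted motif `TGACTCA`, the 6-mer/4-position feature size, the sample sizes) is an assumption made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=6, k=4):
    """Count gapped k-mers: each l-mer with all but k positions masked as 'N'."""
    counts = Counter()
    masks = list(combinations(range(l), k))
    for i in range(len(seq) - l + 1):
        word = seq[i:i + l]
        for keep in masks:
            counts["".join(word[j] if j in keep else "N" for j in range(l))] += 1
    return counts


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def quality_score(peaks, background, l=6, k=4):
    """Held-out AUC of a centroid-difference classifier (toy stand-in for an SVM)."""
    half_p, half_n = len(peaks) // 2, len(background) // 2
    weights = Counter()
    # Train on the first half: mean feature count in peaks minus mean in background.
    for s in peaks[:half_p]:
        for g, c in gapped_kmer_counts(s, l, k).items():
            weights[g] += c / half_p
    for s in background[:half_n]:
        for g, c in gapped_kmer_counts(s, l, k).items():
            weights[g] -= c / half_n

    def score(s):
        return sum(weights[g] * c for g, c in gapped_kmer_counts(s, l, k).items())

    # Evaluate on the second half, which the weights have not seen.
    return auc([score(s) for s in peaks[half_p:]],
               [score(s) for s in background[half_n:]])


def random_seq(rng, n=60):
    return "".join(rng.choice("ACGT") for _ in range(n))


def plant(rng, seq, motif="TGACTCA"):
    """Overwrite a random position with the (hypothetical) regulatory motif."""
    i = rng.randrange(len(seq) - len(motif) + 1)
    return seq[:i] + motif + seq[i + len(motif):]


def simulate_peaks(rng, n, signal_fraction):
    """Simulated peak set in which only a fraction of peaks carry real sequence signal."""
    return [plant(rng, random_seq(rng)) if rng.random() < signal_fraction
            else random_seq(rng) for _ in range(n)]


rng = random.Random(0)
background = [random_seq(rng) for _ in range(200)]
clean = quality_score(simulate_peaks(rng, 200, 1.0), background)
noisy = quality_score(simulate_peaks(rng, 200, 0.3), background)
print(f"clean sample AUC: {clean:.2f}, noisy sample AUC: {noisy:.2f}")
```

In this toy setting, the sample whose peaks mostly lack the motif yields a lower held-out AUC than the sample whose peaks all carry it. This mirrors the intuition that prediction accuracy can act as a quality metric that does not depend directly on read depth.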

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b5/9907136/fc53462c4cbf/pnas.2212810119fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b5/9907136/fd9274bf09e4/pnas.2212810119fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b5/9907136/2b6de72b604c/pnas.2212810119fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b5/9907136/97392371462b/pnas.2212810119fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b5/9907136/400b2861039f/pnas.2212810119fig05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b5/9907136/5adff843c658/pnas.2212810119fig06.jpg
