Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data.

Affiliations

Department of Biological Sciences, Columbia University, New York, NY, 10027, USA.

Department of Systems Biology, Columbia University, New York, NY, 10032, USA.

Publication information

Genome Biol. 2024 Oct 31;25(1):284. doi: 10.1186/s13059-024-03424-2.

DOI: 10.1186/s13059-024-03424-2
PMID: 39482734
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11529166/

Abstract

BACKGROUND

Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed.

RESULTS

We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts.

CONCLUSION

Our work provides new strategies for predicting the functional impact of non-coding variants.
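
The threshold-free likelihood described under RESULTS can be illustrated with a minimal sketch. The abstract does not give the exact parametrization, so everything below is an assumption made for illustration: the over-dispersed binomial is taken to be a beta-binomial with a mean and a concentration parameter, the expected reference-allele read fraction is taken to be a logistic function of the model-predicted log affinity ratio (which corresponds to occupancy proportional to affinity when `scale` is 1), and the function names, the `scale` and `concentration` parameters, and the toy counts are all hypothetical. This is not the PyProBound API.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, p, concentration):
    """Log-probability of k reference reads out of n total reads under a
    beta-binomial with mean p and concentration alpha + beta (assumed form)."""
    alpha = p * concentration
    beta = (1.0 - p) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def predicted_ref_fraction(delta_log_affinity, scale=1.0):
    """Map a predicted log(K_ref / K_alt) to an expected reference read
    fraction. scale = 0 gives the null model with no allelic preference."""
    return 1.0 / (1.0 + math.exp(-scale * delta_log_affinity))


def asb_log_likelihood(variants, scale=1.0, concentration=20.0):
    """Sum the log-likelihood over all heterozygous variants.

    Each variant is (ref_count, alt_count, delta_log_affinity). No variant
    has to be individually significant; every site contributes evidence."""
    total = 0.0
    for ref_count, alt_count, delta in variants:
        p = predicted_ref_fraction(delta, scale)
        total += beta_binomial_logpmf(ref_count, ref_count + alt_count, p, concentration)
    return total


# Toy allele-specific read counts with hypothetical model predictions.
variants = [(30, 10, 1.0), (8, 25, -1.2), (15, 14, 0.0)]
model_ll = asb_log_likelihood(variants, scale=1.0)
null_ll = asb_log_likelihood(variants, scale=0.0)
print(f"log-likelihood gain over null: {model_ll - null_ll:.3f}")
```

Under these assumptions, two binding models for the same TF could be compared by computing `asb_log_likelihood` on the same set of variants with each model's predicted log affinity ratios; the model with the higher total explains the observed allelic imbalance better.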

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a74a/11529166/b92f11baf241/13059_2024_3424_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a74a/11529166/063fc92d9d90/13059_2024_3424_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a74a/11529166/d85c63408906/13059_2024_3424_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a74a/11529166/21081669181a/13059_2024_3424_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a74a/11529166/8899fd4bfd9a/13059_2024_3424_Fig5_HTML.jpg
