Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data.

Affiliations

Department of Biological Sciences, Columbia University, New York, NY, 10027, USA.

Department of Systems Biology, Columbia University, New York, NY, 10032, USA.

Publication information

Genome Biol. 2024 Oct 31;25(1):284. doi: 10.1186/s13059-024-03424-2.

Abstract

BACKGROUND

Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. In vivo, this specificity manifests itself as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq are currently limited in their power to detect allele-specific binding (ASB), both in read coverage and in the representation of individual variants in the cell lines used. This makes it desirable to predict allelic differences in TF binding from sequence alone, provided that the reliability of such predictions can be quantitatively assessed.

RESULTS

Here we propose methods for benchmarking sequence-to-affinity models of TF binding by their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce PyProBound, an extensible reimplementation of our biophysically interpretable machine learning framework. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts.
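The abstract does not spell out the exact form of the likelihood, so the following is only a minimal sketch of the general idea, not the authors' implementation and not the PyProBound API. It assumes a beta-binomial distribution parameterized by a mean and an overdispersion parameter `rho`, and it assumes that the expected reference-allele read fraction at a heterozygous site equals the ratio of model-predicted affinities. All function names, the tuple layout, and the example numbers are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, p, rho):
    """Log PMF of a beta-binomial with mean fraction p and overdispersion rho.

    Assumed parameterization: alpha = p * s, beta = (1 - p) * s,
    with s = (1 - rho) / rho and 0 < rho < 1.
    """
    s = (1.0 - rho) / rho
    alpha, beta = p * s, (1.0 - p) * s
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def asb_log_likelihood(variants, rho=0.05):
    """Sum the log-likelihood over heterozygous variants, with no per-variant
    significance threshold.

    Each variant is (ref_count, alt_count, affinity_ref, affinity_alt).
    The expected reference fraction affinity_ref / (affinity_ref + affinity_alt)
    is an assumption made for this sketch.
    """
    total = 0.0
    for ref_count, alt_count, aff_ref, aff_alt in variants:
        p_ref = aff_ref / (aff_ref + aff_alt)
        total += betabinom_logpmf(ref_count, ref_count + alt_count, p_ref, rho)
    return total


# Hypothetical read counts with affinities predicted by some binding model.
model_variants = [
    (30, 10, 3.0, 1.0),
    (8, 24, 1.0, 3.0),
    (15, 14, 1.0, 1.0),
]
# Null comparison: same counts, but no predicted allelic preference.
null_variants = [(r, a, 1.0, 1.0) for r, a, _, _ in model_variants]

model_ll = asb_log_likelihood(model_variants)
null_ll = asb_log_likelihood(null_variants)
print(f"log-likelihood gain over null: {model_ll - null_ll:.3f}")
```

Under these assumptions, two binding models for the same TF could be ranked by their summed log-likelihood on the same set of variants, since every variant contributes evidence in proportion to its read depth.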

CONCLUSION

Our work provides new strategies for predicting the functional impact of non-coding variants.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a74a/11529166/b92f11baf241/13059_2024_3424_Fig1_HTML.jpg
