Prediction of protein-ligand binding affinity from sequencing data with interpretable machine learning.

Affiliations

Department of Bioengineering, University of California, Merced, Merced, CA, USA.

Department of Biological Sciences, Columbia University, New York, NY, USA.

Publication information

Nat Biotechnol. 2022 Oct;40(10):1520-1527. doi: 10.1038/s41587-022-01307-0. Epub 2022 May 23.

DOI:10.1038/s41587-022-01307-0

PMID:35606422

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9546773/

Abstract

Protein-ligand interactions are increasingly profiled at high throughput using affinity selection and massively parallel sequencing. However, these assays do not provide the biophysical parameters that most rigorously quantify molecular interactions. Here we describe a flexible machine learning method, called ProBound, that accurately defines sequence recognition in terms of equilibrium binding constants or kinetic rates. This is achieved using a multi-layered maximum-likelihood framework that models both the molecular interactions and the data generation process. We show that ProBound quantifies transcription factor (TF) behavior with models that predict binding affinity over a range exceeding that of previous resources; captures the impact of DNA modifications and conformational flexibility of multi-TF complexes; and infers specificity directly from in vivo data such as ChIP-seq without peak calling. When coupled with an assay called K-seq, it determines the absolute affinity of protein-ligand interactions. We also apply ProBound to profile the kinetics of kinase-substrate interactions. ProBound opens new avenues for decoding biological networks and rationally engineering protein-ligand interactions.
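The abstract describes a maximum-likelihood framework with one layer for the molecular interaction (equilibrium binding constants) and another for the data generation process (affinity selection followed by sequencing). The sketch below illustrates that two-layer idea under simplifying assumptions of our own and is not ProBound's implementation or API: an additive per-position free-energy model scanned over the forward strand only, single-site equilibrium occupancy, and one round of selection scored with a multinomial likelihood. The function names (`relative_affinity`, `occupancy`, `selection_log_likelihood`) and the `ddg` matrix layout are hypothetical.

```python
import math

BASES = "ACGT"


def relative_affinity(seq, ddg):
    """Sum of relative association constants over all binding offsets.

    Hypothetical model: ddg[j][b] is a free-energy penalty (in kT units) for
    base b at motif position j, so the optimal site has K_rel = exp(0) = 1.
    Only the forward strand is scanned in this sketch.
    """
    width = len(ddg)
    total = 0.0
    for start in range(len(seq) - width + 1):
        energy = sum(ddg[j][BASES.index(seq[start + j])] for j in range(width))
        total += math.exp(-energy)  # K_rel = exp(-ddG / kT)
    return total


def occupancy(k_rel, protein_conc):
    """Equilibrium fraction bound for one effective site: [P]K / (1 + [P]K)."""
    x = protein_conc * k_rel
    return x / (1.0 + x)


def selection_log_likelihood(input_counts, bound_counts, ddg, protein_conc):
    """Multinomial log-likelihood (up to a constant) of the bound-library counts.

    Assumption: each probe's expected share of the bound library is
    proportional to its input abundance times its equilibrium occupancy.
    """
    weights = {
        s: n * occupancy(relative_affinity(s, ddg), protein_conc)
        for s, n in input_counts.items()
    }
    norm = sum(weights.values())
    return sum(c * math.log(weights[s] / norm) for s, c in bound_counts.items())
```

Fitting would then mean maximizing `selection_log_likelihood` over the entries of `ddg` (and any nuisance parameters) with a numerical optimizer. For example, with input counts `{"AC": 100, "GG": 100}` and bound counts `{"AC": 90, "GG": 10}`, a matrix that favors `AC` scores a higher likelihood than a flat all-zero matrix, which is how the data select the specificity model.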

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3181/9546773/710b25bb5f04/41587_2022_1307_Fig1_HTML.jpg

Similar articles

Prediction of protein-ligand binding affinity from sequencing data with interpretable machine learning.

Nat Biotechnol. 2022 Oct;40(10):1520-1527. doi: 10.1038/s41587-022-01307-0. Epub 2022 May 23.

Imputation for transcription factor binding predictions based on deep learning.

PLoS Comput Biol. 2017 Feb 24;13(2):e1005403. doi: 10.1371/journal.pcbi.1005403. eCollection 2017 Feb.

Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features.

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):4. doi: 10.1186/s12859-015-0846-z.

Benchmarking DNA binding affinity models using allele-specific transcription factor binding data.

bioRxiv. 2023 Dec 15:2023.12.15.571887. doi: 10.1101/2023.12.15.571887.

Improving ChIP-seq peak-calling for functional co-regulator binding by integrating multiple sources of biological information.

BMC Genomics. 2012;13 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2164-13-S1-S1. Epub 2012 Jan 17.

Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

Nucleic Acids Res. 2017 Jan 9;45(1):54-66. doi: 10.1093/nar/gkw1061. Epub 2016 Nov 29.

A map of direct TF-DNA interactions in the human genome.

Nucleic Acids Res. 2019 Feb 28;47(4):e21. doi: 10.1093/nar/gky1210.

Chromatin immunoprecipitation and multiplex sequencing (ChIP-Seq) to identify global transcription factor binding sites in the nematode Caenorhabditis elegans.

Methods Enzymol. 2014;539:89-111. doi: 10.1016/B978-0-12-420120-0.00007-4.

A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data.

PLoS One. 2009 Dec 1;4(12):e8155. doi: 10.1371/journal.pone.0008155.

Cell-type and transcription factor specific enrichment of transcriptional cofactor motifs in ENCODE ChIP-seq data.

BMC Genomics. 2013;14 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2164-14-S5-S2. Epub 2013 Oct 16.

Cited by

Multiple overlapping binding sites determine transcription factor occupancy.

Nature. 2025 Sep 3. doi: 10.1038/s41586-025-09472-3.

Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning.

Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf831.

Symmetry, gauge freedoms, and the interpretability of sequence-function relationships.

Phys Rev Res. 2025 Apr-Jun;7(2). doi: 10.1103/physrevresearch.7.023005. Epub 2025 Apr 2.

Interpretable protein-DNA interactions captured by structure-sequence optimization.

Elife. 2025 Jul 17;14:RP105565. doi: 10.7554/eLife.105565.

Bio-Inspired Mamba for Antibody-Antigen Interaction Prediction.

Biomolecules. 2025 May 26;15(6):764. doi: 10.3390/biom15060764.

A self-conformation-aware pre-training framework for molecular property prediction with substructure interpretability.

Nat Commun. 2025 May 12;16(1):4382. doi: 10.1038/s41467-025-59634-0.

Predictive biophysical neural network modeling of a compendium of in vivo transcription factor DNA binding profiles for Escherichia coli.

Nat Commun. 2025 May 7;16(1):4255. doi: 10.1038/s41467-025-58862-8.

A new paradigm for the regulation of A40926B0 biosynthesis.

Synth Syst Biotechnol. 2025 Apr 7;10(3):794-806. doi: 10.1016/j.synbio.2025.03.012. eCollection 2025 Sep.

Learning Universal Representations of Intermolecular Interactions with ATOMICA.

bioRxiv. 2025 Jul 15:2025.04.02.646906. doi: 10.1101/2025.04.02.646906.

Integrating genetic variation with deep learning provides context for variants impacting transcription factor binding during embryogenesis.

Genome Res. 2025 May 2;35(5):1138-1153. doi: 10.1101/gr.279652.124.
