Department of Bioengineering, University of California, Merced, Merced, CA, USA.
Department of Biological Sciences, Columbia University, New York, NY, USA.
Nat Biotechnol. 2022 Oct;40(10):1520-1527. doi: 10.1038/s41587-022-01307-0. Epub 2022 May 23.
Protein-ligand interactions are increasingly profiled at high throughput using affinity selection and massively parallel sequencing. However, these assays do not provide the biophysical parameters that most rigorously quantify molecular interactions. Here we describe a flexible machine learning method, called ProBound, that accurately defines sequence recognition in terms of equilibrium binding constants or kinetic rates. This is achieved using a multi-layered maximum-likelihood framework that models both the molecular interactions and the data generation process. We show that ProBound quantifies transcription factor (TF) behavior with models that predict binding affinity over a range exceeding that of previous resources; captures the impact of DNA modifications and conformational flexibility of multi-TF complexes; and infers specificity directly from in vivo data such as ChIP-seq without peak calling. When coupled with an assay called K-seq, it determines the absolute affinity of protein-ligand interactions. We also apply ProBound to profile the kinetics of kinase-substrate interactions. ProBound opens new avenues for decoding biological networks and rationally engineering protein-ligand interactions.
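The abstract describes ProBound's maximum-likelihood framework only at a high level: it models both the molecular interactions and the data-generation process. The Python sketch below illustrates that layered idea for a single affinity-selection round. It is a minimal toy under stated assumptions, not ProBound's implementation. It assumes a hypothetical mononucleotide energy matrix `ddg` (per-position ΔΔG values in units of kT). It also assumes a selection step in which retention is proportional to fractional occupancy K[P]/(1 + K[P]), with `protein_conc` in arbitrary units relative to a reference site, and multinomial sampling of sequencing reads. All function names are illustrative. An optimizer would adjust `ddg` (and the concentration) to maximize `selection_log_likelihood`; that fitting step is omitted here.

```python
import math
from typing import Dict, Sequence

# Hypothetical energy matrix: one {base: ddG in kT} dict per site position.
EnergyMatrix = Sequence[Dict[str, float]]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def site_affinity(site: str, ddg: EnergyMatrix) -> float:
    """Binding layer: relative affinity of one site, exp(-sum of ddG in kT)."""
    energy = sum(ddg[i][base] for i, base in enumerate(site))
    return math.exp(-energy)


def probe_affinity(seq: str, ddg: EnergyMatrix) -> float:
    """Binding layer: sum relative affinities over all offsets on both strands."""
    width = len(ddg)
    total = 0.0
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            total += site_affinity(strand[start:start + width], ddg)
    return total


def selection_log_likelihood(
    input_counts: Dict[str, int],
    selected_counts: Dict[str, int],
    ddg: EnergyMatrix,
    protein_conc: float,
) -> float:
    """Multinomial log-likelihood of selected-round counts, up to a constant.

    Assumes every selected sequence also appears in input_counts.
    """
    total_in = sum(input_counts.values())
    weights = {}
    for seq, n_in in input_counts.items():
        # Assay layer (assumed): retention ~ occupancy K[P] / (1 + K[P]).
        occ = probe_affinity(seq, ddg) * protein_conc
        # Input frequency stands in for the pre-selection library composition.
        weights[seq] = (n_in / total_in) * occ / (1.0 + occ)
    norm = sum(weights.values())
    # Sequencing layer: reads drawn multinomially from the selected pool.
    return sum(
        n * math.log(weights[seq] / norm) for seq, n in selected_counts.items()
    )


if __name__ == "__main__":
    # Hypothetical model preferring the 2-mer "AC"; mismatches cost 2 kT.
    informed = [
        {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
        {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    ]
    flat = [{b: 0.0 for b in "ACGT"} for _ in range(2)]
    # Toy counts, not real data: equal input, "ACAA" enriched after selection.
    input_counts = {"AAAA": 100, "ACAA": 100}
    selected_counts = {"AAAA": 10, "ACAA": 90}
    for name, model in (("informed", informed), ("flat", flat)):
        ll = selection_log_likelihood(input_counts, selected_counts, model, 1.0)
        print(name, round(ll, 2))
```

In the toy usage, probes carrying the preferred `AC` site are enriched in the selected round, so the informed matrix should score a higher log-likelihood than the flat one. Modeling occupancy rather than raw affinity lets the selection weight saturate at high protein concentration, where the strongest binders are all retained at nearly the same rate.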