Optimization of minimum set of protein-DNA interactions: a quasi exact solution with minimum over-fitting.

Affiliation

Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.

Publication information

Bioinformatics. 2010 Feb 1;26(3):319-25. doi: 10.1093/bioinformatics/btp664. Epub 2009 Dec 4.

DOI:10.1093/bioinformatics/btp664
PMID:19965883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2815656/
Abstract

MOTIVATION

A major limitation in modeling protein interactions is the difficulty of assessing the over-fitting of the training set. Recently, an experimentally based approach that integrates crystallographic information of C2H2 zinc finger-DNA complexes with binding data from 11 mutants, 7 from EGR finger I, was used to define an improved interaction code (no optimization). Here, we present a novel mixed integer programming (MIP)-based method that transforms this type of data into an optimized code, demonstrating both the advantages of the mathematical formulation to minimize over- and under-fitting and the robustness of the underlying physical parameters mapped by the code.

RESULTS

Based on the structural models of feasible interaction networks for 35 mutants of EGR-DNA complexes, the MIP method minimizes the cumulative binding energy over all complexes for a general set of fundamental protein-DNA interactions. To guard against over-fitting, we use the scalability of the method to probe against the elimination of related interactions. From an initial set of 12 parameters (six hydrogen bonds, five desolvation penalties and a water factor), we proceed to eliminate five of them with only a marginal reduction of the correlation coefficient to 0.9983. Further reduction of parameters negatively impacts the performance of the code (under-fitting). Besides accurately predicting the change in binding affinity of validation sets, the code identifies possible context-dependent effects in the definition of the interaction networks. Yet, the approach of constraining predictions to within a pre-selected set of interactions limits the impact of these potential errors to related low-affinity complexes.
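The two ideas in this paragraph, taking the lowest-energy network among the feasible ones for each complex and checking how the correlation responds when a parameter is eliminated, can be illustrated with a minimal sketch. It is not the authors' MIP formulation: the parameters are fixed rather than optimized, and every name and number below (`PARAMS`, `COMPLEXES`, `MEASURED`, the interaction labels and energies) is a hypothetical placeholder, not data from the paper.

```python
from math import sqrt

# Hypothetical interaction parameters (arbitrary energy units); illustrative only.
PARAMS = {"hbond_a": -1.5, "hbond_b": -1.0, "desolv": 0.8, "water": -0.3}

# Hypothetical complexes: each has one or more feasible interaction networks,
# and each network counts how often every interaction type occurs.
COMPLEXES = {
    "wt":   [{"hbond_a": 2, "hbond_b": 1, "water": 1}, {"hbond_a": 1, "hbond_b": 2}],
    "mut1": [{"hbond_a": 1, "desolv": 1}, {"hbond_b": 1, "water": 1}],
    "mut2": [{"hbond_b": 2, "desolv": 1}],
    "mut3": [{"hbond_a": 1, "water": 2}, {"desolv": 2}],
}

# Made-up "experimental" binding energies used only to compute a correlation.
MEASURED = {"wt": -4.2, "mut1": -1.4, "mut2": -1.1, "mut3": -2.2}


def network_energy(network, params):
    """Additive energy of one network; eliminated parameters contribute zero."""
    return sum(count * params.get(name, 0.0) for name, count in network.items())


def predict(complexes, params):
    """For each complex, keep the lowest-energy network among the feasible ones."""
    return {
        name: min(network_energy(net, params) for net in nets)
        for name, nets in complexes.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_parameter(params, name):
    """Return a copy of the parameter set with one interaction eliminated."""
    return {k: v for k, v in params.items() if k != name}


names = sorted(COMPLEXES)
full = predict(COMPLEXES, PARAMS)
reduced = predict(COMPLEXES, drop_parameter(PARAMS, "water"))
r_full = pearson([full[n] for n in names], [MEASURED[n] for n in names])
r_reduced = pearson([reduced[n] for n in names], [MEASURED[n] for n in names])
print(f"r with all parameters: {r_full:.4f}")
print(f"r without 'water':     {r_reduced:.4f}")
```

In the paper the parameter values themselves are obtained by the MIP optimization over all complexes; this sketch only shows the evaluation step that such a procedure would repeat after each candidate elimination.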

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c45/2815656/3028c23601c8/btp664f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c45/2815656/ca0cff2de3b7/btp664f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c45/2815656/6d3abb8713c3/btp664f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c45/2815656/8871fa954f83/btp664f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c45/2815656/9e06b14ceec6/btp664f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c45/2815656/7f72ddc60772/btp664f6.jpg
