


Recognition models to predict DNA-binding specificities of homeodomain proteins.

Affiliation

Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA.

Publication details

Bioinformatics. 2012 Jun 15;28(12):i84-9. doi: 10.1093/bioinformatics/bts202.

DOI:10.1093/bioinformatics/bts202
PMID:22689783
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3371834/
Abstract

MOTIVATION

Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C(2)H(2) zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes.
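The KNN baseline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by the authors or by any published tool: each protein is an already-aligned homeodomain sequence of fixed length, similarity is plain percent identity over that alignment, and each known specificity is a position frequency matrix (PFM) stored as a list of columns of A/C/G/T probabilities, all of the same width. The function names `percent_identity` and `knn_predict_pfm` and the toy data are hypothetical.

```python
from typing import Dict, List

# Sketch only (hypothetical names): a PFM is a list of columns, each holding
# probabilities for A, C, G, T in that order.
PFM = List[List[float]]


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict_pfm(query: str, training: Dict[str, PFM], k: int = 3) -> PFM:
    """Predict a PFM for `query` as the column-wise mean of the PFMs of its
    k most similar training proteins (assumes every PFM has the same width)."""
    if not training:
        raise ValueError("training set is empty")
    # Rank training sequences by identity to the query, most similar first;
    # ties keep their insertion order because Python's sort is stable.
    ranked = sorted(training, key=lambda seq: percent_identity(query, seq), reverse=True)
    neighbors = ranked[:k]
    width = len(training[neighbors[0]])
    prediction: PFM = []
    for pos in range(width):
        column = [0.0, 0.0, 0.0, 0.0]
        for seq in neighbors:
            for base in range(4):
                column[base] += training[seq][pos][base]
        # Averaging probability columns keeps each column a distribution (up to rounding).
        prediction.append([value / len(neighbors) for value in column])
    return prediction


if __name__ == "__main__":
    # Toy, made-up data: 4-residue "homeodomains" with 2-column PFMs.
    train = {
        "AAAA": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
        "AAAT": [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
        "TTTT": [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]],
    }
    print(knn_predict_pfm("AAAC", train, k=2))
    # → [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.0]]
```

A variant could weight neighbors by similarity or compare only the residues thought to contact DNA; those refinements are left out to keep the averaging step visible.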

RESULTS

Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using a RF-based model.
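The abstract does not spell out how cross-validated predictions were scored, so the sketch below only illustrates the general shape of a leave-one-out evaluation for PFM prediction. It assumes a hypothetical distance (mean Euclidean distance between matching A/C/G/T columns, 0 for identical PFMs) and plugs in a toy 1-nearest-neighbor predictor; it does not reproduce the random-forest model behind PreMoTF. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List

# Sketch only (hypothetical names): PFM columns hold A, C, G, T probabilities.
PFM = List[List[float]]
Predictor = Callable[[str, Dict[str, PFM]], PFM]


def column_distance(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two A/C/G/T probability columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def pfm_distance(predicted: PFM, observed: PFM) -> float:
    """Mean column distance between two equal-width PFMs (0 means identical)."""
    if len(predicted) != len(observed):
        raise ValueError("PFMs must have the same width")
    return sum(column_distance(p, q) for p, q in zip(predicted, observed)) / len(observed)


def leave_one_out(data: Dict[str, PFM], predict: Predictor) -> Dict[str, float]:
    """Hold out each protein in turn, predict its PFM from the rest, and score it."""
    scores = {}
    for held_out, observed in data.items():
        training = {seq: pfm for seq, pfm in data.items() if seq != held_out}
        scores[held_out] = pfm_distance(predict(held_out, training), observed)
    return scores


def nearest_neighbor_predict(query: str, training: Dict[str, PFM]) -> PFM:
    """Toy 1-NN predictor: copy the PFM of the most identical aligned sequence."""
    def identity(seq: str) -> int:
        return sum(a == b for a, b in zip(query, seq))
    return training[max(training, key=identity)]


if __name__ == "__main__":
    # Toy, made-up data with single-column PFMs.
    data = {
        "AAAA": [[1.0, 0.0, 0.0, 0.0]],
        "AAAT": [[1.0, 0.0, 0.0, 0.0]],
        "TTTT": [[0.0, 0.0, 0.0, 1.0]],
    }
    print(leave_one_out(data, nearest_neighbor_predict))
    # → {'AAAA': 0.0, 'AAAT': 0.0, 'TTTT': 1.4142135623730951}
```

Any predictor with the same `(query, training) -> PFM` signature, such as a KNN averager or a wrapper around a trained model, can be passed to `leave_one_out`, so different approaches are compared on exactly the same held-out proteins.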


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f954/3371834/f1fdb894a008/bts202f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f954/3371834/211f509abbf8/bts202f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f954/3371834/f577f96136b4/bts202f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f954/3371834/c163fe3a8fb1/bts202f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f954/3371834/f999d5d1bc5f/bts202f5.jpg

Similar articles

1. Recognition models to predict DNA-binding specificities of homeodomain proteins.
   Bioinformatics. 2012 Jun 15;28(12):i84-9. doi: 10.1093/bioinformatics/bts202.
2. Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors.
   Bioinformatics. 2008 Sep 1;24(17):1850-7. doi: 10.1093/bioinformatics/btn331. Epub 2008 Jun 27.
3. Global analysis of Drosophila Cys₂-His₂ zinc finger proteins reveals a multitude of novel recognition motifs and binding determinants.
   Genome Res. 2013 Jun;23(6):928-40. doi: 10.1101/gr.151472.112. Epub 2013 Mar 7.
4. An improved predictive recognition model for Cys(2)-His(2) zinc finger proteins.
   Nucleic Acids Res. 2014 Apr;42(8):4800-12. doi: 10.1093/nar/gku132. Epub 2014 Feb 12.
5. ZiF-Predict: a web tool for predicting DNA-binding specificity in C2H2 zinc finger proteins.
   Genomics Proteomics Bioinformatics. 2010 Jun;8(2):122-6. doi: 10.1016/S1672-0229(10)60013-7.
6. Prediction of DNA-binding residues from protein sequence information using random forests.
   BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2164-10-S1-S1.
7. Combination of a zinc finger and homeodomain required for protein-interaction.
   Mol Biol Rep. 2003 Dec;30(4):199-206. doi: 10.1023/a:1026330907065.
8. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins.
   Nucleic Acids Res. 2014 Jan;42(1):97-108. doi: 10.1093/nar/gkt890. Epub 2013 Oct 3.
9. Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry.
   Nucleic Acids Res. 2007;35(4):1085-97. doi: 10.1093/nar/gkl1155. Epub 2007 Jan 30.
10. An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA.
    BMC Genomics. 2016 Dec 22;17(Suppl 13):1033. doi: 10.1186/s12864-016-3323-9.

Articles citing this paper

1. Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning.
   Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf831.
2. Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences.
   Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf016.
3. Current and future directions in network biology.
