Recognition models to predict DNA-binding specificities of homeodomain proteins.

Affiliation

Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA.

Publication information

Bioinformatics. 2012 Jun 15;28(12):i84-9. doi: 10.1093/bioinformatics/bts202.

DOI: 10.1093/bioinformatics/bts202

PMID: 22689783

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3371834/

Abstract

MOTIVATION

Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes.
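
The KNN baseline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the paper: similarity is taken to be plain percent identity between pre-aligned homeodomain sequences, every training PFM is assumed to have the same length, and the neighbors' PFMs are averaged without weighting. The function names `identity` and `knn_predict` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
# A position frequency matrix: one {base: frequency} dict per motif position.
PFM = List[Dict[str, float]]


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences of equal length.

    Assumption: percent identity stands in for whatever similarity measure a real
    KNN recognition model would use.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict(query: str, training: Sequence[Tuple[str, PFM]], k: int) -> PFM:
    """Predict a PFM for `query` as the unweighted mean of its k nearest neighbors' PFMs."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training proteins")
    # Rank training proteins by sequence identity to the query, most similar first.
    ranked = sorted(training, key=lambda item: identity(query, item[0]), reverse=True)
    neighbors = [pfm for _, pfm in ranked[:k]]
    # Average each base frequency, position by position, across the k neighbors.
    # Assumes all neighbor PFMs have the same number of positions.
    return [
        {b: sum(pfm[i][b] for pfm in neighbors) / k for b in BASES}
        for i in range(len(neighbors[0]))
    ]
```

An unweighted mean is the simplest reading of "the average of the specificities of the k most similar proteins"; weighting each neighbor by its similarity is a common variant, but whether it helps here is not something the abstract addresses.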

RESULTS

Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using a RF-based model.
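
The Results describe learned models that map a protein sequence to a PFM and are assessed by cross-validation. This sketch shows one way to frame that evaluation, with several labeled assumptions. `FEATURE_POSITIONS` is a hypothetical set of alignment columns, not the residues used by PreMoTF. `column_distance` is an assumed comparison metric. `nearest_neighbor` is a deliberately simple stand-in learner, not a random forest or SVM, so the example stays dependency-free. Leave-one-out is also an assumption, since the abstract does not state the fold scheme.

```python
from typing import Callable, Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASES = "ACGT"
PFM = List[Dict[str, float]]

# Hypothetical: 0-based alignment columns treated as specificity-determining.
# Chosen for illustration only; not the feature set of the published model.
FEATURE_POSITIONS = (0, 1, 2)


def one_hot(seq: str, positions: Sequence[int] = FEATURE_POSITIONS) -> List[int]:
    """Concatenate a 20-dim one-hot vector for the residue at each chosen position."""
    vec: List[int] = []
    for p in positions:
        vec.extend(1 if aa == seq[p] else 0 for aa in AMINO_ACIDS)
    return vec


def column_distance(a: PFM, b: PFM) -> float:
    """Mean Euclidean distance between matching PFM columns (an assumed metric)."""
    total = 0.0
    for ca, cb in zip(a, b):
        total += sum((ca[x] - cb[x]) ** 2 for x in BASES) ** 0.5
    return total / len(a)


# A learner takes training features, training PFMs, and one query feature vector.
Predictor = Callable[[List[List[int]], List[PFM], List[int]], PFM]


def nearest_neighbor(X: List[List[int]], Y: List[PFM], x: List[int]) -> PFM:
    """Stand-in learner: return the PFM of the training vector with the fewest mismatches."""
    best = min(range(len(X)), key=lambda i: sum(a != b for a, b in zip(X[i], x)))
    return Y[best]


def leave_one_out(data: Sequence[Tuple[str, PFM]], predict: Predictor) -> float:
    """Hold out each protein in turn, predict its PFM from the rest, return mean distance."""
    X = [one_hot(seq) for seq, _ in data]
    Y = [pfm for _, pfm in data]
    errors = []
    for i in range(len(data)):
        X_train = X[:i] + X[i + 1:]
        Y_train = Y[:i] + Y[i + 1:]
        errors.append(column_distance(predict(X_train, Y_train, X[i]), Y[i]))
    return sum(errors) / len(errors)
```

Because `leave_one_out` accepts any function with the `Predictor` shape, a real learner can be dropped in without changing the evaluation loop. For instance, scikit-learn's `RandomForestRegressor` accepts a 2-D target, so the PFM cells of each protein could be fit jointly as a multi-output regression on the one-hot features.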
