Computer-assisted prediction, classification, and delimitation of protein binding sites in nucleic acids.

Authors

Frech K, Herrmann G, Werner T

Affiliation

Institut für Säugetiergenetik, GSF-Forschungszentrum für Umwelt und Gesundheit mbH, Neuherberg, Germany.

Publication information

Nucleic Acids Res. 1993 Apr 11;21(7):1655-64. doi: 10.1093/nar/21.7.1655.

DOI:10.1093/nar/21.7.1655

PMID:8479918

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC309377/

Abstract

We present a method to determine the location and extent of protein binding regions in nucleic acids by computer-assisted analysis of sequence data. The program ConsIndex establishes a library of consensus descriptions based on sequence sets containing known regulatory elements. These defined consensus descriptions are used by the program ConsInspector to predict binding sites in new sequences. We show the programs to correctly determine the significant regions involved in transcriptional control of seven sequence elements. The internal profile of relative variability of individual nucleotide positions within these regions paralleled experimental profiles of biological significance. Consensus descriptions are determined by employing an anchored alignment scheme, the results of which are then evaluated by a novel method which is superior to cluster algorithms. The alignment procedure is able to include several closely related sequences without biasing the consensus description. Moreover, the algorithm detects additional elements on the basis of a moderate distance correlation and is capable of discriminating between real binding sites and false positive matches. The software is well suited to cope with the frequent phenomenon of optional elements present in a subset of functionally similar sequences, while taking maximal advantage of the existing sequence data base. Since it requires only a minimum of seven sequences for a single element, it is applicable to a wide range of binding sites.
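
The abstract mentions an "internal profile of relative variability of individual nucleotide positions" but does not say how it is computed. The sketch below illustrates the general idea only: it scores each column of a set of pre-aligned, equal-length sites by Shannon information content (2 bits for a fully conserved DNA position, 0 bits for a uniformly mixed one). This is a stand-in measure and not the ConsIndex scoring scheme; the function name `position_profile` and the toy sequences are assumptions made up for illustration.

```python
import math
from collections import Counter


def position_profile(aligned_sites):
    """Return one (consensus_base, information_bits) pair per alignment column.

    Hypothetical helper: assumes the sites are already aligned and of equal
    length, and uses Shannon information content as the conservation score.
    """
    if not aligned_sites:
        raise ValueError("need at least one sequence")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")

    profile = []
    for column in zip(*aligned_sites):
        counts = Counter(base.upper() for base in column)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        # Most frequent base; ties are broken alphabetically.
        consensus_base = max(sorted(counts), key=counts.get)
        # 2 bits is the maximum information for a four-letter alphabet.
        profile.append((consensus_base, 2.0 - entropy))
    return profile


if __name__ == "__main__":
    # Toy data invented for this example (seven sites, matching the minimum
    # set size named in the abstract); not taken from the paper.
    toy_sites = [
        "TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA",
        "TGACTAA", "TGAGTCA", "TGACTCA",
    ]
    for index, (base, bits) in enumerate(position_profile(toy_sites), start=1):
        print(f"position {index}: {base} ({bits:.2f} bits)")
```

Columns with low scores correspond to variable positions, and runs of high-scoring columns suggest where a conserved core might begin and end. No small-sample correction is applied here, so scores from only a handful of sequences overstate conservation.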

Similar articles

Specific modelling of regulatory units in DNA sequences.

Pac Symp Biocomput. 1997:151-62.

Computer tool FUNSITE for analysis of eukaryotic regulatory genomic sequences.

Proc Int Conf Intell Syst Mol Biol. 1995;3:197-205.

Context specific transcription factor prediction.

Ann Biomed Eng. 2007 Jun;35(6):1053-67. doi: 10.1007/s10439-007-9268-z. Epub 2007 Mar 22.

Prediction of cis-regulatory elements: from high-information content analysis to motif identification.

J Bioinform Comput Biol. 2007 Aug;5(4):817-38. doi: 10.1142/s021972000700293x.

CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting.

Genome Res. 2004 Jan;14(1):170-8. doi: 10.1101/gr.1642804. Epub 2003 Dec 12.

Simultaneous alignment and annotation of cis-regulatory regions.

Bioinformatics. 2007 Jan 15;23(2):e44-9. doi: 10.1093/bioinformatics/btl305.

Discriminative discovery of transcription factor binding sites from location data.

Proc IEEE Comput Syst Bioinform Conf. 2005:86-9. doi: 10.1109/csb.2005.30.

cWINNOWER algorithm for finding fuzzy DNA motifs.

Proc IEEE Comput Soc Bioinform Conf. 2003;2:260-5.

MatInspector and beyond: promoter analysis based on transcription factor binding sites.

Bioinformatics. 2005 Jul 1;21(13):2933-42. doi: 10.1093/bioinformatics/bti473. Epub 2005 Apr 28.

Cited by

Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

PLoS One. 2014 May 2;9(5):e96694. doi: 10.1371/journal.pone.0096694. eCollection 2014.

ALU repeats in promoters are position-dependent co-response elements (coRE) that enhance or repress transcription by dimeric and monomeric progesterone receptors.

Mol Endocrinol. 2009 Jul;23(7):989-1000. doi: 10.1210/me.2009-0048. Epub 2009 Apr 16.

Meta-analysis discovery of tissue-specific DNA sequence motifs from mammalian gene expression data.

BMC Bioinformatics. 2006 Apr 27;7:229. doi: 10.1186/1471-2105-7-229.

The design of transcription-factor binding sites is affected by combinatorial regulation.

Genome Biol. 2005;6(12):R103. doi: 10.1186/gb-2005-6-12-r103. Epub 2005 Dec 2.

Detecting DNA regulatory motifs by incorporating positional trends in information content.

Genome Biol. 2004;5(7):R50. doi: 10.1186/gb-2004-5-7-r50. Epub 2004 Jun 24.

From sequence to structure and back again: approaches for predicting protein-DNA binding.

Proteome Sci. 2004 Jun 17;2(1):3. doi: 10.1186/1477-5956-2-3.

Bioinformatic identification of novel regulatory DNA sequence motifs in Streptomyces coelicolor.

BMC Microbiol. 2004 Apr 8;4:14. doi: 10.1186/1471-2180-4-14.

AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors.

BMC Bioinformatics. 2003 Jun 23;4:25. doi: 10.1186/1471-2105-4-25.

Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements.

J Biol. 2003;2(2):11. doi: 10.1186/1475-4924-2-11. Epub 2003 Jun 6.

Integrated functional and bioinformatics approach for the identification and experimental verification of RNA signals: application to HIV-1 INS.

Nucleic Acids Res. 2003 Jun 1;31(11):2839-51. doi: 10.1093/nar/gkg390.
