WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences.

Authors

Pesole G, Prunella N, Liuni S, Attimonelli M, Saccone C

Affiliation

Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy.

Publication

Nucleic Acids Res. 1992 Jun 11;20(11):2871-5. doi: 10.1093/nar/20.11.2871.

DOI:10.1093/nar/20.11.2871

PMID:1614873

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC336935/

Abstract

We present here a fast and sensitive method designed to isolate short nucleotide sequences which have non-random statistical properties and may thus be biologically active. It is based on a first-order Markov analysis and allows us to detect statistically significant sequence motifs, from six to ten nucleotides long, which are significantly shared (or avoided) in the sequences under investigation. This method has been tested on a set of 521 sequences extracted from the Eukaryotic Promoter Database (2). Our results demonstrate the accuracy and the efficiency of the method in that the sequence motifs which are known to act as eukaryotic promoter elements, such as the TATA-box and the CAAT-box, were clearly identified. In addition, we have found other statistically significant motifs, the biological roles of which are yet to be clarified.
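The idea of scoring words against a first-order Markov background can be illustrated with a minimal sketch. This is not the WORDUP implementation: the abstract does not give the paper's statistic, so the function name `score_words`, the z-score-like measure, and the choice to count every occurrence of a word are assumptions made for illustration. The background model is estimated from the nucleotide and dinucleotide counts of the input sequences themselves.

```python
from collections import Counter
import math


def score_words(seqs, k=6):
    """Compare observed k-mer counts with first-order Markov expectations.

    Hypothetical helper, not the published WORDUP procedure.
    Returns {word: (observed, expected, z_like_score)}.
    """
    mono = Counter()   # single-nucleotide counts
    first = Counter()  # counts of a nucleotide as the first base of a dinucleotide
    di = Counter()     # dinucleotide counts
    obs = Counter()    # observed k-mer counts (all occurrences)
    positions = 0      # number of k-mer start positions across all sequences

    for s in seqs:
        s = s.upper()
        mono.update(s)
        for a, b in zip(s, s[1:]):
            di[a + b] += 1
            first[a] += 1
        for i in range(len(s) - k + 1):
            obs[s[i:i + k]] += 1
        positions += max(len(s) - k + 1, 0)

    total = sum(mono.values())
    results = {}
    for word, o in obs.items():
        # P(word) = P(w1) * product of transition probabilities P(w[i+1] | w[i])
        p = mono[word[0]] / total
        for a, b in zip(word, word[1:]):
            p *= di[a + b] / first[a]
        e = p * positions
        # Assumed statistic: a simple Poisson-style z-score, not the paper's test
        results[word] = (o, e, (o - e) / math.sqrt(e))
    return results
```

Calling `score_words(promoter_seqs, k=6)` on a list of sequence strings and sorting by the third tuple element would rank words from most over-represented to most under-represented. Note that this sketch only scores words that occur at least once, so fully avoided words (observed count zero) would need to be enumerated separately.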

Similar articles

WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences.

Nucleic Acids Res. 1992 Jun 11;20(11):2871-5. doi: 10.1093/nar/20.11.2871.

Eukaryotic promoter recognition by binding sites for transcription factors.

Comput Appl Biosci. 1995 Oct;11(5):477-88. doi: 10.1093/bioinformatics/11.5.477.

Computer tool FUNSITE for analysis of eukaryotic regulatory genomic sequences.

Proc Int Conf Intell Syst Mol Biol. 1995;3:197-205.

Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences.

J Mol Biol. 1990 Apr 20;212(4):563-78. doi: 10.1016/0022-2836(90)90223-9.

Predicting Pol II promoter sequences using transcription factor binding sites.

J Mol Biol. 1995 Jun 23;249(5):923-32. doi: 10.1006/jmbi.1995.0349.

Investigating extended regulatory regions of genomic DNA sequences.

Bioinformatics. 1999 Jul-Aug;15(7-8):644-53. doi: 10.1093/bioinformatics/15.7.644.

SIMD parallelization of the WORDUP algorithm for detecting statistically significant patterns in DNA sequences.

Comput Appl Biosci. 1993 Dec;9(6):701-7. doi: 10.1093/bioinformatics/9.6.701.

MicroRNA promoter element discovery in Arabidopsis.

RNA. 2006 Sep;12(9):1612-9. doi: 10.1261/rna.130506. Epub 2006 Aug 3.

Automatic extraction of position specific cooccurrence of transcription factor bindings on promoters.

Pac Symp Biocomput. 1998:252-63.

Functional binding of the "TATA" box binding component of transcription factor TFIID to the -30 region of TATA-less promoters.

Proc Natl Acad Sci U S A. 1992 Jul 1;89(13):5814-8. doi: 10.1073/pnas.89.13.5814.

Cited by

Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.

BMC Genomics. 2017 Jan 5;18(1):27. doi: 10.1186/s12864-016-3400-0.

An algorithm for identifying novel targets of transcription factor families: application to hypoxia-inducible factor 1 targets.

Cancer Inform. 2009;7:75-89. doi: 10.4137/cin.s1054. Epub 2009 Mar 4.

The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS.

BMC Genomics. 2008 Jun 3;9:267. doi: 10.1186/1471-2164-9-267.

A survey of DNA motif finding algorithms.

BMC Bioinformatics. 2007 Nov 1;8 Suppl 7(Suppl 7):S21. doi: 10.1186/1471-2105-8-S7-S21.

Computational identification of transcriptional regulatory elements in DNA sequence.

Nucleic Acids Res. 2006 Jul 19;34(12):3585-98. doi: 10.1093/nar/gkl372. Print 2006.

Remarkable sequence signatures in archaeal genomes.

Archaea. 2003 Oct;1(3):185-90. doi: 10.1155/2003/458235.

Kangaroo--a pattern-matching program for biological sequences.

BMC Bioinformatics. 2002 Jul 31;3:20. doi: 10.1186/1471-2105-3-20.

Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers.

Genome Res. 2002 Mar;12(3):470-81. doi: 10.1101/gr.212502.

Atypical regions in large genomic DNA sequences.

Proc Natl Acad Sci U S A. 1994 Jul 19;91(15):7134-8. doi: 10.1073/pnas.91.15.7134.

Analysis of eukaryotic promoter sequences reveals a systematically occurring CT-signal.

Nucleic Acids Res. 1995 Apr 11;23(7):1223-30. doi: 10.1093/nar/23.7.1223.

References

The structure and evolution of the human beta-globin gene family.

Cell. 1980 Oct;21(3):653-68. doi: 10.1016/0092-8674(80)90429-8.

Statistical characterization of nucleic acid sequence functional domains.

Nucleic Acids Res. 1983 Apr 11;11(7):2205-20. doi: 10.1093/nar/11.7.2205.

Organization and expression of eucaryotic split genes coding for proteins.

Annu Rev Biochem. 1981;50:349-83. doi: 10.1146/annurev.bi.50.070181.002025.

On the statistical significance of nucleic acid similarities.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):215-26. doi: 10.1093/nar/12.1part1.215.

Doublet frequencies in evolutionary distinct groups.

Nucleic Acids Res. 1984 Feb 10;12(3):1749-63. doi: 10.1093/nar/12.3.1749.

A comprehensive set of sequence analysis programs for the VAX.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):387-95. doi: 10.1093/nar/12.1part1.387.

Signal search analysis: a new method to localize and characterize functionally important DNA sequences.

Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):287-305. doi: 10.1093/nar/12.1part1.287.

Pattern recognition in several sequences: consensus and alignment.

Bull Math Biol. 1984;46(4):515-27. doi: 10.1007/BF02459500.

The repeated GC-rich motifs upstream from the TATA box are important elements of the SV40 early promoter.

Nucleic Acids Res. 1983 Apr 25;11(8):2447-64. doi: 10.1093/nar/11.8.2447.

Eukaryotic dinucleotide preference rules and their implications for degenerate codon usage.

J Mol Biol. 1981 Jun 15;149(1):125-31. doi: 10.1016/0022-2836(81)90264-3.
