Staden R
Medical Research Council Laboratory of Molecular Biology, Cambridge, UK.
Comput Appl Biosci. 1988 Mar;4(1):53-60. doi: 10.1093/bioinformatics/4.1.53.
A method to define and search for complex patterns of motifs in nucleic acid and protein sequences is described. With this method, nucleic acid motifs can be defined in eight different ways and protein motifs in six. A pattern is defined by a list of motifs. The motifs in the list are combined using the logical operators AND, OR and NOT, and the list also defines the allowed ranges of separation between the motifs in the pattern. Programs to search for patterns in individual sequences and in libraries of sequences are described. Patterns are defined by users and stored as annotated disk files. Hence users themselves can do the programming needed to define and locate new structures, and fewer specific novel algorithms should be required. Examples are given of searches for transcription initiation regions, nematode mitochondrial tRNA genes and members of the globin sequence family.
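The idea of a pattern as a list of motifs, combined with AND, OR and NOT and constrained by allowed separations, can be illustrated with a small sketch. This is not Staden's program or file format. It assumes a single motif type (an IUPAC consensus string with a maximum number of mismatches, which may or may not correspond to one of the paper's eight nucleic acid definitions) and assumes that a separation is the number of bases between the end of the previous positive match and the start of the next motif. The names `Motif`, `Element` and `search` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# IUPAC nucleotide ambiguity codes -> the bases each one allows.
# Sequences and consensus strings are assumed to be uppercase.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


@dataclass
class Motif:
    """Hypothetical motif type: an IUPAC consensus allowing some mismatches."""
    consensus: str
    max_mismatches: int = 0

    def hits(self, seq: str) -> List[int]:
        """Return every 0-based start position where the motif matches seq."""
        k = len(self.consensus)
        found = []
        for i in range(len(seq) - k + 1):
            mismatches = sum(
                1 for base, code in zip(seq[i:i + k], self.consensus)
                if base not in IUPAC[code]
            )
            if mismatches <= self.max_mismatches:
                found.append(i)
        return found


@dataclass
class Element:
    """One slot in a pattern list (hypothetical structure).

    alternatives are ORed; negate=True turns the slot into a NOT test.
    min_gap/max_gap bound the bases between the end of the previous
    positive match and the start of this one (ignored for element 0,
    which is assumed to be positive).
    """
    alternatives: List[Motif]
    min_gap: int = 0
    max_gap: int = 0
    negate: bool = False


def search(seq: str, pattern: List[Element]) -> List[List[Tuple[int, int]]]:
    """Return every match as a list of (start, end) spans of positive elements.

    Consecutive elements are ANDed: each must be satisfied in turn.
    """
    # Precompute the hit spans of each element (union over its alternatives).
    hits = [
        sorted({(s, s + len(m.consensus)) for m in el.alternatives for s in m.hits(seq)})
        for el in pattern
    ]
    results: List[List[Tuple[int, int]]] = []

    def extend(idx: int, prev_end: int, spans: List[Tuple[int, int]]) -> None:
        if idx == len(pattern):
            results.append(list(spans))
            return
        el = pattern[idx]
        if idx == 0:
            lo, hi = 0, len(seq)
        else:
            lo, hi = prev_end + el.min_gap, prev_end + el.max_gap
        in_window = [span for span in hits[idx] if lo <= span[0] <= hi]
        if el.negate:
            # NOT: succeed only if no alternative starts inside the window.
            # A NOT slot consumes no sequence, so prev_end is unchanged.
            if not in_window:
                extend(idx + 1, prev_end, spans)
            return
        for span in in_window:
            spans.append(span)
            extend(idx + 1, span[1], spans)
            spans.pop()

    extend(0, 0, [])
    return results
```

For example, on the toy sequence `"GGTATAAAGCCGCCATGGCC"`, the pattern `[Element([Motif("TATAWA")]), Element([Motif("CCC")], 0, 6, negate=True), Element([Motif("ATG"), Motif("GTG")], 3, 10)]` matches once, with spans `(2, 8)` and `(14, 17)`. Replacing the NOT motif with `GCC`, which does occur in that window, makes the search return no matches. The backtracking search enumerates every combination of hits, which keeps the logic simple but may be slow for long sequences or loose separation ranges.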