Sagot MF, Viari A, Pothier J, Soldano H
Atelier de BioInformatique, CPASO-URA CNRS 448, Section de Physique et Chimie de l'Institut Curie, Paris, France.
Comput Appl Biosci. 1995 Feb;11(1):59-70. doi: 10.1093/bioinformatics/11.1.59.
Finding certain regularities in a text is an important problem in many areas, e.g. in the analysis of biological molecules such as nucleic acids or proteins. In the latter case, the text may be a sequence of amino acids or a linear encoding of a three-dimensional structure, and the regularities then correspond to lexical or structural motifs common to two or more proteins. We first review an earlier algorithm that finds these regularities in a flexible way. We then introduce a generalized version of this algorithm designed specifically for protein three-dimensional structures, because these structures have a few peculiarities that make them computationally harder to process. Finally, we apply the new algorithm to several concrete examples.
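The abstract does not spell out the algorithm itself. To illustrate what a "regularity common to two or more proteins" means at the simplest level, here is a minimal sketch that lists every length-`k` word occurring in at least `quorum` of the input sequences. The sketch is hypothetical and uses exact matching only: the function name `common_words`, its parameters `k` and `quorum`, and the toy sequences are all assumptions. It is not the authors' flexible algorithm, and it does not handle linear encodings of three-dimensional structures.

```python
from collections import defaultdict


def common_words(sequences, k, quorum):
    """Return, sorted, the k-length words found in at least `quorum` sequences.

    Hypothetical illustration: exact matching only, unlike the flexible
    matching the abstract describes.
    """
    # Map each k-length word to the set of sequence indices that contain it.
    occurs_in = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            occurs_in[seq[start:start + k]].add(idx)
    # Keep only the words shared by enough distinct sequences.
    return sorted(word for word, ids in occurs_in.items() if len(ids) >= quorum)


if __name__ == "__main__":
    # Toy amino-acid strings (made up for illustration).
    proteins = ["MKVLAAGIV", "GKVLAAST", "AAGKVLW"]
    print(common_words(proteins, k=3, quorum=3))  # ['KVL']
    print(common_words(proteins, k=3, quorum=2))  # ['AAG', 'GKV', 'KVL', 'LAA', 'VLA']
```

A flexible variant would replace the exact-match dictionary lookup with a similarity criterion between words. Because such criteria need not be transitive, the words can then no longer be grouped by simple hashing.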