Jonassen I, Collins J F, Higgins D G
Department of Informatics, University of Bergen, HIB, Norway.
Protein Sci. 1995 Aug;4(8):1587-95. doi: 10.1002/pro.5560040817.
We present a new method for identifying conserved patterns in a set of unaligned, related protein sequences. It can discover patterns of a quite general form, allowing both ambiguous positions and variable-length wildcard regions. The user defines a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class that score highest according to a defined significance measure. Identified patterns may be refined using one of two new algorithms. We also present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.
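To make the pattern class concrete, the following minimal sketch shows how a pattern with one ambiguous position and one variable-length wildcard region can be matched against unaligned sequences. It is not the discovery or refinement algorithm described in the abstract, and it does not compute the paper's significance measure. It only translates a PROSITE-style pattern string into a regular expression and counts how many sequences contain a match. The function names `prosite_to_regex` and `support`, the pattern `C-x(2,4)-[DE]-H`, and the four sequences are all hypothetical and chosen purely for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported elements (a subset assumed for this sketch):
      A        a single fixed residue
      [DE]     an ambiguous position (any one of the listed residues)
      x        a single wildcard position
      x(3)     a wildcard region of fixed length
      x(2,4)   a wildcard region of variable length
    """
    parts = []
    for elem in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", elem)
        if wildcard:
            low, high = wildcard.group(1), wildcard.group(2)
            if low is None:
                parts.append(".")
            elif high is None:
                parts.append(".{%s}" % low)
            else:
                parts.append(".{%s,%s}" % (low, high))
        elif re.fullmatch(r"[A-Z]|\[[A-Z]+\]", elem):
            # Fixed residues and bracketed residue sets are already valid regex.
            parts.append(elem)
        else:
            raise ValueError("unsupported pattern element: %r" % elem)
    return "".join(parts)


def support(pattern: str, sequences: list[str]) -> int:
    """Count the sequences that contain at least one match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real protein family.
    toy_sequences = ["MKCAACDHLL", "GGCWYTREHP", "CAEHKKK", "TTCQQDHA"]
    toy_pattern = "C-x(2,4)-[DE]-H"
    print(prosite_to_regex(toy_pattern))
    print(support(toy_pattern, toy_sequences), "of", len(toy_sequences))
```

In this toy set the pattern matches three of the four sequences. A discovery method of the kind the abstract describes would search over many candidate patterns in a user-defined class and rank them, rather than test a single pattern supplied by hand.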