Pisanti Nadia, Soldano Henry, Carpentier Mathilde, Pothier Joel
Dipartimento di Informatica, Largo B. Pontecorvo, Università di Pisa, Pisa, Italy.
J Comput Biol. 2009 Dec;16(12):1635-60. doi: 10.1089/cmb.2008.0019.
The geometrical configurations of atoms in protein structures can be viewed as approximate relations among them. Finding similar common substructures within a set of protein structures therefore belongs to a new class of problems that generalizes the problem of finding repeated motifs. The novelty lies in the addition of constraints on the motifs, expressed as relations that must hold between pairs of positions of the motifs. We hence call them relational motifs. For this class of problems, we present an algorithm that is a suitable extension of the KMR paradigm and, in particular, of KMRC, as it uses a degenerate alphabet. Our algorithm contains several improvements that become especially useful when, as is required for relational motifs, the inference is made by partially overlapping shorter motifs rather than concatenating them. The efficiency, correctness, and completeness of the algorithm are ensured by several non-trivial properties that are proven in this paper. The algorithm has been applied in the important field of protein common 3D substructure searching. Additionally, the implemented methods have been tested on several examples of protein families, such as serine proteases, globins, and cytochromes P450. The detected motifs have been compared to those found by multiple structural alignment methods.
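As a rough illustration of the KMR paradigm that the abstract builds on, the sketch below labels every length-k substring of a plain string by combining the labels of shorter substrings. It covers only the classic exact-match setting on strings. It is not the relational algorithm of the paper, nor KMRC with a degenerate alphabet. The function names `kmr_labels` and `repeated_motifs` are hypothetical, chosen here for illustration. When the target length is not reached by pure doubling, the last step combines two shorter substrings that partially overlap, which is the kind of step the abstract contrasts with concatenation.

```python
from collections import defaultdict


def kmr_labels(text, k):
    """Return a list whose entry i is a class label for text[i:i+k].

    Two positions get the same label exactly when their length-k
    substrings are equal (classic exact-match KMR, assumed setting).
    """
    if not 1 <= k <= len(text):
        raise ValueError("k must satisfy 1 <= k <= len(text)")

    # Length 1: label each position by the rank of its symbol.
    rank = {c: r for r, c in enumerate(sorted(set(text)))}
    labels = [rank[c] for c in text]
    length = 1

    while length < k:
        # Shift between the two shorter substrings being combined.
        # step == length is plain concatenation (doubling);
        # step < length means the two substrings partially overlap.
        step = min(length, k - length)
        new_length = length + step
        pairs = [
            (labels[i], labels[i + step])
            for i in range(len(text) - new_length + 1)
        ]
        # Re-rank the pairs so labels stay small integers.
        rank = {p: r for r, p in enumerate(sorted(set(pairs)))}
        labels = [rank[p] for p in pairs]
        length = new_length

    return labels


def repeated_motifs(text, k):
    """Map each length-k substring occurring at least twice to its positions."""
    groups = defaultdict(list)
    for pos, label in enumerate(kmr_labels(text, k)):
        groups[label].append(pos)
    return {
        text[ps[0]:ps[0] + k]: ps
        for ps in groups.values()
        if len(ps) > 1
    }
```

For example, `repeated_motifs("banana", 3)` returns `{"ana": [1, 3]}`: the labels for length 2 are built by concatenating length-1 labels, and the labels for length 3 are then built from two length-2 substrings that overlap by one symbol.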