Siret Research Group, Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague, Czech Republic.
Proteome Sci. 2011 Oct 14;9 Suppl 1(Suppl 1):S20. doi: 10.1186/1477-5956-9-S1-S20.
Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus has shifted from sequences to structures. Structure similarity remains a major challenge, since the field has no standard definition of optimal structure similarity.
We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity during feature extraction. Its features are based on the spherical spatial neighborhoods of amino acids, where similarity can be well defined. On top of these partial local similarities, a global measure assessing the similarity of a pair of protein structures is built. Finally, indexing is applied, making the search process an order of magnitude faster.
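The idea of a spherical spatial neighborhood can be illustrated with a minimal sketch. It is not SProt's actual feature extraction: it assumes each amino acid is represented by a single 3D point (for example its C-alpha atom), and the function name `spherical_neighborhoods`, the radius value, and the toy coordinates are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spherical_neighborhoods(coords: Sequence[Point], radius: float) -> List[List[int]]:
    """For each residue, list the indices of residues lying within `radius` of it.

    Assumption: one representative point per amino acid (e.g. the C-alpha atom).
    Each residue is included in its own neighborhood (distance 0).
    """
    neighborhoods = []
    for center in coords:
        members = [j for j, point in enumerate(coords)
                   if math.dist(center, point) <= radius]
        neighborhoods.append(members)
    return neighborhoods


# Toy coordinates in angstroms, invented for illustration (not a real structure).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

# The radius is an arbitrary illustrative value, not the one used by SProt.
neighborhoods = spherical_neighborhoods(toy_coords, radius=5.0)
print(neighborhoods)
```

Each resulting index list is the kind of local unit on which a pairwise local similarity could be defined. This brute-force version compares every pair of residues, which is adequate for a single chain but would need a spatial index for larger inputs.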
The proposed method outperforms other methods in classification accuracy at the SCOP superfamily and fold levels, while being at least comparable to the best existing solutions in terms of precision-recall and quality of alignment.