Univ Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques in Silico, UMR 973, F-75205 Paris, France, INSERM, U973, F-75205 Paris, France and Univ Paris Diderot, Ressources Parisiennes de Bioinformatique Structurale, F-75205 Paris, France.
Bioinformatics. 2014 Mar 15;30(6):784-91. doi: 10.1093/bioinformatics/btt618. Epub 2013 Oct 27.
Meaningful scores to assess protein structure similarity are essential to decipher protein structure and sequence evolution. Mining the growing number of protein structures requires fast and accurate similarity measures with statistical significance. Whereas numerous approaches have been proposed for protein domains as a whole, the focus is progressively moving to a more local level of structure analysis, for which similarity measurement still lacks a satisfactory answer.
We introduce a new score based on the Binet-Cauchy kernel. It is normalized and bounded between 1 and -1. A score of 1 is the maximal similarity and implies exactly the same conformation for the two protein fragments. A score of -1 corresponds to mirror-image conformations. Unrelated conformations have a null mean score. This allows the search for both similar and mirror conformations. In addition, the score addresses two major issues of the widely used root mean square deviation (RMSD). First, it achieves length-independent statistics, even for short fragments. Second, it discriminates better among medium-range RMSD values. Being simpler and faster to compute than the RMSD, it also provides the means for large-scale mining of protein structures.
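The abstract does not give the formula. The Python below is a minimal sketch of how such bounds can arise. It assumes the score is the normalized Binet-Cauchy kernel on centered n x 3 coordinate matrices X and Y: det(XᵀY) / sqrt(det(XᵀX) · det(YᵀY)). That definition is an assumption here, not a quotation of the paper.

By the Binet-Cauchy theorem, det(XᵀY) is a sum, taken over all 3-atom subsets, of products of 3x3 minors. Cauchy-Schwarz therefore bounds the ratio to [-1, 1]. A rotation multiplies the numerator by det(R) = 1. A reflection multiplies it by -1. All helper names (`center`, `cross_matrix`, `det3`, `bc_score`, `rotate_z`) are hypothetical and are not the BCscore software's API.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def center(coords: Sequence[Point]) -> Matrix:
    """Translate a fragment so that its centroid sits at the origin."""
    n = len(coords)
    mean = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in coords]


def cross_matrix(x: Matrix, y: Matrix) -> Matrix:
    """Return the 3x3 matrix X^T Y for two n x 3 coordinate arrays."""
    return [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)]
            for a in range(3)]


def det3(m: Matrix) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def bc_score(frag_a: Sequence[Point], frag_b: Sequence[Point]) -> float:
    """Normalized Binet-Cauchy score (assumed formulation), in [-1, 1]."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same number of atoms")
    x, y = center(frag_a), center(frag_b)
    # Gram determinants are >= 0; they vanish for coplanar fragments.
    norm_sq = det3(cross_matrix(x, x)) * det3(cross_matrix(y, y))
    if norm_sq <= 0.0:
        raise ValueError("degenerate (coplanar) fragment")
    return det3(cross_matrix(x, y)) / math.sqrt(norm_sq)


def rotate_z(coords: Sequence[Point], theta: float) -> List[Point]:
    """Rigidly rotate a fragment about the z axis (helper for the demo)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * px - s * py, s * px + c * py, pz) for px, py, pz in coords]


random.seed(0)
frag = [tuple(random.uniform(-5.0, 5.0) for _ in range(3)) for _ in range(8)]
moved = [(px + 3.0, py - 1.0, pz + 2.0) for px, py, pz in rotate_z(frag, 0.7)]
mirror = [(px, py, -pz) for px, py, pz in frag]
other = [tuple(random.uniform(-5.0, 5.0) for _ in range(3)) for _ in range(8)]

print(round(bc_score(frag, moved), 6))   # 1.0: rotated and translated copy
print(round(bc_score(frag, mirror), 6))  # -1.0: mirror image
print(bc_score(frag, other))             # an unrelated fragment: somewhere in [-1, 1]
```

The formula costs only three 3x3 determinants and needs no optimal superposition step. This fits the claim that the score is simpler to compute than the RMSD. One caveat applies to the formula as sketched. It returns 1 for any invertible linear map with positive determinant, not only for rigid motions. The paper's exact definition should therefore be checked before relying on this sketch.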
The computer software implementing the score is available at http://bioserv.rpbs.univ-paris-diderot.fr/BCscore/
frederic.guyon@univ-paris-diderot.fr
Supplementary data are available at Bioinformatics online.