Fundación Instituto Leloir, Avda. Patricias Argentinas 435, CABA, C1405BWE, Argentina.
BMC Bioinformatics. 2012 Sep 14;13:235. doi: 10.1186/1471-2105-13-235.
Many methods aim to identify residues with a critical impact on protein function based on evolutionary signals, sequence, and structure information. However, it is not clear to what extent these methods overlap, or whether any of them has higher predictive potential than the others, in particular for the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment to investigate their predictive potential and degree of overlap.
Our results demonstrate that the methods included in the benchmark can in general be divided into three groups with limited mutual overlap: one group containing real-value Evolutionary Trace (rvET) methods and conservation, another containing mutual information (MI) methods, and the last containing methods designed explicitly for the identification of specificity determining positions (SDPs), namely integer-value Evolutionary Trace (ivET), SDPfox, and XDET. In terms of prediction of CR, we find, using a proximity score that integrates structural information (the sum of the scores of residues located within a given distance of the residue in question), that only the methods from the first two groups displayed reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance for CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system.
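The proximity score described above (the sum of the scores of residues located within a given distance of the residue in question) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `proximity_scores`, the use of one representative 3D coordinate per residue, the cutoff value, and the `include_self` option (whether a residue's own score is counted) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def proximity_scores(coords, scores, cutoff, include_self=False):
    """Return, for each residue, the sum of the per-residue scores of all
    residues lying within `cutoff` of it.

    coords: one (x, y, z) point per residue (hypothetical choice, e.g. C-alpha).
    scores: one per-residue score (e.g. conservation, rvET, or cMI).
    cutoff: distance threshold, in the same units as coords (assumed value).
    include_self: whether the residue's own score is counted (assumption).
    """
    if len(coords) != len(scores):
        raise ValueError("coords and scores must have the same length")
    result = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i == j and not include_self:
                continue
            # Euclidean distance between the two representative points.
            if math.dist(ci, cj) <= cutoff:
                total += scores[j]
        result.append(total)
    return result

# Toy example: four residues on a line, with made-up scores.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
scores = [1.0, 2.0, 3.0, 4.0]
print(proximity_scores(coords, scores, cutoff=5.0))  # [2.0, 4.0, 2.0, 0.0]
```

In this toy case the second residue receives the highest proximity score because both of its neighbors fall inside the cutoff, while the isolated fourth residue scores zero despite having the highest individual score.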
This work contributes to the understanding of the different signals of evolution and also shows that it is possible to improve the detection of catalytic residues by integrating structural and higher order sequence evolutionary information with sequence conservation.