Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany.
BMC Bioinformatics. 2014 Apr 27;15:118. doi: 10.1186/1471-2105-15-118.
Identifying functionally important residue positions is a central task of computational biology. Methods of correlation analysis identify pairs of residue positions whose occupancy is mutually dependent because of constraints imposed by protein structure or function. A common measure of these dependencies is the mutual information, which is based on Shannon's information theory and uses probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade an algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by its conn(k)-value, the number of significantly correlated pairs to which the position belongs.
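To make the two quantities above concrete, here is a minimal sketch of the Shannon mutual information between two alignment columns and of a conn(k)-style count. The function names, the toy alignment, and the fixed score threshold are illustrative assumptions. In particular, the threshold is only a stand-in for H2r's actual significance criterion, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_k, col_l):
    """Shannon mutual information (in bits) between two alignment columns.

    Only the observed frequencies enter the score; the similarity of the
    residues themselves is ignored.
    """
    n = len(col_k)
    freq_k = Counter(col_k)
    freq_l = Counter(col_l)
    freq_kl = Counter(zip(col_k, col_l))
    mi = 0.0
    for (a, b), count in freq_kl.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_k[a] / n) * (freq_l[b] / n)))
    return mi


def conn_values(columns, threshold):
    """For each position k, count the pairs (k, l) scoring above the threshold.

    The fixed threshold is a hypothetical placeholder for a real
    significance criterion.
    """
    conn = [0] * len(columns)
    for k, l in combinations(range(len(columns)), 2):
        if mutual_information(columns[k], columns[l]) > threshold:
            conn[k] += 1
            conn[l] += 1
    return conn


# Hypothetical toy alignment: each string is one column, one character per sequence.
toy_columns = ["AAVV", "LLII", "GGGG", "AVAV"]
toy_conn = conn_values(toy_columns, threshold=0.5)
print(toy_conn)
```

In this toy case only the first two columns vary together, so they are the only positions with a non-zero count. The fully conserved third column carries no mutual information with any other column.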
To improve the specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). Computing the corresponding mutual information requires a matrix A that assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values are considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality was further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and that of PSICOV, a method that predicts contacting residue positions, overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de.
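The following sketch illustrates how a similarity matrix can enter an entropy through the von Neumann entropy. It is not the H2rs implementation: the abstract does not state how frequencies and A are combined, so the density matrix used here (frequencies on both sides of A, scaled to unit trace) is an assumed, commonly used construction, and all function names are hypothetical. The sketch also assumes that A is symmetric and positive semidefinite. Eigenvalues are obtained with a small Jacobi iteration so that only the standard library is needed.

```python
from math import atan2, cos, sin, sqrt, log2


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the off-diagonal element a[p][q].
                theta = 0.5 * atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = cos(theta), sin(theta)
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def von_neumann_entropy(probs, similarity):
    """Entropy (bits) of rho = D^(1/2) A D^(1/2) / trace, with D = diag(probs).

    Assumed construction; `similarity` must be symmetric and positive
    semidefinite for the result to be a valid entropy.
    """
    n = len(probs)
    rho = [[sqrt(probs[i]) * similarity[i][j] * sqrt(probs[j])
            for j in range(n)] for i in range(n)]
    trace = sum(rho[i][i] for i in range(n))
    rho = [[value / trace for value in row] for row in rho]
    eigenvalues = symmetric_eigenvalues(rho)
    # Eigenvalues at or below numerical zero contribute nothing.
    return -sum(x * log2(x) for x in eigenvalues if x > 1e-12)


# Two symbols observed with equal frequency under three hypothetical matrices.
probs = [0.5, 0.5]
entropy_identity = von_neumann_entropy(probs, [[1.0, 0.0], [0.0, 1.0]])
entropy_partial = von_neumann_entropy(probs, [[1.0, 0.5], [0.5, 1.0]])
entropy_ones = von_neumann_entropy(probs, [[1.0, 1.0], [1.0, 1.0]])
print(entropy_identity, entropy_partial, entropy_ones)
```

With the identity matrix the symbols are treated as unrelated, and the value reduces to the Shannon entropy of 1 bit. With an all-ones matrix the symbols are treated as interchangeable, and the entropy drops to zero. Intermediate similarities fall in between, which is how similarity can lower the apparent variability of a position. A pair-level score would apply the same function with residue pairs as symbols and a pair-similarity matrix in place of A.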
Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization makes it easier to compare results obtained with different proteins. Comparing the outcome of the local method H2rs with that of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.