Capra John A, Singh Mona
Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA.
Bioinformatics. 2007 Aug 1;23(15):1875-82. doi: 10.1093/bioinformatics/btm270. Epub 2007 May 22.
Not all residues in a protein are equally important. Some are essential for the proper structure and function of the protein, whereas others can be readily replaced. Conservation analysis is one of the most widely used methods for predicting these functionally important residues in protein sequences.
We introduce an information-theoretic approach for estimating sequence conservation based on Jensen-Shannon divergence. We also develop a general heuristic that considers the estimated conservation of sequentially neighboring sites. In large-scale testing, we demonstrate that our combined approach outperforms previous conservation-based measures in identifying functionally important residues; in particular, it is significantly better than the commonly used Shannon entropy measure. We find that considering conservation at sequential neighbors improves the performance of all methods tested. Our analysis also reveals that many existing methods that attempt to incorporate the relationships between amino acids do not lead to better identification of functionally important sites. Finally, we find that while conservation is highly predictive in identifying catalytic sites and residues near bound ligands, it is much less effective in identifying residues in protein-protein interfaces.
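As a rough illustration of the two ideas described above, the following minimal Python sketch scores each alignment column by its Jensen-Shannon divergence from a background amino acid distribution, then blends each score with the mean score of its sequential neighbors. It is not the authors' released code. The function names, the uniform background, the pseudocount, the window half-width and the mixing weight `lam` are all assumptions chosen for illustration, and gaps are simply ignored.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies for one alignment column.

    Gap and unknown characters are skipped (an assumption of this sketch).
    """
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for residue in column:
        if residue in counts:
            counts[residue] += 1.0
    total = sum(counts.values())
    return [counts[aa] / total for aa in AMINO_ACIDS]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; lies in [0, 1] for base-2 logs."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def conservation_scores(alignment, background=None):
    """Score every column of an alignment (a list of equal-length strings)."""
    if background is None:
        # Assumed uniform background; a real analysis would likely use
        # empirically derived amino acid frequencies instead.
        background = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return [
        js_divergence(column_distribution(col), background)
        for col in zip(*alignment)
    ]


def window_smooth(scores, half_width=3, lam=0.5):
    """Blend each score with the mean of its sequential neighbors.

    half_width and lam are hypothetical parameters of this sketch.
    """
    smoothed = []
    for i, score in enumerate(scores):
        lo = max(0, i - half_width)
        hi = min(len(scores), i + half_width + 1)
        neighbors = [scores[j] for j in range(lo, hi) if j != i]
        if neighbors:
            mean_neighbors = sum(neighbors) / len(neighbors)
            smoothed.append((1.0 - lam) * score + lam * mean_neighbors)
        else:
            smoothed.append(score)
    return smoothed


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACF"]  # toy data, not from the paper
    raw = conservation_scores(toy_alignment)
    print(raw)
    print(window_smooth(raw, half_width=1))
```

In this toy alignment the first two columns are fully conserved and the third is variable, so the third column receives the lowest raw score. The smoothing step then pulls each score toward those of its neighbors.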
Data sets and code for all conservation measures evaluated are available at http://compbio.cs.princeton.edu/conservation/