Protein structure based prediction of catalytic residues.

Affiliation

Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.

Publication information

BMC Bioinformatics. 2013 Feb 22;14:63. doi: 10.1186/1471-2105-14-63.

Abstract

BACKGROUND

Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation.

RESULTS

We explored a range of features that can be used to predict functional residues given a known three-dimensional structure. These features include several centrality measures of nodes in graphs of interacting residues: closeness, betweenness and PageRank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. Neural networks were then trained on the selected features to identify catalytic residues. We found that distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. On an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the 111 annotated catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods.
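The reported sensitivity, precision and enrichment follow directly from the four counts given above. The short Python sketch below recomputes them; the function name `enrichment_summary` and its argument names are illustrative assumptions, not part of the published method.

```python
def enrichment_summary(total_residues, total_catalytic, predicted, true_positives):
    """Recompute summary statistics from raw counts (illustrative helper)."""
    # Fraction of annotated catalytic residues that were recovered.
    sensitivity = true_positives / total_catalytic
    # Fraction of returned residues that are annotated as catalytic.
    precision = true_positives / predicted
    # Fraction of catalytic residues in the whole input set.
    background = total_catalytic / total_residues
    # Enrichment compares the hit rate in the output with the background rate.
    enrichment = precision / background
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "background": background,
        "enrichment": enrichment,
    }


# Counts as reported for the independent test set of 29 structures.
summary = enrichment_summary(
    total_residues=9262,
    total_catalytic=111,
    predicted=411,
    true_positives=70,
)
```

With these counts the sensitivity is about 0.63, the precision about 0.17, and the enrichment roughly 14-fold over a background catalytic-residue rate of about 1.2%.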

CONCLUSIONS

We found that several of the graph-based measures exploit the same underlying feature of protein structures, which can be captured more simply and more effectively by the distance to GCM. This definition also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in the size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
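To illustrate why the distance-to-GCM feature is easy to implement, here is a minimal Python sketch of it alongside a relative-entropy conservation score. It rests on assumptions not stated in the abstract: the GCM is taken as the unweighted mean of one representative coordinate per residue, the default background amino acid distribution is uniform, and logarithms are base 2. All function names are hypothetical, and the published method may differ in each of these details.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def center_of_mass(coords):
    """Unweighted mean of 3D points (assumption: one point per residue)."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def distances_to_gcm(coords):
    """Euclidean distance of each residue coordinate to the center of mass."""
    gcm = center_of_mass(coords)
    return [math.dist(p, gcm) for p in coords]


def relative_entropy(column, background=None):
    """Relative entropy (in bits) of an alignment column versus a background.

    Assumption: a uniform background is used unless another distribution is
    supplied; symbols missing from the background (e.g. gaps) are ignored.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(aa for aa in column if aa in background)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    score = 0.0
    for aa, count in counts.items():
        p = count / total
        score += p * math.log2(p / background[aa])
    return score


# Toy data for illustration only (not taken from the paper).
toy_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
toy_distances = distances_to_gcm(toy_coords)
conserved_score = relative_entropy("AAAA")
```

Under these assumptions a fully conserved column scores log2(20), about 4.32 bits, while a column containing each amino acid once scores zero. The `background` argument is one place where a distribution estimated from a specific reference database could be supplied, in line with the recalibration point made above.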

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5adc/3598644/13d6ac340781/1471-2105-14-63-1.jpg
