
Active site prediction using evolutionary and structural information.

Affiliation

Computer Science Division, University of California, Berkeley, USA.

Publication information

Bioinformatics. 2010 Mar 1;26(5):617-24. doi: 10.1093/bioinformatics/btq008. Epub 2010 Jan 14.

Abstract

MOTIVATION

The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.
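
The two metrics defined above can be made concrete with a minimal sketch. The function name `precision_recall` and the residue numbers are hypothetical, chosen only for illustration; they are not data from the paper.

```python
def precision_recall(predicted, catalytic):
    """Return (precision, recall) for a set of predicted residue positions.

    precision: fraction of predicted catalytic residues that are catalytic
    recall:    fraction of catalytic residues that were identified
    """
    predicted = set(predicted)
    catalytic = set(catalytic)
    true_positives = len(predicted & catalytic)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(catalytic) if catalytic else 0.0
    return precision, recall


# Hypothetical example: four residues predicted, three truly catalytic.
predicted_residues = {57, 102, 140, 189}
catalytic_residues = {57, 102, 195}
precision, recall = precision_recall(predicted_residues, catalytic_residues)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because enzymes have few catalytic residues relative to their length, a method can reach a fairly high recall while its precision stays low, which is why the abstract reports the two numbers together.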

RESULTS

In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall by between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting.
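
As an illustration of how regularization can yield a model with a small set of features, here is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. The abstract does not say which regularizer or optimizer Discern uses, so the L1 penalty, the function names, and the synthetic data (one informative feature, two noise features) are all assumptions made for this example.

```python
import math
import random


def soft_threshold(value, amount):
    """Shrink value toward zero; anything within `amount` of zero becomes 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(rows, labels, penalty=0.1, step=0.5, iterations=500):
    """Fit logistic regression with an L1 penalty on the weights (assumed setup)."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += error / n
            for j in range(d):
                grad_w[j] += error * x[j] / n
        bias -= step * grad_b
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


# Synthetic data: only column 0 carries signal about the label.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
labels = [1.0 if x[0] + 0.5 * rng.gauss(0, 1) > 0 else 0.0 for x in rows]
weights, bias = fit_l1_logistic(rows, labels)
print([round(w, 3) for w in weights])
```

The soft-threshold step pushes the weights of uninformative features toward zero, which is one way a regularized model can both resist overfitting and end up relying on only a few features.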


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d155/2828116/116f4297feb6/btq008f1.jpg
