1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap.

Authors

Caprara Alberto, Carr Robert, Istrail Sorin, Lancia Giuseppe, Walenz Brian

Affiliation

D.E.I.S., Università di Bologna, Viale Risorgimento, 2 40136 Bologna, Italy.

Publication

J Comput Biol. 2004;11(1):27-52. doi: 10.1089/106652704773416876.

Abstract

Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
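To make the two definitions in the abstract concrete, the following minimal Python sketch builds a contact map from per-residue coordinates and evaluates the overlap value of one given alignment. It is an illustration under stated assumptions, not the authors' method: the function names `contact_map` and `overlap`, the use of one 3D point per residue, the Euclidean distance test, and the toy coordinates are all hypothetical, and the alignment is assumed to be order preserving. The sketch only scores a fixed alignment; finding the alignment that maximizes this score is the hard optimization problem the paper addresses with integer linear programming.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    Assumption: one 3D point per residue, and two residues are "in contact"
    when their Euclidean distance is at most `threshold`.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= threshold
    }


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A that the alignment maps onto contacts of B.

    `alignment` is a dict from residue indices of A to residue indices of B.
    Assumption: it must be strictly increasing (order preserving).
    """
    items = sorted(alignment.items())
    if any(b1 >= b2 for (_, b1), (_, b2) in zip(items, items[1:])):
        raise ValueError("alignment must preserve residue order")
    return sum(
        1
        for i, j in contacts_a
        if i in alignment
        and j in alignment
        and (alignment[i], alignment[j]) in contacts_b
    )


# Toy data (hypothetical coordinates, arbitrary units).
coords_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0)]
coords_b = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (2, 0, 0)]

map_a = contact_map(coords_a, threshold=1.0)
map_b = contact_map(coords_b, threshold=1.0)

# Align residues 0, 1, 2 of A with residues 0, 1, 3 of B.
score = overlap(map_a, map_b, {0: 0, 1: 1, 2: 3})
print(score)
```

Changing `threshold` changes which pairs count as contacts and therefore the overlap value, which is the sensitivity that the experiments mentioned at the end of the abstract examine.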
