1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap.

Author information

Caprara Alberto, Carr Robert, Istrail Sorin, Lancia Giuseppe, Walenz Brian

Affiliation

D.E.I.S., Università di Bologna, Viale Risorgimento, 2 40136 Bologna, Italy.

Publication information

J Comput Biol. 2004;11(1):27-52. doi: 10.1089/106652704773416876.

DOI:10.1089/106652704773416876

PMID:15072687

Abstract

Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
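
The two definitions in the abstract (a contact map as a list of residue pairs in spatial proximity, and the overlap as the number of contacts shared under an alignment) can be illustrated with a small sketch. It assumes one representative 3D point per residue, a Euclidean distance threshold, and order-preserving alignments; the function names `contact_map`, `overlap`, and `max_overlap_bruteforce` are hypothetical. The exhaustive search stands in for the paper's integer linear programming approach only to show what is being maximized, and it is feasible only for toy inputs.

```python
from itertools import combinations
from math import dist


def contact_map(coords, threshold):
    """Return residue index pairs (i, j), i < j, whose points lie within threshold."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if dist(coords[i], coords[j]) <= threshold
    }


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A that the alignment maps onto contacts of B.

    alignment is a dict from residue indices of A to residue indices of B.
    """
    count = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            k, l = alignment[i], alignment[j]
            if (min(k, l), max(k, l)) in contacts_b:
                count += 1
    return count


def max_overlap_bruteforce(n_a, contacts_a, n_b, contacts_b):
    """Try every order-preserving alignment and keep the best one.

    Illustrative only: the number of alignments grows exponentially,
    so this is not a substitute for integer programming or heuristics.
    """
    best = (0, {})
    for size in range(1, min(n_a, n_b) + 1):
        for rows in combinations(range(n_a), size):
            for cols in combinations(range(n_b), size):
                # Pairing two sorted index tuples never produces crossing matches.
                alignment = dict(zip(rows, cols))
                score = overlap(contacts_a, contacts_b, alignment)
                if score > best[0]:
                    best = (score, alignment)
    return best
```

For instance, `contact_map([(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0)], 1.1)` yields the contacts `(0, 1)`, `(0, 3)`, and `(1, 2)`. Raising the threshold adds contacts, which is why the abstract's question of how to choose the threshold sensibly affects the measure.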


Similar articles

Maximum contact map overlap revisited.

J Comput Biol. 2011 Jan;18(1):27-41. doi: 10.1089/cmb.2009.0196.

Towards optimal alignment of protein structure distance matrices.

Bioinformatics. 2010 Sep 15;26(18):2273-80. doi: 10.1093/bioinformatics/btq420. Epub 2010 Jul 17.

Fast overlapping of protein contact maps by alignment of eigenvectors.

Bioinformatics. 2010 Sep 15;26(18):2250-8. doi: 10.1093/bioinformatics/btq402. Epub 2010 Jul 7.

Protein threading by linear programming.

Pac Symp Biocomput. 2003:264-75.

DALIX: optimal DALI protein structure alignment.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jan-Feb;10(1):26-36. doi: 10.1109/TCBB.2012.143.

A reduction-based exact algorithm for the contact map overlap problem.

J Comput Biol. 2007 Jun;14(5):637-54. doi: 10.1089/cmb.2007.R007.

Adaptive Smith-Waterman residue match seeding for protein structural alignment.

Proteins. 2013 Oct;81(10):1823-39. doi: 10.1002/prot.24327. Epub 2013 Aug 19.

CAALIGN: a program for pairwise and multiple protein-structure alignment.

Acta Crystallogr D Biol Crystallogr. 2007 Apr;63(Pt 4):514-25. doi: 10.1107/S0907444907000844. Epub 2007 Mar 16.

PROFcon: novel prediction of long-range contacts.

Bioinformatics. 2005 Jul 1;21(13):2960-8. doi: 10.1093/bioinformatics/bti454. Epub 2005 May 12.

Cited by

Alignments of biomolecular contact maps.

Interface Focus. 2021 Jun 11;11(4):20200066. doi: 10.1098/rsfs.2020.0066. eCollection 2021 Jun.

Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

Biomed Res Int. 2015;2015:563674. doi: 10.1155/2015/563674. Epub 2015 Oct 28.

Homology modeling of larger proteins guided by chemical shifts.

Nat Methods. 2015 Aug;12(8):747-50. doi: 10.1038/nmeth.3437. Epub 2015 Jun 8.

Simultaneous alignment and folding of protein sequences.

J Comput Biol. 2014 Jul;21(7):477-91. doi: 10.1089/cmb.2013.0163. Epub 2014 Apr 25.

Predicting protein contact map using evolutionary and physical constraints by integer programming.

Bioinformatics. 2013 Jul 1;29(13):i266-73. doi: 10.1093/bioinformatics/btt211.

On the difference in quality between current heuristic and optimal solutions to the protein structure alignment problem.

Biomed Res Int. 2013;2013:459248. doi: 10.1155/2013/459248. Epub 2012 Dec 23.

SAS-Pro: simultaneous residue assignment and structure superposition for protein structure alignment.

PLoS One. 2012;7(5):e37493. doi: 10.1371/journal.pone.0037493. Epub 2012 May 25.

deepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns.

Bioinformatics. 2012 Jan 1;28(1):17-24. doi: 10.1093/bioinformatics/btr598. Epub 2011 Nov 3.

Fast and accurate protein substructure searching with simulated annealing and GPUs.

BMC Bioinformatics. 2010 Sep 3;11:446. doi: 10.1186/1471-2105-11-446.

A fast mathematical programming procedure for simultaneous fitting of assembly components into cryoEM density maps.

Bioinformatics. 2010 Jun 15;26(12):i261-8. doi: 10.1093/bioinformatics/btq201.
