Fold recognition by scoring protein maps using the congruence coefficient.

Affiliations

Department of Computer Science and Engineering, University of Bologna, Bologna 40126, Italy.

Department of Computer Science, University of California, Irvine, CA 92697, USA.

Publication information

Bioinformatics. 2021 May 1;37(4):506-513. doi: 10.1093/bioinformatics/btaa833.

DOI:10.1093/bioinformatics/btaa833

PMID:32976564

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8088323/

Abstract

MOTIVATION

Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue-residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps.

RESULTS

Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and to assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction (CASP) editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare the fold recognition performance of different alignment tools, each using its own similarity score, against that obtained using the congruence coefficient. We show that, overall, the congruence coefficient improves fold recognition over the other methods, proving its effectiveness as a general similarity metric for protein map comparison.
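As a minimal sketch of the idea, assuming the standard definition of Tucker's congruence coefficient (the uncentered cosine between two equal-size matrices) applied to the sub-maps induced by a residue alignment, the score and a rough significance estimate could look as follows. The function names, the list of (i, j) matched-residue pairs used as the alignment format, and the Monte Carlo z-score are illustrative assumptions, not the CCpro interface. In particular, the sampling step only approximates the exact mean and variance over alignments that the abstract says can be computed in polynomial time.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical alignment format: ordered (query residue i, template residue j) pairs.
Alignment = Sequence[Tuple[int, int]]


def congruence_coefficient(a: Matrix, b: Matrix) -> float:
    """Tucker's congruence coefficient: the uncentered cosine between two same-shape matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = sum(x * x for row in a for x in row)
    norm_b = sum(y * y for row in b for y in row)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero map (assumption)
    return dot / math.sqrt(norm_a * norm_b)


def aligned_cc(a: Matrix, b: Matrix, alignment: Alignment) -> float:
    """CC between the sub-maps of a and b restricted to the matched residues."""
    rows_a = [i for i, _ in alignment]
    rows_b = [j for _, j in alignment]
    sub_a = [[a[p][q] for q in rows_a] for p in rows_a]
    sub_b = [[b[p][q] for q in rows_b] for p in rows_b]
    return congruence_coefficient(sub_a, sub_b)


def monte_carlo_z_score(a: Matrix, b: Matrix, alignment: Alignment,
                        n_samples: int = 1000, seed: int = 0) -> float:
    """Z-score of the observed CC against random order-preserving alignments of equal length.

    Monte Carlo stand-in: it estimates the mean and standard deviation by
    sampling instead of computing them exactly.
    """
    rng = random.Random(seed)
    k = len(alignment)
    observed = aligned_cc(a, b, alignment)
    samples = []
    for _ in range(n_samples):
        # Two independent sorted k-subsets define a uniformly random
        # order-preserving (gapped) alignment with exactly k matched pairs.
        rows_a = sorted(rng.sample(range(len(a)), k))
        rows_b = sorted(rng.sample(range(len(b)), k))
        samples.append(aligned_cc(a, b, list(zip(rows_a, rows_b))))
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    if sd == 0:
        return math.nan  # z-score is undefined when every sampled score is identical
    return (observed - mean) / sd
```

For example, aligning a toy 6 × 6 distance map with entries |i − j| to itself along its first four residues gives a CC of 1.0, and the z-score comes out positive because most random alignments of the same length score lower. Because the coefficient is uncentered and normalized by both norms, it is invariant to a positive rescaling of either map, which is convenient when comparing predicted distances against template distances on different scales. An exact closed-form mean and variance would replace the sampling loop and remove its Monte Carlo noise.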

AVAILABILITY AND IMPLEMENTATION

The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
