Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era.

Affiliation

Howard Hughes Medical Institute, Department of Biochemistry, and Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195.

Publication information

Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15674-9. doi: 10.1073/pnas.1314045110. Epub 2013 Sep 5.

DOI:10.1073/pnas.1314045110

PMID:24009338

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3785744/

Abstract

Recently developed methods have shown considerable promise in predicting residue-residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural context information using a pseudolikelihood approach, allowing more accurate contact predictions from fewer homologous sequences. We rigorously assess the utility of predicted contacts for protein structure prediction using large and representative sequence and structure databases from recent structure prediction experiments. We find that contact predictions are likely to be accurate when the number of aligned sequences (with sequence redundancy reduced to 90%) is greater than five times the length of the protein, and that accurate predictions are likely to be useful for structure modeling if the aligned sequences are more similar to the protein of interest than to the closest homolog of known structure. These conditions are currently met by 422 of the protein families collected in the Pfam database.
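The sequence-count criterion in the abstract (more than five aligned sequences per residue after redundancy reduction at 90%) can be illustrated with a minimal sketch. The function names, the greedy filtering procedure, and the treatment of gap characters as ordinary symbols are assumptions made for illustration and are not taken from the paper. The second criterion, which compares similarity to the target against similarity to the closest homolog of known structure, is not covered here.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two aligned rows of equal length.

    Simplification (assumption): gap characters are compared like any other symbol.
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned rows must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def reduce_redundancy(alignment: List[str], cutoff: float = 0.9) -> List[str]:
    """Greedy filter (assumed procedure): keep a row only if its identity to
    every row already kept is below `cutoff`."""
    kept: List[str] = []
    for row in alignment:
        if all(identity(row, k) < cutoff for k in kept):
            kept.append(row)
    return kept


def enough_sequences(alignment: List[str], protein_length: int,
                     factor: float = 5.0, cutoff: float = 0.9) -> bool:
    """True when the non-redundant sequence count exceeds factor * protein length."""
    return len(reduce_redundancy(alignment, cutoff)) > factor * protein_length


if __name__ == "__main__":
    # Toy alignment: the second row duplicates the first and is filtered out.
    toy = ["ACDE", "ACDE", "ACDF", "WYWY"]
    print(len(reduce_redundancy(toy)))
    print(enough_sequences(toy, protein_length=4))
```

The greedy filter is quadratic in the number of sequences, so for alignments of realistic size a dedicated redundancy-filtering tool would likely be preferable; the sketch only makes the threshold arithmetic concrete.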

Similar articles

De novo structure prediction of globular proteins aided by sequence variation-derived contacts.

PLoS One. 2014 Mar 17;9(3):e92197. doi: 10.1371/journal.pone.0092197. eCollection 2014.

Protein structure determination using metagenome sequence data.

Science. 2017 Jan 20;355(6322):294-298. doi: 10.1126/science.aah4043.

Prediction of Structures and Interactions from Genome Information.

Adv Exp Med Biol. 2018;1105:123-152. doi: 10.1007/978-981-13-2200-6_9.

Direct-coupling analysis of residue coevolution captures native contacts across many protein families.

Proc Natl Acad Sci U S A. 2011 Dec 6;108(49):E1293-301. doi: 10.1073/pnas.1111471108. Epub 2011 Nov 21.

DeepHelicon: Accurate prediction of inter-helical residue contacts in transmembrane proteins by residual neural networks.

J Struct Biol. 2020 Oct 1;212(1):107574. doi: 10.1016/j.jsb.2020.107574. Epub 2020 Jul 11.

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):84-96. doi: 10.1002/prot.25405. Epub 2017 Oct 31.

Sequence coevolution between RNA and protein characterized by mutual information between residue triplets.

PLoS One. 2012;7(1):e30022. doi: 10.1371/journal.pone.0030022. Epub 2012 Jan 18.

Predicting residue-residue contacts using random forest models.

Bioinformatics. 2011 Dec 15;27(24):3379-84. doi: 10.1093/bioinformatics/btr579. Epub 2011 Oct 20.

Using inferred residue contacts to distinguish between correct and incorrect protein models.

Bioinformatics. 2008 Jul 15;24(14):1575-82. doi: 10.1093/bioinformatics/btn248. Epub 2008 May 29.

Cited by

Beyond static structures: protein dynamic conformations modeling in the post-AlphaFold era.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf340.

Tracing the function expansion for a primordial protein fold in the era of fold-based function prediction: β-trefoil.

PLoS One. 2025 Jul 3;20(7):e0320177. doi: 10.1371/journal.pone.0320177. eCollection 2025.

Emerging frontiers in protein structure prediction following the AlphaFold revolution.

J R Soc Interface. 2025 Apr;22(225):20240886. doi: 10.1098/rsif.2024.0886. Epub 2025 Apr 16.

Message hidden in α-helices - toward a better understanding of plant ABCG transporters' multispecificity.

Plant Physiol. 2025 Apr 30;198(1). doi: 10.1093/plphys/kiaf146.

Thermal Adaptation of Extremozymes: Temperature-Sensitive Contact Analysis of Serine Proteases.

bioRxiv. 2025 Mar 6:2025.03.03.641325. doi: 10.1101/2025.03.03.641325.

Decoding and reengineering the promoter specificity of T7-like RNA polymerases based on phage genome sequences.

Nucleic Acids Res. 2025 Feb 27;53(5). doi: 10.1093/nar/gkaf140.

Hierarchical design of pseudosymmetric protein nanocages.

Nature. 2025 Feb;638(8050):553-561. doi: 10.1038/s41586-024-08360-6. Epub 2024 Dec 18.

Protein language models learn evolutionary statistics of interacting sequence motifs.

Proc Natl Acad Sci U S A. 2024 Nov 5;121(45):e2406285121. doi: 10.1073/pnas.2406285121. Epub 2024 Oct 28.

Predicting RNA sequence-structure likelihood via structure-aware deep learning.

BMC Bioinformatics. 2024 Sep 30;25(1):316. doi: 10.1186/s12859-024-05916-1.

Using residue interaction networks to understand protein function and evolution and to engineer new proteins.

Curr Opin Struct Biol. 2024 Dec;89:102922. doi: 10.1016/j.sbi.2024.102922. Epub 2024 Sep 26.
