From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction.

Affiliation

Laboratoire de Physique Statistique de l'Ecole Normale Supérieure - UMR 8550, associé au CNRS et à l'Université Pierre et Marie Curie, Paris, France.

Publication information

PLoS Comput Biol. 2013;9(8):e1003176. doi: 10.1371/journal.pcbi.1003176. Epub 2013 Aug 22.

DOI: 10.1371/journal.pcbi.1003176

PMID: 23990764

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3749948/

Abstract

Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant 'patterns' of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.
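
The role of low-eigenvalue modes can be illustrated with a standard spectral identity: the inverse of a symmetric matrix C is the sum over its eigenmodes of v_k v_k^T / λ_k, so the modes with the smallest eigenvalues carry the largest weight in C^{-1}, the quantity from which mean-field approaches to DCA commonly estimate couplings. The following is a minimal NumPy sketch under stated assumptions, not the Hopfield-Potts inference procedure of the paper: the toy alignment, the four-letter alphabet, the shrinkage regularization, and all function names are hypothetical choices made for illustration. It also computes the inverse participation ratio, one common measure of how localized an eigenvector is.

```python
import numpy as np

def one_hot(msa, alphabet):
    """Encode an alignment (list of equal-length strings) as a binary matrix."""
    index = {a: i for i, a in enumerate(alphabet)}
    n_seq, n_pos, q = len(msa), len(msa[0]), len(alphabet)
    x = np.zeros((n_seq, n_pos * q))
    for s, seq in enumerate(msa):
        for i, a in enumerate(seq):
            x[s, i * q + index[a]] = 1.0
    return x

def correlation_modes(x, shrinkage=0.5):
    """Eigen-decompose a regularized covariance matrix of the encoded alignment."""
    cov = np.cov(x, rowvar=False, bias=True)
    # Shrink toward the identity so the matrix is invertible.
    # Assumption: this is an illustrative stand-in for a proper regularization.
    cov = (1.0 - shrinkage) * cov + shrinkage * np.eye(cov.shape[0])
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return cov, eigvals, eigvecs

def inverse_participation_ratio(eigvecs):
    """IPR of each unit-norm eigenvector (column): near 1/N if spread out, near 1 if localized."""
    return np.sum(eigvecs ** 4, axis=0)

def truncated_inverse(eigvals, eigvecs, keep):
    """Rebuild the inverse from a subset of modes: sum_k v_k v_k^T / lambda_k."""
    v = eigvecs[:, keep]
    return (v / eigvals[keep]) @ v.T

# Hypothetical toy alignment: 200 random sequences of length 6,
# with position 1 copying position 0 in about 90% of the sequences.
rng = np.random.default_rng(0)
alphabet = "ACDE"
states = rng.integers(0, len(alphabet), size=(200, 6))
copy = rng.random(200) < 0.9
states[copy, 1] = states[copy, 0]
msa = ["".join(alphabet[a] for a in row) for row in states]

x = one_hot(msa, alphabet)
cov, eigvals, eigvecs = correlation_modes(x)
ipr = inverse_participation_ratio(eigvecs)

# Compare rank-p approximations of the inverse built from the p lowest
# versus the p highest eigenvalue modes (PCA would keep the latter).
p = 5
n = len(eigvals)
exact_inv = np.linalg.inv(cov)
err_low = np.linalg.norm(exact_inv - truncated_inverse(eigvals, eigvecs, np.arange(p)))
err_high = np.linalg.norm(exact_inv - truncated_inverse(eigvals, eigvecs, np.arange(n - p, n)))
print(f"error keeping {p} lowest modes:  {err_low:.3f}")
print(f"error keeping {p} highest modes: {err_high:.3f}")
```

Because each mode enters the inverse with weight 1/λ_k, keeping only the highest-eigenvalue modes cannot approximate the inverse better than keeping the same number of lowest-eigenvalue modes. This is only a linear-algebra illustration of why discarding low-eigenvalue modes can lose coupling information; it makes no claim about contact-prediction accuracy on real protein families.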

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b790/3749948/82a8eb579e38/pcbi.1003176.g001.jpg

Similar articles

Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix.

Biochem Biophys Res Commun. 2016 Mar 25;472(1):217-22. doi: 10.1016/j.bbrc.2016.01.188. Epub 2016 Feb 23.

Distance matrix-based approach to protein structure prediction.

J Struct Funct Genomics. 2009 Mar;10(1):67-81. doi: 10.1007/s10969-009-9062-2. Epub 2009 Feb 18.

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):84-96. doi: 10.1002/prot.25405. Epub 2017 Oct 31.

Selection of sequence motifs and generative Hopfield-Potts models for protein families.

Phys Rev E. 2019 Sep;100(3-1):032128. doi: 10.1103/PhysRevE.100.032128.

Statistical mechanical properties of sequence space determine the efficiency of the various algorithms to predict interaction energies and native contacts from protein coevolution.

Phys Biol. 2019 Jun 4;16(4):046007. doi: 10.1088/1478-3975/ab1c15.

Predicting protein β-sheet contacts using a maximum entropy-based correlated mutation measure.

Bioinformatics. 2013 Mar 1;29(5):580-7. doi: 10.1093/bioinformatics/btt005. Epub 2013 Jan 10.

Sequence based residue depth prediction using evolutionary information and predicted secondary structure.

BMC Bioinformatics. 2008 Sep 20;9:388. doi: 10.1186/1471-2105-9-388.

Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction.

PLoS Comput Biol. 2018 Nov 5;14(11):e1006526. doi: 10.1371/journal.pcbi.1006526. eCollection 2018 Nov.

MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins.

Bioinformatics. 2015 Apr 1;31(7):999-1006. doi: 10.1093/bioinformatics/btu791. Epub 2014 Nov 26.

Cited by

Predicting viral sensitivity to antibodies using genetic sequences and antibody similarities.

bioRxiv. 2025 Aug 11:2025.08.08.669352. doi: 10.1101/2025.08.08.669352.

What does it take to learn the rules of RNA base pairing? A lot less than you may think.

bioRxiv. 2025 Aug 2:2025.07.31.668042. doi: 10.1101/2025.07.31.668042.

Direct coupling analysis and the attention mechanism.

BMC Bioinformatics. 2025 Feb 6;26(1):41. doi: 10.1186/s12859-025-06062-y.

Impact of phylogeny on the inference of functional sectors from protein sequence data.

PLoS Comput Biol. 2024 Sep 23;20(9):e1012091. doi: 10.1371/journal.pcbi.1012091. eCollection 2024 Sep.

Exploring complexity of class-A Beta-lactamase family using physiochemical-based multiplex networks.

Sci Rep. 2023 Nov 23;13(1):20626. doi: 10.1038/s41598-023-48128-y.

Unveiling the Inhibitory Potentials of Peptidomimetic Azanitriles and Pyridyl Esters towards SARS-CoV-2 Main Protease: A Molecular Modelling Investigation.

Molecules. 2023 Mar 14;28(6):2641. doi: 10.3390/molecules28062641.

Bézier interpolation improves the inference of dynamical models from data.

Phys Rev E. 2023 Feb;107(2-1):024116. doi: 10.1103/PhysRevE.107.024116.

Coevolution-based prediction of key allosteric residues for protein function regulation.

Elife. 2023 Feb 17;12:e81850. doi: 10.7554/eLife.81850.

Undersampling and the inference of coevolution in proteins.

Cell Syst. 2023 Mar 15;14(3):210-219.e7. doi: 10.1016/j.cels.2022.12.013. Epub 2023 Jan 23.

Structural Investigations and Binding Mechanisms of Oseltamivir Drug Resistance Conferred by the E119V Mutation in Influenza H7N9 Virus.

Molecules. 2022 Jul 8;27(14):4376. doi: 10.3390/molecules27144376.
