Power law tails in phylogenetic systems.

Affiliations

Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom.

Publication information

Proc Natl Acad Sci U S A. 2018 Jan 23;115(4):690-695. doi: 10.1073/pnas.1711913115. Epub 2018 Jan 8.

DOI:10.1073/pnas.1711913115

PMID:29311320

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5789915/

Abstract

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
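
The truncation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-hot encoding, the number of removed modes `k`, the Frobenius-norm score per site pair, and every function name below are assumptions made for illustration. The abstract does not state how many eigenvectors to remove or how to down-weight them, so `k` is left as a free parameter.

```python
import numpy as np


def one_hot(msa, n_states):
    """Encode an integer alignment (n_seqs x n_sites) as a binary matrix.

    Column i * n_states + a is 1 where sequence s has state a at site i.
    """
    n_seqs, n_sites = msa.shape
    encoded = np.zeros((n_seqs, n_sites * n_states))
    rows = np.repeat(np.arange(n_seqs), n_sites)
    cols = (np.arange(n_sites) * n_states + msa).ravel()
    encoded[rows, cols] = 1.0
    return encoded


def truncate_top_modes(cov, k):
    """Return a copy of cov with its k largest-eigenvalue modes removed.

    Assumption: hard truncation. Down-weighting would rescale the top
    eigenvalues instead of discarding them.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    n_keep = len(eigvals) - k
    kept_vals = eigvals[:n_keep]
    kept_vecs = eigvecs[:, :n_keep]
    # Rebuild the matrix from the retained eigenpairs only.
    return (kept_vecs * kept_vals) @ kept_vecs.T


def site_pair_scores(cov, n_sites, n_states):
    """Frobenius norm of each (site i, site j) block (hypothetical score)."""
    blocks = cov.reshape(n_sites, n_states, n_sites, n_states)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(scores, 0.0)  # ignore self-pairs
    return scores


# Usage on a random toy alignment (no phylogeny, illustration only).
rng = np.random.default_rng(0)
n_seqs, n_sites, n_states = 200, 12, 4
toy_msa = rng.integers(0, n_states, size=(n_seqs, n_sites))

toy_cov = np.cov(one_hot(toy_msa, n_states), rowvar=False)
toy_filtered = truncate_top_modes(toy_cov, k=3)
toy_scores = site_pair_scores(toy_filtered, n_sites, n_states)
```

On a real alignment, `toy_msa` would be replaced by the integer-encoded sequences, and the highest entries of the score matrix would be read as candidate contacts. Choosing `k` from the extent of the power law tail in the eigenvalue spectrum is one plausible option, but that choice is not specified in the abstract.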
