Power law tails in phylogenetic systems.

Affiliation

Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom.

Publication information

Proc Natl Acad Sci U S A. 2018 Jan 23;115(4):690-695. doi: 10.1073/pnas.1711913115. Epub 2018 Jan 8.

Abstract

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
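The central computation described above, the eigenvalue spectrum of a site-by-site covariance matrix and the removal of its largest-eigenvalue modes, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' method or simulation: sites are binary (±1) and evolve independently under a symmetric two-state model down a randomly grown binary tree, branch lengths are exponential with mean `mean_branch`, and the Marchenko-Pastur upper edge (computed from the mean site variance) serves only as a rough marker for where the spectrum of unrelated sequences would end. All function names (`simulate_tree_alignment`, `truncate_top_modes`, and so on) are hypothetical.

```python
import numpy as np


def simulate_tree_alignment(n_seqs, n_sites, mean_branch, rng):
    """Evolve +/-1 sequences down a random binary tree (illustrative model only).

    Assumptions (not from the paper): sites evolve independently under a
    symmetric two-state model; each branch length b ~ Exponential(mean_branch)
    and a site flips along a branch with probability (1 - exp(-2 b)) / 2.
    The topology is grown by repeatedly splitting a uniformly chosen leaf.
    """
    leaves = [rng.choice(np.array([-1, 1]), size=n_sites)]  # root sequence
    while len(leaves) < n_seqs:
        parent = leaves.pop(int(rng.integers(len(leaves))))
        for _ in range(2):  # each split creates two child branches
            b = rng.exponential(mean_branch)
            p_flip = 0.5 * (1.0 - np.exp(-2.0 * b))
            flips = rng.random(n_sites) < p_flip
            leaves.append(np.where(flips, -parent, parent))
    return np.array(leaves)  # shape (n_seqs, n_sites)


def site_covariance(X):
    """Sample covariance between alignment columns (sites)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def ranked_spectrum(C):
    """Eigenvalues and eigenvectors of a symmetric matrix, largest first."""
    evals, evecs = np.linalg.eigh(C)  # eigh returns ascending order
    return evals[::-1], evecs[:, ::-1]


def mp_upper_edge(variance, n_sites, n_seqs):
    """Marchenko-Pastur upper edge for independent sites of equal variance."""
    return variance * (1.0 + np.sqrt(n_sites / n_seqs)) ** 2


def truncate_top_modes(C, k):
    """Subtract the k eigenmodes with the largest eigenvalues from C."""
    evals, evecs = ranked_spectrum(C)
    V = evecs[:, :k]
    return C - (V * evals[:k]) @ V.T


def shuffle_columns(X, rng):
    """Permute each column independently: keeps site frequencies, breaks shared ancestry."""
    return np.column_stack([rng.permutation(col) for col in X.T])


# Usage: compare the tree alignment with a column-shuffled control.
rng = np.random.default_rng(0)
X = simulate_tree_alignment(n_seqs=400, n_sites=100, mean_branch=0.05, rng=rng)
C = site_covariance(X)
evals, _ = ranked_spectrum(C)
edge = mp_upper_edge(np.mean(np.diag(C)), X.shape[1], X.shape[0])
control, _ = ranked_spectrum(site_covariance(shuffle_columns(X, rng)))
print("modes above MP edge (tree):", int(np.sum(evals > edge)))
print("modes above MP edge (shuffled):", int(np.sum(control > edge)))
C_trunc = truncate_top_modes(C, k=5)  # k is an illustrative choice
```

Because this toy simulation contains no site-to-site coupling at all, any eigenvalues well above the edge in the tree alignment but not in the shuffled control come from shared ancestry alone. `C_trunc` is only the covariance with the top `k` modes subtracted; turning it into contact predictions (for example with an inverse-covariance or pseudolikelihood model) is a separate step not shown here.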

Similar articles

On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

PCOAT: positional correlation analysis using multiple methods.
Bioinformatics. 2004 Dec 12;20(18):3697-9. doi: 10.1093/bioinformatics/bth431. Epub 2004 Jul 22.

Cited by

Impact of phylogeny on the inference of functional sectors from protein sequence data.
PLoS Comput Biol. 2024 Sep 23;20(9):e1012091. doi: 10.1371/journal.pcbi.1012091. eCollection 2024 Sep.
