On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins.

Affiliations

University of Havana, Physics Faculty, Department of Theoretical Physics, Group of Complex Systems and Statistical Physics, Havana, Cuba.

Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative - LCQB, Paris, France.

Publication information

PLoS Comput Biol. 2021 May 24;17(5):e1008957. doi: 10.1371/journal.pcbi.1008957. eCollection 2021 May.

DOI: 10.1371/journal.pcbi.1008957
PMID: 34029316
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8177639/

Abstract

Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
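
The randomization idea in the abstract can be illustrated with a much simpler baseline. The sketch below permutes residues independently within each column of a multiple sequence alignment. This keeps every column's residue frequencies (the conservation pattern) and removes correlations between columns, but, unlike the strategies the abstract describes, it also destroys phylogenetic relations between sequences. It is an illustrative assumption, not the authors' procedure; the names `shuffle_columns`, `column_counts`, and `toy_msa` are hypothetical.

```python
import random
from collections import Counter


def shuffle_columns(msa, seed=0):
    """Permute residues independently within each alignment column.

    Per-column residue counts are preserved exactly; any correlation
    between columns (coevolutionary or phylogenetic) is randomized away.
    """
    rng = random.Random(seed)
    n_cols = len(msa[0])
    if any(len(seq) != n_cols for seq in msa):
        raise ValueError("all sequences must have the same aligned length")
    columns = []
    for j in range(n_cols):
        col = [seq[j] for seq in msa]
        rng.shuffle(col)  # in-place permutation of this column only
        columns.append(col)
    # Reassemble rows from the independently shuffled columns.
    return ["".join(columns[j][i] for j in range(n_cols))
            for i in range(len(msa))]


def column_counts(msa):
    """Residue counts per column, used to check that conservation is kept."""
    return [Counter(seq[j] for seq in msa) for j in range(len(msa[0]))]


# Toy alignment (made-up data, for illustration only).
toy_msa = ["ACDA", "ACDA", "GCEA", "GTEA", "ATDC", "GTEC"]
shuffled = shuffle_columns(toy_msa, seed=1)
```

Comparing `column_counts(shuffled)` with `column_counts(toy_msa)` confirms that conservation is unchanged. A resampling scheme in the spirit of the abstract would additionally have to keep the similarity structure between sequences, which this baseline does not attempt.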


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c4e/8177639/72bf6a7c0555/pcbi.1008957.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c4e/8177639/561ebcf27ab3/pcbi.1008957.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c4e/8177639/03b77b192657/pcbi.1008957.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c4e/8177639/3fed0a07086d/pcbi.1008957.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c4e/8177639/86f0b6dcb91a/pcbi.1008957.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c4e/8177639/bfb354be69bd/pcbi.1008957.g006.jpg
