Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches.

Affiliation

Laboratório de Biologia Teórica e Computacional (LBTC), Universidade de Brasília DF, Brasília, Brazil.

Publication information

Sci Rep. 2021 Mar 25;11(1):6902. doi: 10.1038/s41598-021-86455-0.

DOI:10.1038/s41598-021-86455-0

PMID:33767294

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7994710/

Abstract

The problem of finding the correct set of partners for a given pair of interacting protein families based on multiple sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual information (MI) signal to discriminate MSA concatenations with the largest fraction of correct pairings. Although that signal might be of practical relevance in the search for an effective heuristic to solve the problem, the number of MSA concatenations with near-native MI is large, imposing severe limitations. Here, a Genetic Algorithm that explores possible MSA concatenations according to an MI maximization criterion is shown to find degenerate solutions with two error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. If mistakes made among similar sequences are disregarded, type-(i) solutions are found to resolve correct pairings at best true positive (TP) rates of 70%, far above the very same estimates in type-(ii) solutions. A machine learning classification algorithm helps to show further that differences between optimized solutions based on TP rates are not artificial and may have biological meaning associated with the three-dimensional distribution of the MI signal. Type-(i) solutions may therefore correspond to reliable results for predictive purposes, found here to be more likely obtained via MI maximization across protein systems having a minimum critical number of amino acid contacts on their interaction surfaces (N > 200).
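To make the idea of MI maximization over MSA concatenations concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`column_mi`, `total_mi`, `tp_rate`, `evolve`), the toy alignments, and the mutation-only search with elitist selection are all assumptions made for illustration. The abstract does not specify the Genetic Algorithm's operators or whether MI is summed over all inter-protein column pairs, so both choices are placeholders.

```python
import math
import random
from collections import Counter

def column_mi(col_a, col_b):
    """Mutual information (in nats) between two aligned columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def total_mi(msa_a, msa_b, pairing):
    """Summed inter-protein MI for one concatenation (assumed score).

    pairing[i] is the index of the family-B sequence joined to
    family-A sequence i.
    """
    paired_b = [msa_b[j] for j in pairing]
    cols_a = list(zip(*msa_a))      # columns of family A
    cols_b = list(zip(*paired_b))   # columns of family B, reordered
    return sum(column_mi(ca, cb) for ca in cols_a for cb in cols_b)

def tp_rate(pairing, truth):
    """Fraction of sequences paired with their true partner."""
    return sum(p == t for p, t in zip(pairing, truth)) / len(truth)

def evolve(msa_a, msa_b, generations=200, pop_size=20, seed=0):
    """Hypothetical mutation-only evolutionary search (even pop_size)."""
    rng = random.Random(seed)
    n = len(msa_a)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: total_mi(msa_a, msa_b, p), reverse=True)
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(n), 2)     # swap two partners
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda p: total_mi(msa_a, msa_b, p))

# Toy alignments (made up): sequences 0/1 and 2/3 of family B are identical.
msa_a = ["AC", "AC", "GT", "GT"]
msa_b = ["KK", "KK", "EE", "EE"]
best = evolve(msa_a, msa_b, generations=20, pop_size=6)
print(best, total_mi(msa_a, msa_b, best), tp_rate(best, [0, 1, 2, 3]))
```

In the toy data, swapping the two identical family-B sequences leaves the score unchanged, so the search cannot distinguish those pairings. This mirrors the degeneracy among similar sequences described above: a strict TP rate counts such swaps as errors even though the MI score is identical.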

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ef2/7994710/0fd1e415167d/41598_2021_86455_Fig1_HTML.jpg
