Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches.

Affiliation

Laboratório de Biologia Teórica e Computacional (LBTC), Universidade de Brasília DF, Brasília, Brazil.

Publication information

Sci Rep. 2021 Mar 25;11(1):6902. doi: 10.1038/s41598-021-86455-0.

Abstract

The problem of finding the correct set of partners for a given pair of interacting protein families based on multiple sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual information (MI) signal for discriminating the MSA concatenations with the largest fraction of correct pairings. Although that signal might be of practical relevance in the search for an effective heuristic to solve the problem, the number of MSA concatenations with near-native MI is large, which imposes severe limitations. Here, a Genetic Algorithm that explores possible MSA concatenations according to an MI maximization criterion is shown to find degenerate solutions with two error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. If mistakes made among similar sequences are disregarded, type-(i) solutions are found to resolve correct pairings at best true positive (TP) rates of 70%, far above the same estimates for type-(ii) solutions. A machine learning classification algorithm further helps to show that the differences between optimized solutions based on TP rates are not artificial and may have a biological meaning associated with the three-dimensional distribution of the MI signal. Type-(i) solutions may therefore correspond to reliable results for predictive purposes; they were found here to be more likely obtained via MI maximization across protein systems having a minimum critical number of amino acid contacts on their interaction surfaces (N > 200).
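The search described in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' implementation. MI is the plain plug-in estimate with no pseudocounts or sequence weighting. The inter-protein contact list is assumed to be known. The native pairing is assumed to be row k of MSA A with row k of MSA B. The GA operators (truncation selection, an order-crossover variant, swap mutation) are generic choices for permutation problems. All function names (`column_mi`, `concat_mi`, `tp_rate`, `genetic_search`) and the `similar_cutoff` parameter are hypothetical.

```python
import math
import random
from collections import Counter


def column_mi(col_x, col_y):
    """Plug-in mutual information (nats) between two alignment columns.
    No pseudocounts or sequence weighting are applied (a simplification)."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    mi = 0.0
    for (a, b), c in Counter(zip(col_x, col_y)).items():
        # p(a,b) * log[p(a,b) / (p(a) p(b))], written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def concat_mi(msa_a, msa_b, perm, contacts):
    """Summed MI over assumed inter-protein contacts (i in A, j in B) for the
    concatenation that pairs row k of msa_a with row perm[k] of msa_b."""
    total = 0.0
    for i, j in contacts:
        col_a = [seq[i] for seq in msa_a]
        col_b = [msa_b[p][j] for p in perm]
        total += column_mi(col_a, col_b)
    return total


def identity(s, t):
    """Fraction of identical positions between two aligned sequences."""
    return sum(x == y for x, y in zip(s, t)) / len(s)


def tp_rate(perm, msa_b, similar_cutoff=None):
    """TP rate assuming the native pairing is perm[k] == k. With a (hypothetical)
    similar_cutoff, a mismatch to a B sequence at least that identical to the
    native partner also counts as correct, i.e. type-(i) errors are disregarded."""
    hits = 0
    for k, p in enumerate(perm):
        if p == k or (similar_cutoff is not None
                      and identity(msa_b[p], msa_b[k]) >= similar_cutoff):
            hits += 1
    return hits / len(perm)


def order_crossover(p1, p2, rng):
    """Order-crossover variant: copy a slice of p1, fill the rest in p2's order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[lo:hi + 1] = p1[lo:hi + 1]
    kept = set(p1[lo:hi + 1])
    rest = iter([g for g in p2 if g not in kept])
    return [g if g is not None else next(rest) for g in child]


def genetic_search(msa_a, msa_b, contacts, pop_size=20, generations=100,
                   mutation_rate=0.3, seed=0):
    """Evolve permutations of B rows that maximize concat_mi
    (truncation selection, order crossover, swap mutation, elitism)."""
    rng = random.Random(seed)
    n = len(msa_a)

    def fitness(perm):
        return concat_mi(msa_a, msa_b, perm, contacts)

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[:max(2, pop_size // 5)]
        children = [list(e) for e in elite]  # elites survive unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    rng = random.Random(1)
    msa_a = ["".join(rng.choice("ACDEFG") for _ in range(3)) for _ in range(10)]
    msa_b = list(msa_a)  # toy data: B columns copy A columns, native = identity
    contacts = [(0, 0), (1, 1), (2, 2)]  # hypothetical contact list
    best = genetic_search(msa_a, msa_b, contacts)
    print("native MI:", concat_mi(msa_a, msa_b, list(range(10)), contacts))
    print("best MI:  ", concat_mi(msa_a, msa_b, best, contacts))
    print("TP strict:", tp_rate(best, msa_b))
    print("TP, similar >= 0.67 tolerated:", tp_rate(best, msa_b, similar_cutoff=0.67))
```

In this toy setup, swapping two identical or near-identical B rows leaves the summed MI unchanged or almost unchanged. This is the kind of degeneracy the abstract attributes to type-(i) errors. It also explains why the strict TP rate can fall below the similarity-tolerant one even when the MI is near-native.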

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ef2/7994710/0fd1e415167d/41598_2021_86455_Fig1_HTML.jpg