High fitness paths can connect proteins with low sequence overlap.

Authors

Kantroo Pranav, Wagner Günter P, Machta Benjamin B

Affiliations

Computational Biology and Bioinformatics Program, Yale University, New Haven, CT-06520, USA.

Quantitative Biology Institute, Yale University, New Haven, CT-06520, USA.

Publication information

bioRxiv. 2024 Nov 15:2024.11.13.623265. doi: 10.1101/2024.11.13.623265.

Abstract

The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single-residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. Consecutive intermediate sequences in these paths differ by a single-residue change; substitutions, insertions and deletions are all admissible moves. Their fitness is evaluated using the protein language model ESM2 and kept as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.
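To make the idea of a single-residue path concrete, the following is a minimal Python sketch and not the authors' algorithm. It assumes a greedy strategy: at each step it enumerates every one-residue substitution, insertion or deletion, keeps only those that bring the sequence one edit closer to the target (by Levenshtein distance), and picks the candidate that a user-supplied `score` function rates highest. The names `greedy_path`, `neighbors` and `toy_score` are hypothetical, and `toy_score` is only a placeholder; in the paper fitness is evaluated with ESM2, which a real scorer would have to wrap.

```python
from typing import Callable, List, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def neighbors(seq: str, alphabet: str) -> Set[str]:
    """All sequences exactly one residue edit away from seq."""
    out = set()
    for i in range(len(seq)):
        out.add(seq[:i] + seq[i + 1:])                 # deletion
        for aa in alphabet:
            if aa != seq[i]:
                out.add(seq[:i] + aa + seq[i + 1:])    # substitution
    for i in range(len(seq) + 1):
        for aa in alphabet:
            out.add(seq[:i] + aa + seq[i:])            # insertion
    return out


def greedy_path(src: str, dst: str, score: Callable[[str], float]) -> List[str]:
    """Greedy shortest path from src to dst, preferring high-scoring steps."""
    # Include any non-standard residues of dst so the target stays reachable.
    alphabet = "".join(sorted(set(AMINO_ACIDS) | set(dst)))
    path, current = [src], src
    dist = levenshtein(current, dst)
    while dist > 0:
        closer = [s for s in neighbors(current, alphabet)
                  if levenshtein(s, dst) == dist - 1]
        # Highest score wins; the sequence itself breaks ties deterministically.
        current = max(closer, key=lambda s: (score(s), s))
        path.append(current)
        dist -= 1
    return path


def toy_score(seq: str) -> float:
    """Placeholder fitness (assumption): fraction of hydrophobic residues."""
    return sum(c in "AILMFVWY" for c in seq) / max(len(seq), 1)


if __name__ == "__main__":
    for step in greedy_path("MKTAYIAK", "MKVLYAK", toy_score):
        print(step)
```

Because every step must reduce the distance to the target, this sketch only explores shortest edit paths. The abstract's phrase "subject to the constraints of the traversal" suggests the published method balances fitness against progress in its own way, so the greedy rule here should be read as an illustration only.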

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aca9/11601429/00510ac9c623/nihpp-2024.11.13.623265v1-f0001.jpg
