Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
Department of Computer Science, University of Haifa, Haifa, Israel.
Protein Sci. 2022 Sep;31(9):e4407. doi: 10.1002/pro.4407.
The emergence of novel proteins, beyond those that can be readily made by duplication and recombination of preexisting domains, is elusive. De novo emergence from random sequences is unlikely because the vast majority of random chains would not even fold, let alone function. An alternative explanation is that novel proteins emerge by duplication and fusion of preexisting polypeptide segments. In this case, traces of such ancient events may remain within contemporary proteins in the form of reused segments. Together with the late Dan Tawfik, we detected such similar segments, far shorter than intact protein domains, which are found in different environments. The detection of these "bridging themes" was based on a unique search strategy: in addition to searching for similarity of shared fragments, so-called "themes," we also explicitly searched for cases in which the sequence segments before and after the theme are dissimilar (in both sequence and structure). Here, using a similar strategy, we further expanded the search and discovered almost 500 additional "bridging themes," linking domains that are often from ancient folds. The themes, of 20 residues or more (average 53), do not retain their structure despite sharing 37% sequence identity on average. Indeed, conformational flexibility may confer an evolutionary advantage, in that a flexible segment can fit in multiple environments. We elaborate on two interesting themes, shared between Rossmann/Trefoil-Plexin-like domains and a β-propeller-like domain.

FOR A BROAD AUDIENCE: A fundamental question in molecular evolution is how protein domains emerged. Similar segments shared between domains of seemingly distinct origins may offer clues, as these may be remnants of the evolutionary process through which these domains emerged. However, finding such cases is difficult. Here, we expand the set of such cases that we curated previously, adding segments shared between domains that are considered ancient.
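The two-part criterion described above (a similar shared segment with dissimilar flanks) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes ungapped, pre-positioned segments, uses plain sequence identity only (the abstract also requires structural dissimilarity of the flanks, which is omitted here), and every name and threshold below (`is_bridging_candidate`, `flank`, `min_theme_identity`, `max_flank_identity`) is hypothetical. Only the 20-residue minimum length comes from the abstract.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, ungapped segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    if not a:
        return 0.0  # an empty segment is treated as sharing no identity
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_bridging_candidate(
    seq_a: str,
    seq_b: str,
    start_a: int,
    start_b: int,
    length: int,
    flank: int = 10,                  # hypothetical flank window, in residues
    min_theme_identity: float = 0.3,  # hypothetical similarity threshold
    max_flank_identity: float = 0.2,  # hypothetical dissimilarity threshold
    min_theme_length: int = 20,       # minimum theme length stated in the abstract
) -> bool:
    """Return True if the shared segment is similar while both flanks are dissimilar."""
    if length < min_theme_length:
        return False

    end_a, end_b = start_a + length, start_b + length
    if start_a < 0 or start_b < 0 or end_a > len(seq_a) or end_b > len(seq_b):
        return False  # the segment does not fit inside one of the sequences

    # Part 1: the shared segment (the "theme") must be similar.
    if identity(seq_a[start_a:end_a], seq_b[start_b:end_b]) < min_theme_identity:
        return False

    # Part 2: the residues before and after the theme must be dissimilar.
    # Flank windows are trimmed so that both sequences contribute equal lengths.
    n_before = min(flank, start_a, start_b)
    n_after = min(flank, len(seq_a) - end_a, len(seq_b) - end_b)
    before = identity(seq_a[start_a - n_before:start_a], seq_b[start_b - n_before:start_b])
    after = identity(seq_a[end_a:end_a + n_after], seq_b[end_b:end_b + n_after])
    return before <= max_flank_identity and after <= max_flank_identity
```

A segment shared by two sequences whose flanks are also similar is rejected by the second check, which is the point of the strategy: it separates reused segments embedded in different contexts from ordinary homologous domains.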