Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel.
Proc Natl Acad Sci U S A. 2017 Oct 31;114(44):11703-11708. doi: 10.1073/pnas.1707642114. Epub 2017 Oct 19.
Proteins share similar segments with one another. Such "reused parts" (segments that have been successfully incorporated into other proteins) are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment "reuse" across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for "themes" (segments of at least 35 residues of similar sequence and structure) reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
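The length threshold and the idea of one residue taking part in several themes can be illustrated with a minimal Python sketch. It is not the authors' pipeline: detecting similarity in sequence and structure is not implemented, and the `AlignedSegment` record, the function names, and the example coordinates are all hypothetical. Only the 35-residue threshold comes from the abstract.

```python
from collections import defaultdict
from typing import NamedTuple

MIN_THEME_LENGTH = 35  # residues; the threshold quoted in the abstract


class AlignedSegment(NamedTuple):
    """Hypothetical record: one segment aligned between two proteins."""
    protein_a: str
    start_a: int  # 0-based, inclusive
    end_a: int    # exclusive
    protein_b: str
    start_b: int
    end_b: int


def filter_themes(segments, min_length=MIN_THEME_LENGTH):
    """Keep only segments whose span in both proteins reaches min_length."""
    return [
        s for s in segments
        if s.end_a - s.start_a >= min_length
        and s.end_b - s.start_b >= min_length
    ]


def residue_theme_counts(themes):
    """Count, for every (protein, position), how many themes cover it."""
    counts = defaultdict(int)
    for t in themes:
        for pos in range(t.start_a, t.end_a):
            counts[(t.protein_a, pos)] += 1
        for pos in range(t.start_b, t.end_b):
            counts[(t.protein_b, pos)] += 1
    return counts


# Invented alignments, assumed to have already passed a similarity check.
segments = [
    AlignedSegment("P1", 0, 60, "P2", 10, 70),  # long segment, 60 residues
    AlignedSegment("P1", 20, 55, "P3", 0, 35),  # shorter segment nested in it
    AlignedSegment("P2", 0, 20, "P3", 40, 60),  # 20 residues, below threshold
]

themes = filter_themes(segments)
counts = residue_theme_counts(themes)
relaxed = filter_themes(segments, min_length=20)
```

In this toy input, residue 30 of `P1` lies inside both retained themes, which mirrors the intertwined footprints described in the abstract, and lowering `min_length` to 20 admits the third segment as well.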