Tal Guy, Boca Simina Maria, Mittenthal Jay, Caetano-Anollés Gustavo
Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA.
Department of Cell and Developmental Biology, University of Illinois, Urbana, IL, 61801, USA.
J Mol Evol. 2016 May;82(4-5):230-43. doi: 10.1007/s00239-016-9740-1. Epub 2016 May 5.
Domains are folded structures and evolutionary building blocks of protein molecules. Their three-dimensional atomic conformations, which define biological functions, can be coarse-grained into levels of a hierarchy. Here we build global dynamical models for the evolution of domains at the fold and fold superfamily (FSF) levels. We fit the models to data from phylogenomic trees of domain structures and evaluate the distributions of the resulting parameters and their implications. The trees were inferred from a census of domain structures in hundreds of genomes from all three superkingdoms of life. The models used birth-death differential equations with the global abundances of structures as state variables, with one set of equations for folds and another for FSFs. Only the transitions present in the tree are assumed possible. Each fold or FSF diversifies into variants, eventually producing a new fold or FSF. The parameters specify the rates at which variants and new folds or FSFs are generated. The equations were solved for the parameters by simplifying the trees to a comb-like topology, treating branches as emerging directly from a trunk. We found that the rate constants for folds and FSFs evolved similarly. These parameters showed a sharp transient change about 1.5 billion years ago. This time coincides with a period in which domains massively combined in proteins and their arrangements were distributed among novel lineages during the rise of organismal diversification. Our simulations suggest that exploration of protein structure space occurs through coarse-grained discoveries that undergo fine-grained elaboration.
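As a rough illustration of the kind of model the abstract describes, the sketch below integrates a toy linear system on a comb-like topology, in which each structure accumulates variants at its own rate and seeds the next structure along the trunk. The abstract does not give the equations, so the specific form used here, dx_i/dt = v_i x_i + k_(i-1) x_(i-1), is an assumption, as are the names `simulate_comb`, `variant_rates`, and `innovation_rates` and all numerical values. It is not the authors' model or fitting procedure.

```python
def simulate_comb(variant_rates, innovation_rates, x0=1.0, t_end=1.0, steps=1000):
    """Integrate a toy comb-like chain of structure abundances with RK4.

    Assumed (hypothetical) equations, not taken from the paper:
        dx_0/dt = v_0 * x_0
        dx_i/dt = v_i * x_i + k_(i-1) * x_(i-1)    for i >= 1
    v_i is the rate of generating variants of structure i, and k_(i-1) is
    the rate at which structure i-1 gives rise to structure i on the trunk.
    """
    n = len(variant_rates)
    if len(innovation_rates) != n - 1:
        raise ValueError("need exactly one innovation rate per trunk step")

    def deriv(x):
        # Right-hand side of the assumed equations for the current state.
        dx = [variant_rates[0] * x[0]]
        for i in range(1, n):
            dx.append(variant_rates[i] * x[i] + innovation_rates[i - 1] * x[i - 1])
        return dx

    # Only the first structure exists at time zero.
    x = [x0] + [0.0] * (n - 1)
    h = t_end / steps
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = deriv(x)
        k2 = deriv([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = deriv([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = deriv([xi + h * ki for xi, ki in zip(x, k3)])
        x = [
            xi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
    return x


if __name__ == "__main__":
    # Three structures along the trunk; all rates are made-up values.
    abundances = simulate_comb([0.5, 0.5, 0.3], [0.2, 0.1], t_end=2.0)
    print(abundances)
```

In this sketch the rates are inputs and the abundances are outputs. The study works in the opposite direction, solving for the rate parameters from abundances read off the phylogenomic trees, which this example does not attempt.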