Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait.
School of Molecular Sciences and Center for Molecular Design and Biomimetics at the Biodesign Institute, Arizona State University, Tempe, AZ, USA.
Mol Biol Evol. 2022 Jan 7;39(1). doi: 10.1093/molbev/msab280.
Morphospaces (representations of phenotypic characteristics) are often populated unevenly, leaving large parts unoccupied. Such patterns are typically ascribed to contingency, or else to natural selection disfavoring certain parts of the morphospace. The extent to which developmental bias, the tendency of certain phenotypes to preferentially appear as potential variation, also explains these patterns is hotly debated. Here we demonstrate quantitatively that developmental bias is the primary explanation for the occupation of the morphospace of RNA secondary structure (SS) shapes. Upon random mutations, some RNA SS shapes (the frequent ones) are much more likely to appear than others. By using the RNAshapes method to define coarse-grained SS classes, we can directly compare the frequencies with which noncoding RNA SS shapes appear in the RNAcentral database to the frequencies obtained from a random sampling of sequences. We show that: 1) only the most frequent structures appear in nature; the vast majority of possible structures in the morphospace have not yet been explored; 2) remarkably small numbers of random sequences are needed to produce all the RNA SS shapes found in nature so far; and 3) perhaps most surprisingly, the natural frequencies are accurately predicted, over several orders of magnitude in variation, by the likelihood that structures appear upon a uniform random sampling of sequences. The ultimate cause of these patterns is not natural selection, but rather a strong phenotype bias in the RNA genotype-phenotype map, a type of developmental bias or "findability constraint," which limits evolutionary dynamics to a hugely reduced subset of structures that are easy to "find."
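The comparison described in the abstract (coarse-grain each secondary structure into a shape class, then tabulate how often each shape appears in a sample) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and does not call the RNAshapes program. The function names `random_rna`, `coarse_shape`, and `shape_frequencies` are hypothetical, the coarse-graining is a simplified stand-in for the most abstract RNAshapes level (unpaired bases ignored, helices interrupted by bulges or internal loops merged), and the dot-bracket structures are assumed to come from a separate folding program.

```python
import random
from collections import Counter


def random_rna(length, rng):
    """Draw one sequence uniformly at random over the alphabet A, C, G, U."""
    return "".join(rng.choice("ACGU") for _ in range(length))


def coarse_shape(dot_bracket):
    """Reduce a dot-bracket structure to a coarse shape string.

    Simplified stand-in for the most abstract RNAshapes level (an assumption
    of this sketch): unpaired bases are dropped, and a base pair that is the
    only pair enclosed by its parent pair is merged into the parent's helix.
    A structure with no pairs is written as "_".
    """
    root = []          # children of the exterior loop
    stack = [root]     # stack of child lists for the currently open pairs
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in structure")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in structure")

    def inside(children):
        # A lone enclosed pair continues the same helix: emit no new bracket.
        if len(children) == 1:
            return inside(children[0])
        return "".join("[" + inside(child) + "]" for child in children)

    shape = "".join("[" + inside(child) + "]" for child in root)
    return shape or "_"


def shape_frequencies(structures):
    """Map each coarse shape to its relative frequency in `structures`."""
    counts = Counter(coarse_shape(s) for s in structures)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {shape: n / total for shape, n in counts.items()}
```

Under these assumptions, one would fold a set of `random_rna` sequences with an external tool, pass the resulting structures to `shape_frequencies`, and compare the result with the same tabulation over database structures, for example on a log-log plot.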