Biological Sciences Department, New York City College of Technology (City Tech), The City University of New York (CUNY), 285 Jay Street, Brooklyn, NY, 11201, USA.
BMC Evol Biol. 2019 Jul 30;19(1):158. doi: 10.1186/s12862-019-1464-6.
There is wide agreement that only a subset of the twenty standard amino acids existed prebiotically in sufficient concentrations to form functional polypeptides. We ask how this subset, postulated as {A,D,E,G,I,L,P,S,T,V}, could have formed structures stable enough to found metabolic pathways. Inspired by alphabet reduction experiments, we undertook a computational analysis of how well sequences restricted to reduced alphabets encode protein structure. We sought to identify characteristics of the prebiotic set that would endow it with unique properties relevant to structure, stability, and folding.
Drawing on a large dataset of single-domain proteins, we employed an information-theoretic measure to assess how well the prebiotic amino acid set preserves fold information compared with all other possible ten-amino-acid sets. An extensive virtual mutagenesis procedure revealed that the prebiotic set preserves sequence-dependent information about both the backbone conformation and the tertiary contact matrix of proteins exceptionally well. We observed that information retention depends on fold class: the prebiotic set sufficiently encodes the structure space of α/β and α + β folds and, to a lesser extent, of all-α and all-β folds. The prebiotic set appeared insufficient to encode small proteins. Assessing how well the prebiotic set discriminates native from incorrect sequence-structure matches, we found that α/β and α + β folds exhibit more pronounced energy gaps with the prebiotic set than with nearly all alternative sets.
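As a rough illustration of the scale and kind of comparison described above, the following minimal sketch counts the candidate ten-letter alphabets and computes a plug-in mutual information between two discrete labels, such as a residue letter and a backbone-state label. It is not the measure used in this study, whose exact form is not given here; the function names and the toy residue/state pairs are assumptions made only for illustration.

```python
from collections import Counter
from math import comb, log2

# Alphabets named in the text; everything else below is illustrative.
PREBIOTIC = frozenset("ADEGILPSTV")
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def n_candidate_alphabets(size=10, total=20):
    """Number of distinct reduced alphabets of `size` letters out of `total`."""
    return comb(total, size)


def mutual_information(pairs):
    """Plug-in mutual information (in bits) between two discrete labels.

    `pairs` is an iterable of (x, y) observations, e.g. a residue letter
    and a structural-state label (hypothetical labels, not real data).
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    print(n_candidate_alphabets())  # ten-letter subsets of twenty letters
    toy = [("A", "H"), ("A", "H"), ("G", "C"), ("G", "C")]  # made-up pairs
    print(mutual_information(toy))
```

In this toy example the residue letter fully determines the state label, so the mutual information equals the one bit of entropy in the label; statistically independent labels would give zero.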
The prebiotic set optimally encodes local backbone structures that appear in the folded environment and near-optimally encodes the tertiary contact matrix of extant proteins. The fold-class-specific patterns observed in our structural analysis confirm the postulated timeline of fold appearance in proteogenesis derived from proteomic sequence analyses. Polypeptides arising in a prebiotic environment, if they folded at all, would most likely have formed α/β- and α + β-like folds. We infer that the progressive expansion of the alphabet enabled the greater conformational stability and functional specificity of later folds, including all-α, all-β, and small proteins. Our results suggest that prebiotic sequences are amenable to mutations that significantly lower native conformational energies and increase discrimination against incorrect folds. This property may have assisted the genesis of functional proto-enzymes prior to the expansion to the full amino acid alphabet.