German Aerospace Center (DLR), Earth Observation Center (EOC), Münchner Straße 20, D-82234 Oberpfaffenhofen-Wessling, Germany.
J Chem Inf Model. 2013 Nov 25;53(11):2851-62. doi: 10.1021/ci400209n. Epub 2013 Nov 11.
α-Amino acids are fundamental to biochemistry as the monomeric building blocks with which cells construct proteins according to genetic instructions. However, the 20 amino acids of the standard genetic code represent a tiny fraction of the α-amino acid chemical structures that could plausibly play such a role, both from the perspective of the natural processes by which life emerged and evolved, and from the perspective of human-engineered genetically coded proteins. Until now, efforts to describe the structures comprising this broader set, or even to estimate their number, have been hampered by the complex combinatorial properties of organic molecules. Here, we use computer software based on graph theory and constructive combinatorics to conduct an efficient and exhaustive search of the chemical structures implied by two careful and precise definitions of the α-amino acids relevant to coded biological proteins. Our results include two virtual libraries of α-amino acid structures corresponding to these two definitions, comprising 121 044 and 3 846 structures, respectively, and suggest a simple approach to exploring much larger, as yet uncomputed, libraries of interest.
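As a rough illustration of what an exhaustive, duplicate-free structure search looks like, the following toy sketch enumerates only acyclic, carbon-only (alkyl) side chains of a given size. It is not the authors' software and does not reproduce either of their two definitions: the function names, the nested-tuple encoding, and the restriction to alkyl groups are assumptions made for this example, and stereoisomers, heteroatoms, rings, and multiple bonds are ignored.

```python
from functools import lru_cache
from itertools import combinations_with_replacement

# Assumption for this sketch: a side-chain carbon has four bonds, one of which
# goes to its parent (or to the alpha-carbon), leaving at most three branches.
MAX_BRANCHES = 3


def size(tree):
    """Number of carbons in a side chain encoded as a nested tuple of children."""
    return 1 + sum(size(child) for child in tree)


@lru_cache(maxsize=None)
def alkyl_side_chains(n_carbons):
    """Return every distinct alkyl side chain with n_carbons carbons.

    Each chain is a rooted tree written in canonical form: a sorted tuple of
    its child subtrees, so two drawings of the same group map to one value.
    """
    if n_carbons < 1:
        raise ValueError("n_carbons must be at least 1")
    # All smaller chains are candidate branches of the root carbon.
    pool = [t for m in range(1, n_carbons) for t in alkyl_side_chains(m)]
    chains = set()
    for k in range(MAX_BRANCHES + 1):
        # Choose k branches as a multiset so that branch order does not matter.
        for children in combinations_with_replacement(pool, k):
            if sum(size(c) for c in children) == n_carbons - 1:
                chains.add(tuple(sorted(children)))
    return tuple(sorted(chains))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, len(alkyl_side_chains(n)))
```

For one to four carbons the sketch finds 1, 1, 2, and 4 side chains, matching methyl, ethyl, the two propyl groups, and the four butyl groups. Building each structure from smaller canonical pieces, rather than generating candidates and filtering duplicates afterwards, is the general idea behind constructive enumeration; real chemical structure generators handle far richer constraints than this example.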