Generative β-hairpin design using a residue-based physicochemical property landscape.
Author affiliations
School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia.
Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee.
Publication information
Biophys J. 2024 Sep 3;123(17):2790-2806. doi: 10.1016/j.bpj.2024.01.029. Epub 2024 Feb 1.
De novo peptide design is a new frontier that has broad application potential in the biological and biomedical fields. Most existing models for de novo peptide design are largely based on sequence homology that can be restricted based on evolutionarily derived protein sequences and lack the physicochemical context essential in protein folding. Generative machine learning for de novo peptide design is a promising way to synthesize theoretical data that are based on, but unique from, the observable universe. In this study, we created and tested a custom peptide generative adversarial network intended to design peptide sequences that can fold into the β-hairpin secondary structure. This deep neural network model is designed to establish a preliminary foundation of the generative approach based on physicochemical and conformational properties of 20 canonical amino acids, for example, hydrophobicity and residue volume, using extant structure-specific sequence data from the PDB. The beta generative adversarial network model robustly distinguishes secondary structures of β hairpin from α helix and intrinsically disordered peptides with an accuracy of up to 96% and generates artificial β-hairpin peptide sequences with minimum sequence identities around 31% and 50% when compared against the current NCBI PDB and nonredundant databases, respectively. These results highlight the potential of generative models specifically anchored by physicochemical and conformational property features of amino acids to expand the sequence-to-structure landscape of proteins beyond evolutionary limits.
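To make the idea of a residue-based physicochemical representation concrete, the following is a minimal sketch of how a peptide sequence could be turned into per-residue property vectors before being passed to a model. It is not the authors' implementation. The choice of scales is an assumption: the abstract names hydrophobicity and residue volume only as examples, so the sketch uses the Kyte-Doolittle hydropathy index and commonly cited approximate residue volumes in cubic angstroms, which should be checked against a primary source before any real use. The function name `encode_peptide` and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 canonical amino acids.
# Assumption: the abstract does not state which hydrophobicity scale was used.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate residue volumes in cubic angstroms (commonly cited values).
# Assumption: illustrative numbers only; verify before relying on them.
RESIDUE_VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}


def encode_peptide(sequence):
    """Return one [hydropathy, volume] pair per residue of `sequence`.

    Raises ValueError for any character that is not one of the
    20 canonical one-letter amino acid codes.
    """
    features = []
    for residue in sequence.upper():
        if residue not in KD_HYDROPATHY:
            raise ValueError(f"non-canonical residue: {residue!r}")
        features.append([KD_HYDROPATHY[residue], RESIDUE_VOLUME[residue]])
    return features


if __name__ == "__main__":
    # Example input chosen for illustration only; it is not taken from the paper.
    for residue, row in zip("GEWTYDDATKTFTVTE", encode_peptide("GEWTYDDATKTFTVTE")):
        print(residue, row)
```

A representation like this replaces one-hot residue identities with continuous property values, so two sequences with low identity can still look similar to a model if their property profiles match. In practice each property column would likely be normalized to a common range before training.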