Jin Shuwen, Zeng Zihan, Xiong Xiyan, Huang Baicheng, Tang Li, Wang Hongsheng, Ma Xiao, Tang Xiaochun, Shao Guoqing, Huang Xingxu, Lin Feng
Zhejiang Lab, Hangzhou, 311121, China.
Polytechnic Institute, Zhejiang University, Hangzhou, 310015, China.
Commun Biol. 2025 May 30;8(1):839. doi: 10.1038/s42003-025-08282-7.
The rapid advancement of artificial intelligence (AI) has enabled de novo design of functional proteins without relying on natural templates or sequencing databases. However, current protein design models perform poorly on proteins that lack stable structures, such as antimicrobial peptides (AMPs), which are short and structurally flexible yet play critical biological roles. To address this challenge, we present AMPGen, a diffusion-driven generative model that preserves evolutionary information for de novo design of target-specific AMPs. AMPGen combines three AI components, a generator, a discriminator, and a scorer, with screening programs based on biochemical knowledge. The generator employs a pre-trained, order-agnostic autoregressive diffusion model that uses axial attention to capture protein evolutionary information from multiple sequence alignments (MSAs). Conditioning on AMP MSAs raises the success rate of generated AMPs, which are subsequently filtered by physicochemical properties and assessed by an XGBoost-based discriminator. Final target-specific scoring is performed by an LSTM-based scorer, yielding high-quality AMP candidates. In this study, 38 of the 40 de novo designed AMP candidates selected for verification were successfully synthesized, and 81.58% of those (31 of 38) demonstrated antibacterial activity. The AMPs designed by AMPGen are absent from existing AMP databases and exhibit high antibacterial capacity, sequence diversity, and broad-spectrum activity.
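The physicochemical filtering stage mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which properties or cutoffs AMPGen uses, so everything here is an assumption made for illustration: the choice of length, net charge, and hydrophobic fraction as criteria, the residue groupings, the threshold values, and the hypothetical names `FilterConfig` and `passes_filter`.

```python
from dataclasses import dataclass

# Residue groupings are simplifying assumptions, not taken from the paper.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
POSITIVE = set("KR")          # histidine is ignored for simplicity
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFWVC")


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical thresholds; the real pipeline's cutoffs are not given."""
    min_length: int = 10
    max_length: int = 50
    min_net_charge: int = 2
    min_hydrophobic_fraction: float = 0.3
    max_hydrophobic_fraction: float = 0.7


def net_charge(seq: str) -> int:
    """Crude net charge: count of K/R minus count of D/E."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that fall in the hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def passes_filter(seq: str, cfg: FilterConfig = FilterConfig()) -> bool:
    """Return True if the peptide meets every assumed physicochemical criterion."""
    seq = seq.upper()
    if not seq or not set(seq) <= STANDARD:
        return False  # empty or contains non-standard residues
    if not cfg.min_length <= len(seq) <= cfg.max_length:
        return False
    if net_charge(seq) < cfg.min_net_charge:
        return False
    frac = hydrophobic_fraction(seq)
    return cfg.min_hydrophobic_fraction <= frac <= cfg.max_hydrophobic_fraction


# Toy candidates: the magainin 2 sequence, an all-acidic peptide,
# a simple cationic amphipathic repeat, and a sequence with an unknown residue.
candidates = [
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "DDDDEEEEDDDDEEEE",
    "KKLLKKLLKKLL",
    "ACX",
]
kept = [s for s in candidates if passes_filter(s)]
print(kept)
```

In a pipeline of the kind described, sequences that survive such a filter would then be passed to the discriminator and the scorer; those learned models are not reproduced here.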