Zhou Tianqian, Zhang Shibo, Song Huijia, He Qiang, Fang Chun, Lin Xiaozhu
College of Information Engineering, Beijing Institute of Petrochemical Technology, No. 19 Qingyuan North Road, Daxing District, Beijing, 102617, China.
J Comput Aided Mol Des. 2025 Sep 4;39(1):76. doi: 10.1007/s10822-025-00653-w.
With the rapid advancement of biotechnology, protein generation and design based on generative models have found broad application in drug development, vaccine research, and biocatalysis. This study proposes a protein generation method based on the generalized diffusion model, which removes the traditional diffusion model's reliance on Gaussian noise, enables more flexible protein sequence generation, and is preliminarily shown to be advantageous. Specifically, protein sequences were one-hot encoded and input into the diffusion model to generate novel sequences. The tertiary structures of the generated proteins were then predicted with AlphaFold, and structural alignment and backbone distance calculation in PyMOL were used to select the optimal sequences. The predicted derivative protein sequence A_005 was screened from the generated sequences and subjected to an affinity assay with the parental Protein A. Experimentally, A_005 exhibited remarkably high affinity, together with satisfactory association and dissociation rates. These findings indicate that the generalized diffusion model can effectively design protein sequences with high structural and functional similarity to target sequences. While prior studies have shown that both DDPM and generalized diffusion models achieve high generation quality, the generalized diffusion model offers better task adaptability. This work opens a new technical route for protein design and provides theoretical and experimental evidence to support future biomedical applications and subsequent drug development.
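As a rough illustration of the encoding step described in the abstract, the sketch below one-hot encodes a protein sequence over the 20 standard amino acids and applies one possible non-Gaussian corruption. The abstract does not say which degradation operator the authors used, so the `degrade` function (random residue replacement with probability `t`) is purely a hypothetical stand-in, and all function names here are illustrative rather than taken from the paper.

```python
import random

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_row(index):
    """Return a length-20 vector with 1.0 at `index` and 0.0 elsewhere."""
    row = [0.0] * len(AMINO_ACIDS)
    row[index] = 1.0
    return row


def one_hot_encode(seq):
    """Encode a sequence as an L x 20 matrix (list of rows).

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return [one_hot_row(AA_INDEX[aa]) for aa in seq]


def one_hot_decode(matrix):
    """Map each row back to the residue with the largest value."""
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in matrix
    )


def degrade(encoded, t, rng):
    """Hypothetical non-Gaussian corruption (not the authors' operator):
    each position is replaced by a uniformly random residue with
    probability t, where 0 <= t <= 1."""
    corrupted = []
    for row in encoded:
        if rng.random() < t:
            corrupted.append(one_hot_row(rng.randrange(len(AMINO_ACIDS))))
        else:
            corrupted.append(list(row))
    return corrupted
```

A call such as `degrade(one_hot_encode("MKVLA"), 0.3, random.Random(0))` returns a matrix of the same shape in which some rows may have been swapped for another residue; a generative model would then be trained to reverse such corruption. Unlike Gaussian noise, this operator keeps every row a valid one-hot vector.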