CPL-Diff: A Diffusion Model for De Novo Design of Functional Peptide Sequences with Fixed Length.

Authors

Luo Zhenjie, Geng Aoyun, Wei Leyi, Zou Quan, Cui Feifei, Zhang Zilong

Affiliations

College of Computer Science and Technology, Hainan University, No. 58, Renmin Avenue, Haikou, 570228, China.

Centre for Artificial Intelligence driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, 999078, China.

Publication information

Adv Sci (Weinh). 2025 May;12(20):e2412926. doi: 10.1002/advs.202412926. Epub 2025 Apr 15.

DOI:10.1002/advs.202412926

PMID:40231709

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12120732/

Abstract

Peptides are regarded as next-generation therapeutic drugs because of their unique properties, and they are essential for treating human diseases. In recent years, a number of deep generative models for peptide design have been proposed and have shown great potential. However, these models cannot reliably control the length of the generated sequence, even though sequence length strongly affects the physicochemical properties and therapeutic effects of peptides. Here, a diffusion model named CPL-Diff is introduced that can control the length of the functional peptide sequences it generates. CPL-Diff controls the length of generated polypeptide sequences using only attention masking. In addition, CPL-Diff can generate single-function polypeptide sequences based on given conditional information. Experiments demonstrate that peptides generated by CPL-Diff exhibit lower perplexity and similarity than those produced by current state-of-the-art models, and that their relevant physicochemical properties are similar to those of real sequences. An interpretability analysis of CPL-Diff is also performed to understand how it controls the length of generated sequences and how it makes decisions when generating polypeptide sequences, with the aim of providing theoretical guidance for polypeptide design. The code for CPL-Diff is available at https://github.com/luozhenjie1997/CPL-Diff.
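
The abstract states that length is controlled using only attention masking but gives no implementation detail. The sketch below is a generic illustration of that idea, not the authors' code: it assumes a fixed-size buffer of `max_len` positions, a boolean mask that marks the first `target_len` positions as real residues, and a single-head self-attention in which masked (padding) positions receive zero attention weight. The names `length_mask` and `masked_self_attention` are hypothetical, as are the dimensions used.

```python
import numpy as np

def length_mask(target_len: int, max_len: int) -> np.ndarray:
    """Boolean mask: True for the first target_len positions, False for padding."""
    return np.arange(max_len) < target_len

def masked_self_attention(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return single-head self-attention weights with padded keys masked out.

    x:    (max_len, d) array of position representations.
    mask: (max_len,) boolean array; False marks positions beyond the target length.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                      # scaled dot-product scores
    scores = np.where(mask[None, :], scores, -np.inf)  # hide padded key positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                           # exp(-inf) == 0 for masked keys
    return weights / weights.sum(axis=-1, keepdims=True)

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
max_len, d, target_len = 8, 4, 5
x = rng.normal(size=(max_len, d))
mask = length_mask(target_len, max_len)
weights = masked_self_attention(x, mask)
```

With this mask, no position can attend to anything beyond index `target_len - 1`, so the content of the sequence depends only on the first `target_len` positions. How CPL-Diff builds and applies its mask inside the diffusion process is described in the paper and the linked repository, not here.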

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/097d/12120732/9752aad7f34b/ADVS-12-2412926-g008.jpg
