Luo Zhenjie, Geng Aoyun, Wei Leyi, Zou Quan, Cui Feifei, Zhang Zilong
College of Computer Science and Technology, Hainan University, No. 58, Renmin Avenue, Haikou, 570228, China.
Centre for Artificial Intelligence driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, 999078, China.
Adv Sci (Weinh). 2025 May;12(20):e2412926. doi: 10.1002/advs.202412926. Epub 2025 Apr 15.
Peptides are recognized as next-generation therapeutic drugs because of their unique properties, and they are essential for treating human diseases. In recent years, a number of deep generative models for peptide design have been proposed and have shown great potential. However, these models offer little control over the length of the generated sequence, even though sequence length strongly affects the physicochemical properties and therapeutic effects of peptides. Here, a diffusion model named CPL-Diff is introduced that can control the length of the functional peptide sequences it generates. CPL-Diff controls the length of generated polypeptide sequences using only attention masking. In addition, CPL-Diff can generate single-functional polypeptide sequences based on given conditional information. Experiments demonstrate that the peptides generated by CPL-Diff exhibit lower perplexity and similarity than those produced by current state-of-the-art models, and that their relevant physicochemical properties are similar to those of real sequences. An interpretability analysis is also performed on CPL-Diff to understand how it controls the length of generated sequences and what decision-making process it follows when generating polypeptide sequences, with the aim of providing theoretical guidance for polypeptide design. The code for CPL-Diff is available at https://github.com/luozhenjie1997/CPL-Diff.
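The idea of controlling length through attention masking alone can be illustrated with a minimal sketch. This is not taken from the CPL-Diff repository: it assumes a fixed-size canvas of `MAX_LEN` positions in which the first `TARGET_LEN` positions hold the peptide and the rest are padding, and the helper names `length_mask`, `masked_attention_weights`, and `attend` are hypothetical. It shows only the general mechanism, namely that masked-out positions receive zero attention weight and therefore cannot influence the positions that make up the sequence.

```python
import math
import random


def length_mask(target_len, max_len):
    """Return 1 for positions that belong to the peptide, 0 for padding."""
    if not 0 < target_len <= max_len:
        raise ValueError("target_len must be in 1..max_len")
    return [1] * target_len + [0] * (max_len - target_len)


def masked_attention_weights(scores, mask):
    """Softmax over one row of attention scores, ignoring masked-out keys."""
    kept = [s for s, m in zip(scores, mask) if m]
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, mask):
    """Single-head dot-product self-attention with a key padding mask."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = masked_attention_weights(scores, mask)
        out.append([sum(wi * v[d] for wi, v in zip(w, values))
                    for d in range(len(values[0]))])
    return out


# Illustrative sizes only (assumptions, not values from the paper).
random.seed(0)
MAX_LEN, DIM, TARGET_LEN = 8, 4, 5
x = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(MAX_LEN)]
mask = length_mask(TARGET_LEN, MAX_LEN)
out = attend(x, x, x, mask)

# Overwrite the padded positions with arbitrary values and recompute.
x_changed = [row[:] for row in x]
for i in range(TARGET_LEN, MAX_LEN):
    x_changed[i] = [99.0] * DIM
out_changed = attend(x_changed, x_changed, x_changed, mask)

# The first TARGET_LEN output rows are unaffected by the padded content.
unchanged = all(
    abs(a - b) < 1e-12
    for i in range(TARGET_LEN)
    for a, b in zip(out[i], out_changed[i])
)
print("real positions unaffected by padding:", unchanged)
```

In this sketch, choosing a different `TARGET_LEN` changes only the mask, not the model or its inputs, which is one plausible reading of how a mask by itself can set the length of the generated sequence.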