Tang Sophia, Zhang Yinuo, Chatterjee Pranam
Department of Biomedical Engineering, Duke University.
Management and Technology Program, University of Pennsylvania.
ArXiv. 2025 Jan 1:arXiv:2412.17780v3.
Peptide therapeutics, a major class of medicines, have achieved remarkable success across diseases such as diabetes and cancer, with landmark examples such as GLP-1 receptor agonists revolutionizing the treatment of type-2 diabetes and obesity. Despite their success, designing peptides that satisfy multiple conflicting objectives, such as target binding affinity, solubility, and membrane permeability, remains a major challenge. Classical drug development and target structure-based design are ineffective for such tasks, as they fail to optimize global functional properties critical for therapeutic efficacy. Existing generative frameworks are largely limited to continuous spaces, unconditioned outputs, or single-objective guidance, making them unsuitable for discrete sequence optimization across multiple properties. To address this, we present PepTune, a multi-objective discrete diffusion model for the simultaneous generation and optimization of therapeutic peptide SMILES. Built on the Masked Discrete Language Model (MDLM) framework, PepTune ensures valid peptide structures with bond-dependent masking schedules and penalty-based objectives. To guide the diffusion process, we propose a Monte Carlo Tree Search (MCTS)-based strategy that balances exploration and exploitation to iteratively refine Pareto-optimal sequences. MCTS integrates classifier-based rewards with search-tree expansion, overcoming gradient estimation challenges and data sparsity. Using PepTune, we generate diverse, chemically modified peptides optimized for multiple therapeutic properties, including target binding affinity, membrane permeability, solubility, hemolysis, and non-fouling for various disease-relevant targets. In total, our results demonstrate that MCTS-guided masked discrete diffusion is a powerful and modular approach for multi-objective sequence design in discrete state spaces.
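To make the notion of Pareto-optimal sequences concrete, the following is a minimal sketch of a non-dominated filter over candidates scored on several objectives. It is an illustration only, not PepTune's implementation: the candidate names, the score values, and the functions `dominates` and `pareto_front` are all hypothetical, and every objective is assumed to be oriented so that higher is better (a property to be minimized, such as hemolysis, would be negated first).

```python
from typing import Dict, List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores: (binding affinity, solubility, permeability).
candidates = {
    "pep_A": (0.9, 0.2, 0.5),
    "pep_B": (0.6, 0.8, 0.6),
    "pep_C": (0.5, 0.7, 0.4),  # worse than pep_B on every objective
    "pep_D": (0.9, 0.2, 0.4),  # ties pep_A twice, worse on permeability
}

print(pareto_front(candidates))
```

Here `pep_A` and `pep_B` remain because each wins on at least one objective against every other candidate, which is the kind of trade-off set a multi-objective search would keep and refine rather than collapsing to a single best score.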