Yang Sen, Ju Lingli, Cheng Peng, Zhou JiangLin, Cai Yamin, Feng Dawei
Bioinformatics Center of AMMS, Beijing, 100039, China.
The 921st Hospital of Chinese PLA, Changsha, 410073, China.
Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf248.
Generative models have shown considerable promise in de novo protein design. Existing approaches typically model either sequence or structure in isolation, which limits their capacity to explore the intricate sequence-structure landscape and reach optimal designs. Joint co-design of protein sequence and structure remains a largely underexplored challenge.
We present CoFlow, a discrete model for protein co-design from scratch or given constraints. CoFlow employs a joint discrete flow and integrates a multi-modal protein masked language model to facilitate co-design in the discrete space. Comprehensive experiments demonstrate that CoFlow outperforms previous design methods across multiple evaluation metrics. Notably, CoFlow achieves a consistency approximately eight times higher than that of ESM3 in unconditional generation. Moreover, CoFlow exhibits competitive performance in conditional generation tasks, including motif-scaffolding, protein folding, and inverse folding.
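To make the idea of co-design in a discrete space concrete, the following is a minimal sketch of joint iterative unmasking over a sequence track and a structure-token track. It is not CoFlow's implementation: the random `toy_denoiser`, the codebook size `STRUCT_VOCAB`, the `MASK` sentinel, and the `dt / (1 - t)` unmasking schedule (a common choice for masking-based discrete flows) are all assumptions made for illustration.

```python
import numpy as np

MASK = -1           # sentinel for a not-yet-generated position (assumption)
SEQ_VOCAB = 20      # amino-acid types
STRUCT_VOCAB = 512  # hypothetical structure-token codebook size (assumption)


def toy_denoiser(seq, struct, rng):
    """Stand-in for a learned multi-modal masked language model.

    Returns per-position categorical distributions for both tracks.
    They are random here; a real model would condition on seq and struct.
    """
    n = len(seq)
    p_seq = rng.dirichlet(np.ones(SEQ_VOCAB), size=n)
    p_struct = rng.dirichlet(np.ones(STRUCT_VOCAB), size=n)
    return p_seq, p_struct


def sample_rows(probs, rng):
    """Draw one category per row of an (n, vocab) probability matrix."""
    return np.array([rng.choice(probs.shape[1], p=row) for row in probs])


def co_generate(length, steps=50, seed=0):
    """Generate sequence and structure tokens jointly from all-masked tracks."""
    rng = np.random.default_rng(seed)
    seq = np.full(length, MASK)
    struct = np.full(length, MASK)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        # Each still-masked position is revealed with probability
        # dt / (1 - t); the last step reveals everything that remains.
        p_unmask = 1.0 if k == steps - 1 else dt / (1.0 - t)
        p_seq, p_struct = toy_denoiser(seq, struct, rng)
        for track, probs in ((seq, p_seq), (struct, p_struct)):
            reveal = (track == MASK) & (rng.random(length) < p_unmask)
            track[reveal] = sample_rows(probs, rng)[reveal]
    return seq, struct
```

In this sketch, conditional tasks such as folding or inverse folding would correspond to fixing one track to known tokens and unmasking only the other, and motif-scaffolding to fixing a subset of positions in both tracks.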
The source code of CoFlow, including data preprocessing and the model, is available at https://github.com/LtECoD/CoFlow and https://zenodo.org/records/14842367 (DOI: 10.5281/zenodo.14842367).