Wang Xuze, Li Yangyang, Hou Xiancong, Liu Hao
College of Computer Science and Technology, Ocean University of China, Qingdao, China.
J Enzyme Inhib Med Chem. 2025 Dec;40(1):2524742. doi: 10.1080/14756366.2025.2524742. Epub 2025 Jul 3.
Enzyme sequence design remains a challenging task, particularly when optimising key properties such as solubility, stability, and activity. This study proposes an approach that integrates a variational autoencoder (VAE) with the Gromov-Wasserstein (GW) distance for enzyme sequence optimisation. The resulting GWAE model uses the GW distance to improve representation learning and thereby generates functional variants with desired characteristics. We also introduce an enzyme dataset construction method that uses multiple sequence alignment (MSA) to address sequence length discrepancies, which improves the accuracy of the optimisation process. Experimental results show that the GWAE model outperforms the traditional VAE on multiple metrics, and the generated enzyme sequences show superior solubility, stability, and hydrophobicity. Additionally, by using AlphaFold3 for structural prediction, we verify the structural stability of the generated sequences, which further supports their practical applicability.
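As an illustration of the two ideas named in the abstract, the sketch below one-hot encodes MSA-aligned sequences (gaps give every sequence the same length) and evaluates a GW-style discrepancy between pairwise distances in sequence space and in latent space. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 21-symbol alphabet, the Euclidean metric, and the use of a fixed identity coupling (each sequence matched to its own latent code, which only upper-bounds the true GW distance) are all hypothetical choices made for illustration.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus the alignment gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(aligned_seqs):
    """Encode equal-length aligned sequences as an (n, L * 21) array."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    out = np.zeros((len(aligned_seqs), length, len(ALPHABET)))
    for n, seq in enumerate(aligned_seqs):
        for pos, ch in enumerate(seq):
            out[n, pos, index[ch]] = 1.0
    return out.reshape(len(aligned_seqs), -1)


def pairwise_dist(points):
    """Euclidean distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gw_identity_cost(x, z):
    """Mean squared gap between data-space and latent-space distances.

    This is the GW objective evaluated at the identity coupling, so it is
    an upper bound on the GW distance, not the optimised distance itself.
    """
    dx = pairwise_dist(x)
    dz = pairwise_dist(z)
    return float(((dx - dz) ** 2).mean())


# Toy aligned sequences (hypothetical); "-" marks an alignment gap.
seqs = ["MK-LV", "MKALV", "MR-LV"]
x = one_hot(seqs)
z = 2.0 * x  # stand-in for encoder outputs; a real model would learn these
cost = gw_identity_cost(x, z)
```

A latent representation that preserves the pairwise distances exactly gives a cost of zero, so a term of this kind rewards encoders whose latent geometry mirrors the geometry of the aligned sequences. Computing the actual GW distance additionally requires optimising over couplings between the two sample sets.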