Tan Yang, Wang Ruilin, Wu Banghao, Hong Liang, Zhou Bingxin
Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
School of Information and Science, East China University of Science and Technology, Shanghai, 200231, China.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i401-i409. doi: 10.1093/bioinformatics/btaf189.
MOTIVATION: Enzyme engineering is a critical approach for producing enzymes that meet industrial and research demands: wild-type proteins are modified to enhance properties such as catalytic activity and thermostability. Beyond traditional directed evolution and rational design, recent advances in deep learning offer cost-effective, high-performance alternatives. Pretrained models that encode implicit coevolutionary patterns have become powerful tools, and the central challenge is to uncover the intricate relationships among protein sequence, structure, and function.

RESULTS: We present VenusREM, a retrieval-enhanced protein language model designed to capture local amino acid interactions at both spatial and temporal scales. VenusREM achieves state-of-the-art performance on 217 assays from the ProteinGym benchmark. Beyond validation on this high-throughput open benchmark, we conducted a low-throughput post hoc analysis of more than 30 mutants to verify the model's ability to improve the stability and binding affinity of a VHH antibody. We also validated VenusREM by designing 10 novel mutants of a DNA polymerase and performing wet-lab experiments to evaluate their enhanced activity at elevated temperatures. Together, the in silico and experimental evaluations confirm the reliability of VenusREM as a computational tool for enzyme engineering and provide a comprehensive evaluation framework for future computational studies of mutation effect prediction.

AVAILABILITY AND IMPLEMENTATION: The implementation is available at https://github.com/tyang816/VenusREM.
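The abstract does not describe how VenusREM turns its predictions into mutant scores. As a generic illustration only (not VenusREM's method), a common zero-shot approach with protein language models scores a substitution by the log-probability of the mutant residue minus that of the wild-type residue at the same position, and ProteinGym-style benchmarks commonly report, per assay, the Spearman rank correlation between such scores and the measured fitness values. The sketch assumes a hypothetical `log_probs` table of per-position log-probabilities; `score_mutant`, `spearman`, and the toy sequence and assay values are illustrative names and data, not real results.

```python
from typing import Dict, List, Tuple

# Hypothetical input: log_probs[pos][aa] = model log-probability of amino acid
# `aa` at 1-based position `pos` (e.g. from a masked protein language model).
LogProbs = Dict[int, Dict[str, float]]


def parse_mutation(mut: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'A12G' into (wild-type, position, mutant)."""
    return mut[0], int(mut[1:-1]), mut[-1]


def score_mutant(mutant: str, wt_seq: str, log_probs: LogProbs) -> float:
    """Sum of log p(mutant) - log p(wild-type) over ':'-separated substitutions."""
    total = 0.0
    for mut in mutant.split(":"):
        wt, pos, mt = parse_mutation(mut)
        if wt_seq[pos - 1] != wt:
            raise ValueError(f"{mut} does not match the wild-type sequence")
        total += log_probs[pos][mt] - log_probs[pos][wt]
    return total


def rank(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Toy usage with made-up numbers (hypothetical, for illustration only).
wt_seq = "MKV"
log_probs = {
    1: {"M": -0.1, "L": -2.0},
    2: {"K": -0.3, "R": -0.9, "E": -3.0},
    3: {"V": -0.2, "I": -0.7, "D": -4.0},
}
mutants = ["K2R", "K2E", "V3I", "V3D", "K2R:V3I"]
assay = [0.9, 0.1, 1.0, -0.5, 0.6]  # hypothetical measurements, higher = better
scores = [score_mutant(m, wt_seq, log_probs) for m in mutants]
print(spearman(scores, assay))  # prints 1.0
```

Ranking rather than raw values is used because a zero-shot score is only expected to order variants correctly, not to match the assay's units or scale.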