Unified Deep Learning of Molecular and Protein Language Representations with T5ProtChem.

Authors

Kelly Thomas, Xia Song, Lu Jieyu, Zhang Yingkai

Affiliations

Department of Chemistry, New York University, New York, New York 10003, United States.

Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States.

Publication information

J Chem Inf Model. 2025 Apr 28;65(8):3990-3998. doi: 10.1021/acs.jcim.5c00051. Epub 2025 Apr 8.

Abstract

Deep learning has revolutionized difficult tasks in chemistry and biology, yet existing language models often treat these domains separately, relying on concatenated architectures and independently pretrained weights. These approaches fail to fully exploit the shared atomic foundations of molecular and protein sequences. Here, we introduce T5ProtChem, a unified model based on the T5 architecture, designed to simultaneously process molecular and protein sequences. Using a new pretraining objective, ProtiSMILES, T5ProtChem bridges the molecular and protein domains, enabling efficient, generalizable protein-chemical modeling. The model achieves state-of-the-art performance in tasks such as binding affinity prediction and reaction prediction, and performs strongly in protein function prediction. Additionally, it supports novel applications, including covalent binder classification and sequence-level adduct prediction. These results demonstrate the versatility of unified language models for drug discovery, protein engineering, and other interdisciplinary efforts in computational biology and chemistry.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/637b/12042257/02327f88915f/ci5c00051_0001.jpg
