
MolE: a foundation model for molecular graphs using disentangled attention.

Affiliations

Recursion, Salt Lake City, UT, USA.

Novo Nordisk Research Center, Lexington, MA, USA.

Publication information

Nat Commun. 2024 Nov 12;15(1):9431. doi: 10.1038/s41467-024-53751-y.

Abstract

Models that accurately predict properties based on chemical structure are valuable tools in the chemical sciences. However, for many properties, public and private training sets are typically small, making it difficult for models to generalize well outside of the training data. Recently, this lack of generalization has been mitigated by using self-supervised pretraining on large unlabeled datasets, followed by finetuning on smaller, labeled datasets. Inspired by these advances, we report MolE, a Transformer architecture adapted for molecular graphs together with a two-step pretraining strategy. The first step of pretraining is a self-supervised approach focused on learning chemical structures trained on ~842 million molecular graphs, and the second step is a massive multi-task approach to learn biological information. We show that finetuning models that were pretrained in this way perform better than the best published results on 10 of the 22 ADMET (absorption, distribution, metabolism, excretion and toxicity) tasks included in the Therapeutic Data Commons leaderboard (c. September 2023).
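The "disentangled attention" in the title refers to keeping content and positional information in separate representations and combining them in the attention score, rather than adding position embeddings into the input. A minimal single-head sketch in the style of the DeBERTa-type decomposition (which MolE adapts to molecular graphs; the function names, the simplified per-node position embeddings, and the exact scaling here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_scores(Hc, Hr, Wqc, Wkc, Wqr, Wkr):
    """Attention weights from disentangled content/position terms.

    Hc: (n, d) content (atom) embeddings; Hr: (n, d) relative-position
    embeddings (simplified here to one vector per node). The score sums
    content-to-content, content-to-position, and position-to-content
    terms, each with its own projection.
    """
    Qc, Kc = Hc @ Wqc, Hc @ Wkc   # content query/key
    Qr, Kr = Hr @ Wqr, Hr @ Wkr   # position query/key
    d = Qc.shape[-1]
    scores = (Qc @ Kc.T + Qc @ Kr.T + Kc @ Qr.T) / np.sqrt(3 * d)
    return softmax(scores)

# Toy example: 5 atoms, hidden size 4.
rng = np.random.default_rng(0)
n, d = 5, 4
Hc = rng.normal(size=(n, d))               # atom (content) embeddings
Hr = rng.normal(size=(n, d))               # positional embeddings
Ws = [rng.normal(size=(d, d)) for _ in range(4)]
A = disentangled_scores(Hc, Hr, *Ws)
print(A.shape)                              # (5, 5); each row sums to 1
```

The practical motivation is that on a molecular graph, "position" is a relational quantity (e.g. graph distance between atoms), so it is more natural to score it against content than to fold it into the node features up front.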


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00c8/11557931/5dca3c7c0ce1/41467_2024_53751_Fig1_HTML.jpg
