Negative chemical data boosts language models in reaction outcome prediction.

Author information

Toniato Alessandra, Vaucher Alain C, Laino Teodoro, Graziani Mara

Affiliations

IBM Research Europe, Zurich, Switzerland.

NCCR Catalysis, Zurich, Switzerland.

Publication information

Sci Adv. 2025 Jun 13;11(24):eadt5578. doi: 10.1126/sciadv.adt5578.

Abstract

Trial-and-error approaches in chemistry generate abundant unsuccessful experiments, yet the potential of these so-called negative results remains largely underutilized. Here, we demonstrate that information from negative chemical reactions can be leveraged to improve reactivity-prediction models, offering advantages in scenarios with a limited volume of successful data. We extend the tuning of language models with reinforcement learning to the chemistry domain, training a transformer model for chemical reaction prediction. Our approach is evaluated using both a rigorously controlled dataset and a realistic high-throughput dataset comprising extensive reaction screenings across diverse catalyst sets and experimental conditions. The model achieves state-of-the-art performance by leveraging information from as few as 20 positive data points in the controlled dataset, supported by a negative dataset at least 40 times larger. Consistent results on both datasets demonstrate that, with an appropriate optimization strategy and the inclusion of unsuccessful experimental data, models can be effectively trained even when successful reactions are underrepresented.
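To make the idea of learning from negative reactions more concrete, the following is a minimal sketch of how a scalar reward for reinforcement-learning fine-tuning might distinguish successful from unsuccessful outcomes. The abstract does not describe the reward design, so everything here is an assumption for illustration: the names `reward` and `min_negative_count`, the reward values (+1, -1, 0), and the placeholder strings standing in for reaction data are all hypothetical and are not the authors' method.

```python
from typing import Set, Tuple

# A reaction record: (reactants, product). Plain placeholder strings are used
# here; real data would typically be reaction SMILES.
Reaction = Tuple[str, str]


def reward(
    reactants: str,
    predicted_product: str,
    positives: Set[Reaction],
    negatives: Set[Reaction],
) -> float:
    """Hypothetical reward: favor known successes, penalize known failures.

    The values +1.0 / -1.0 / 0.0 are illustrative assumptions only.
    """
    candidate = (reactants, predicted_product)
    if candidate in positives:
        return 1.0   # prediction matches a successful experiment
    if candidate in negatives:
        return -1.0  # prediction matches an outcome known not to form
    return 0.0       # no experimental evidence either way


def min_negative_count(n_positive: int, ratio: int = 40) -> int:
    """Smallest negative-set size when negatives are `ratio` times the positives."""
    return n_positive * ratio


# Toy placeholder data, not real reactions.
positives = {("reactants_1", "product_ok")}
negatives = {("reactants_1", "product_failed")}

r_pos = reward("reactants_1", "product_ok", positives, negatives)
r_neg = reward("reactants_1", "product_failed", positives, negatives)
r_unk = reward("reactants_1", "product_unseen", positives, negatives)
n_neg = min_negative_count(20)
```

With the figures quoted in the abstract (20 positive data points and a negative dataset at least 40 times larger), `min_negative_count(20)` corresponds to at least 800 negative examples.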

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d605/12164950/136681015391/sciadv.adt5578-f1.jpg
