Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining.

Authors

Wen Mingjian, Blau Samuel M, Xie Xiaowei, Dwaraknath Shyam, Persson Kristin A

Affiliations

Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

College of Chemistry, University of California, Berkeley, CA 94720, USA.

Publication information

Chem Sci. 2022 Jan 11;13(5):1446-1458. doi: 10.1039/d1sc06515g. eCollection 2022 Feb 2.

DOI:10.1039/d1sc06515g
PMID:35222929
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8809395/
Abstract

Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data to avoid overfitting, which leads to low accuracy and poor transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from those of other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find that they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer-learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
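
The contrastive objective described in the abstract can be illustrated with a short sketch. The abstract does not name the loss or the specific augmentations, so everything below is an assumption made for illustration: a SimCLR-style NT-Xent loss stands in for the contrastive objective, and random feature masking of atoms outside the reaction center stands in for a "chemically consistent" augmentation. The function names `nt_xent_loss` and `mask_atoms_outside_center`, the `temperature` and `mask_fraction` parameters, and the plain NumPy arrays used in place of a GNN encoder are all hypothetical.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss (an assumed choice, not confirmed by the abstract).

    z1[i] and z2[i] are embeddings of two augmented views of reaction i.
    Each view is pulled toward its partner and pushed away from every
    other embedding in the batch.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / temperature                       # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own candidate

    # Index of the positive partner for each row: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return -log_prob[np.arange(2 * n), pos].mean()


def mask_atoms_outside_center(atom_features, center_atoms, mask_fraction=0.25, rng=None):
    """Zero out features of randomly chosen atoms, never touching the reaction center.

    atom_features: (n_atoms, n_features) array for one reaction (hypothetical layout).
    center_atoms:  indices of atoms in the reaction center, which are protected.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_atoms = atom_features.shape[0]
    protected = np.asarray(list(center_atoms), dtype=int)
    candidates = np.setdiff1d(np.arange(n_atoms), protected)
    n_mask = int(round(mask_fraction * len(candidates)))
    masked = rng.choice(candidates, size=n_mask, replace=False)
    out = atom_features.copy()
    out[masked] = 0.0
    return out
```

In a pretraining loop under these assumptions, each unlabelled reaction would be augmented twice, both views would be encoded by the GNN, and the resulting embedding pairs would be passed to `nt_xent_loss`; fine-tuning on the small labelled set would then start from the pretrained encoder.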

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4067/8809395/33ed2de94a45/d1sc06515g-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4067/8809395/762ed95b75a4/d1sc06515g-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4067/8809395/49269c8668de/d1sc06515g-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4067/8809395/dd187afa32f7/d1sc06515g-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4067/8809395/89e36df87ed5/d1sc06515g-f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4067/8809395/96633ec3429a/d1sc06515g-f6.jpg
