Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature.

Affiliations

Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom.

Evotec (U.K.) Ltd., 114 Innovation Drive, Milton Park, Abingdon OX14 4RZ, United Kingdom.

Publication information

J Chem Inf Model. 2019 Oct 28;59(10):4167-4187. doi: 10.1021/acs.jcim.9b00537. Epub 2019 Sep 26.

DOI: 10.1021/acs.jcim.9b00537
PMID: 31529948

Abstract

Reaction classification has often been considered an important task for many different applications, and has traditionally been accomplished using hand-coded rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for "dynamic" reaction fingerprinting to maximize the effectiveness of reaction encoding, as well as developing a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction data sets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights on the nature of reaction collections and hidden relationships between reaction classes.
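The abstract describes attaching confidence estimates to class predictions through Conformal Prediction and then filtering on a confidence threshold. The following is a minimal sketch of that general idea using inductive conformal prediction, not the authors' implementation. It assumes a classifier that outputs per-class probabilities and a held-out calibration set; the nonconformity score (one minus the probability of the class) and the function names `conformal_p_values`, `predict_with_confidence`, and `filter_confident` are illustrative choices, not taken from the paper.

```python
from bisect import bisect_left


def conformal_p_values(cal_probs, cal_labels, test_probs):
    """Return one list of per-class p-values for each test example.

    cal_probs / test_probs: lists of per-class probability lists.
    cal_labels: true class index of each calibration example.
    """
    # Assumed nonconformity score: 1 - probability given to the true class.
    cal_scores = sorted(
        1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels)
    )
    n = len(cal_scores)
    p_values = []
    for probs in test_probs:
        row = []
        for p in probs:
            score = 1.0 - p  # score if this class were the true one
            # Number of calibration scores >= this candidate's score.
            n_ge = n - bisect_left(cal_scores, score)
            row.append((n_ge + 1) / (n + 1))
        p_values.append(row)
    return p_values


def predict_with_confidence(p_values):
    """Return (label, confidence, credibility) for each row of p-values."""
    results = []
    for row in p_values:
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        # Confidence: 1 - second-largest p-value; credibility: largest p-value.
        results.append((best, 1.0 - row[runner_up], row[best]))
    return results


def filter_confident(results, threshold):
    """Keep only predictions whose confidence reaches the threshold."""
    return [r for r in results if r[1] >= threshold]
```

Raising `threshold` discards more of the uncertain predictions, which is the trade-off the abstract reports between the share of predictions filtered out and the share of the remaining predictions that are correct.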


Similar articles

1
Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature.
J Chem Inf Model. 2019 Oct 28;59(10):4167-4187. doi: 10.1021/acs.jcim.9b00537. Epub 2019 Sep 26.
2
Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity.
J Chem Inf Model. 2015 Jan 26;55(1):39-53. doi: 10.1021/ci5006614. Epub 2015 Jan 13.
3
Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.
Accid Anal Prev. 2017 Jan;98:359-371. doi: 10.1016/j.aap.2016.10.014. Epub 2016 Nov 15.
4
A hybrid method for prediction and repositioning of drug Anatomical Therapeutic Chemical classes.
Mol Biosyst. 2014 Apr;10(4):868-77. doi: 10.1039/c3mb70490d. Epub 2014 Feb 4.
5
Enhancing reaction-based de novo design using a multi-label reaction class recommender.
J Comput Aided Mol Des. 2020 Jul;34(7):783-803. doi: 10.1007/s10822-020-00300-6. Epub 2020 Feb 28.
6
Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach.
Acad Emerg Med. 2016 Mar;23(3):269-78. doi: 10.1111/acem.12876. Epub 2016 Feb 13.
7
Chemotion ELN: an Open Source electronic lab notebook for chemists in academia.
J Cheminform. 2017 Sep 25;9(1):54. doi: 10.1186/s13321-017-0240-0.
8
Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification.
J Cheminform. 2021 Mar 16;13(1):23. doi: 10.1186/s13321-021-00500-8.
9
Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies.
J Chem Inf Model. 2018 Jul 23;58(7):1384-1396. doi: 10.1021/acs.jcim.8b00286. Epub 2018 Jun 27.
10
Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes.
Artif Intell Med. 2019 Jul;98:109-134. doi: 10.1016/j.artmed.2019.07.007. Epub 2019 Jul 26.

Cited by

1
Machine learning applications for thermochemical and kinetic property prediction.
Rev Chem Eng. 2024 Nov 29;41(4):419-449. doi: 10.1515/revce-2024-0027. eCollection 2025 May.
2
Site-specific template generative approach for retrosynthetic planning.
Nat Commun. 2024 Sep 6;15(1):7818. doi: 10.1038/s41467-024-52048-4.
3
Reaction rebalancing: a novel approach to curating reaction databases.
J Cheminform. 2024 Jul 19;16(1):82. doi: 10.1186/s13321-024-00875-4.
4
Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie's 15-Year Parallel Library Data Set.
J Am Chem Soc. 2024 Jun 5;146(22):15070-15084. doi: 10.1021/jacs.4c00098. Epub 2024 May 20.
5
On the use of real-world datasets for reaction yield prediction.
Chem Sci. 2023 Mar 13;14(19):4997-5005. doi: 10.1039/d2sc06041h. eCollection 2023 May 17.
6
Organic reactivity from mechanism to machine learning.
Nat Rev Chem. 2021 Apr;5(4):240-255. doi: 10.1038/s41570-021-00260-x. Epub 2021 Mar 16.
7
Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery.
Chem Sci. 2022 Nov 28;14(2):226-244. doi: 10.1039/d2sc05089g. eCollection 2023 Jan 4.
8
Navigating chemical reaction space - application to DNA-encoded chemistry.
Chem Sci. 2022 Sep 1;13(37):11221-11231. doi: 10.1039/d2sc02474h. eCollection 2022 Sep 28.
9
Reaction classification and yield prediction using the differential reaction fingerprint DRFP.
Digit Discov. 2022 Jan 21;1(2):91-97. doi: 10.1039/d1dd00006c. eCollection 2022 Apr 11.
10
Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining.
Chem Sci. 2022 Jan 11;13(5):1446-1458. doi: 10.1039/d1sc06515g. eCollection 2022 Feb 2.