
Reaction rebalancing: a novel approach to curating reaction databases.

Authors

Phan Tieu-Long, Weinbauer Klaus, Gärtner Thomas, Merkle Daniel, Andersen Jakob L, Fagerberg Rolf, Stadler Peter F

Affiliations

Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany.

Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark.

Publication information

J Cheminform. 2024 Jul 19;16(1):82. doi: 10.1186/s13321-024-00875-4.

DOI:10.1186/s13321-024-00875-4
PMID:39030583
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11264917/
Abstract

PURPOSE

Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need.

METHODS

The SynRBL framework addresses this issue with a dual strategy: a rule-based method for non-carbon compounds, which uses atomic symbols and counts for prediction, and a Maximum Common Subgraph (MCS)-based technique for carbon compounds, which aligns reactants and products to infer missing entities.
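
The idea behind the rule-based step, comparing element counts on both sides of a reaction and looking up a small compound that accounts for the difference, can be illustrated with a minimal sketch. This is not the SynRBL implementation: the formula parser, the `RULES` table, and the function names `missing_atoms` and `suggest_compound` are hypothetical and chosen only for illustration, and the parser assumes plain molecular formulas without brackets or charges.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a plain molecular formula such as 'C2H4O2'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts

def side_counts(formulas) -> Counter:
    """Sum the atom counts over all compounds on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total

# Hypothetical lookup table of small non-carbon compounds (assumption, not from the paper).
RULES = {name: parse_formula(name) for name in ("H2O", "HCl", "HBr", "NH3", "H2")}

def missing_atoms(reactants, products):
    """Return (atoms missing from the products, atoms missing from the reactants)."""
    r, p = side_counts(reactants), side_counts(products)
    # Counter subtraction keeps only strictly positive differences.
    return r - p, p - r

def suggest_compound(deficit: Counter):
    """Return the name of a rule compound whose atoms match the deficit exactly, else None."""
    for name, atoms in RULES.items():
        if atoms == deficit:
            return name
    return None

# Esterification recorded without its co-product:
# acetic acid (C2H4O2) + ethanol (C2H6O) -> ethyl acetate (C4H8O2)
missing_in_products, missing_in_reactants = missing_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])
print(suggest_compound(missing_in_products))  # H2O
```

In this toy case the carbon count is already balanced, so the imbalance can be resolved from atom counts alone; when carbon atoms are missing, counts are no longer sufficient, which is the situation the abstract assigns to the MCS-based technique.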

RESULTS

The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively.
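
To make the two reported metrics concrete, the sketch below computes them under an assumed reading of the abstract: success rate as the fraction of unbalanced reactions for which a rebalanced solution was produced, and accuracy as the fraction of those solutions judged correct. These definitions, the `Outcome` record, and the sample data are illustrative assumptions, not values or code from the paper.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    solved: bool   # a rebalanced reaction was produced
    correct: bool  # the produced reaction matches the curated answer

def success_rate(outcomes) -> float:
    """Fraction of all reactions for which a solution was produced."""
    return sum(o.solved for o in outcomes) / len(outcomes)

def accuracy(outcomes) -> float:
    """Fraction of produced solutions that are correct."""
    solved = [o for o in outcomes if o.solved]
    return sum(o.correct for o in solved) / len(solved)

# Made-up outcomes for four reactions (illustrative only).
outcomes = [
    Outcome(solved=True, correct=True),
    Outcome(solved=True, correct=True),
    Outcome(solved=True, correct=False),
    Outcome(solved=False, correct=False),
]
print(success_rate(outcomes))  # 0.75
print(round(accuracy(outcomes), 4))
```

Reporting both numbers separates coverage (how often the method returns an answer) from reliability (how often a returned answer is right).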

CONCLUSION

The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvements, in particular of atom-atom mapping techniques and of downstream tasks such as automated synthesis planning.

SCIENTIFIC CONTRIBUTION

SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon imbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open source solution for this problem.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/e5e658172751/13321_2024_875_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/91c873bf45e5/13321_2024_875_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/926266a5e16f/13321_2024_875_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/a4cf89fe60ee/13321_2024_875_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/3f1b50cedaae/13321_2024_875_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/e82f6628279d/13321_2024_875_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/3619be3d94a1/13321_2024_875_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/849c82cd9ade/13321_2024_875_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/4dcf9fc3c496/13321_2024_875_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/19d87943fbcb/13321_2024_875_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/11264917/5fc99b305053/13321_2024_875_Figa_HTML.jpg
