Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data.

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Gothenburg, Sweden.

Publication information

J Chem Inf Model. 2023 Jul 24;63(14):4253-4265. doi: 10.1021/acs.jcim.3c00607. Epub 2023 Jul 5.

DOI: 10.1021/acs.jcim.3c00607
PMID: 37405398
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10369484/

Abstract

The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers.

