Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data.

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Gothenburg, Sweden.

Publication information

J Chem Inf Model. 2023 Jul 24;63(14):4253-4265. doi: 10.1021/acs.jcim.3c00607. Epub 2023 Jul 5.

Abstract

The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers.
