Nippa David F, Müller Alex T, Atz Kenneth, Konrad David B, Grether Uwe, Martin Rainer E, Schneider Gisbert
Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070, Basel, Switzerland.
Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany.
Mol Inform. 2025 Jan;44(1):e202400361. doi: 10.1002/minf.202400361.
Utilizing the growing wealth of chemical reaction data can boost synthesis planning and increase success rates. Yet, the effectiveness of machine learning tools for retrosynthesis planning and forward reaction prediction relies on accessible, well-curated data presented in a structured format. Although some public and licensed reaction databases exist, they often lack essential information about reaction conditions. To address this issue and promote the principles of findable, accessible, interoperable, and reusable (FAIR) data reporting and sharing, we introduce the Simple User-Friendly Reaction Format (SURF). SURF standardizes the documentation of reaction data through a structured tabular format, requiring only a basic understanding of spreadsheets. This format enables chemists to record the synthesis of molecules in a format that is understandable by both humans and machines, which facilitates seamless sharing and integration directly into machine learning pipelines. SURF files are designed to be interoperable, easily imported into relational databases, and convertible into other formats. This complements existing initiatives like the Open Reaction Database (ORD) and Unified Data Model (UDM). At Roche, SURF plays a crucial role in democratizing FAIR reaction data sharing and expediting the chemical synthesis process.
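The claim that a tabular reaction file can be read by machines and imported into a relational database can be illustrated with a minimal sketch. The column names, the tab delimiter, and the two toy rows below are assumptions made for illustration only; they are not taken from the SURF specification, which defines its own headers. The helper names `read_reactions` and `load_into_sqlite` are likewise hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical header and toy rows, for illustration only.
# The real SURF specification defines its own column names.
SAMPLE = (
    "rxn_id\tstarting_material_smiles\tproduct_smiles\tsolvent\t"
    "temperature_c\tyield_percent\n"
    "rxn-001\tCCO\tCC=O\twater\t25\t80\n"
    "rxn-002\tCC(=O)O\tCC(=O)OC\tmethanol\t65\t72.5\n"
)


def read_reactions(text):
    """Parse tab-delimited reaction rows into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # Convert numeric fields so they can be filtered and compared.
        row["temperature_c"] = float(row["temperature_c"])
        row["yield_percent"] = float(row["yield_percent"])
        rows.append(row)
    return rows


def load_into_sqlite(rows):
    """Load parsed rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reactions ("
        "rxn_id TEXT PRIMARY KEY, "
        "starting_material_smiles TEXT, "
        "product_smiles TEXT, "
        "solvent TEXT, "
        "temperature_c REAL, "
        "yield_percent REAL)"
    )
    conn.executemany(
        "INSERT INTO reactions VALUES ("
        ":rxn_id, :starting_material_smiles, :product_smiles, "
        ":solvent, :temperature_c, :yield_percent)",
        rows,
    )
    conn.commit()
    return conn


rows = read_reactions(SAMPLE)
conn = load_into_sqlite(rows)

# Query the table as a downstream pipeline might, e.g. to select
# higher-yielding reactions.
high_yield = conn.execute(
    "SELECT rxn_id FROM reactions WHERE yield_percent > 75"
).fetchall()
```

Because each reaction occupies one row with named columns, the same file can be opened in a spreadsheet by a chemist and parsed by a script without a custom reader, which is the property the abstract emphasizes.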