Pasquini Marta, Stenta Marco
Syngenta Crop Protection AG, Schaffhauserstrasse, 4332, Stein, AG, Switzerland.
J Cheminform. 2023 Apr 1;15(1):41. doi: 10.1186/s13321-023-00714-y.
The growing volume of chemical reaction data makes traditional ways of navigating this corpus less effective, while demand for novel approaches and tools is rising. Recent data science and machine learning techniques support the development of new ways to extract value from the available reaction data. On the one hand, Computer-Aided Synthesis Planning tools can predict synthetic routes in a model-driven approach; on the other, experimental routes can be extracted from the Network of Organic Chemistry, in which reaction data are linked in a network. In this context, the need to combine, compare, and analyze synthetic routes generated by different sources arises naturally.
Here we present LinChemIn, a Python toolkit that supports chemoinformatics operations on synthetic routes and reaction networks. By wrapping third-party packages for graph arithmetic and chemoinformatics and implementing new data models and functionalities, LinChemIn allows interconversion between data formats and data models and enables route-level analysis and operations, including route comparison and descriptor calculation. Object-Oriented Design principles inspire the software architecture, and the modules are structured to maximize code reusability and to support code testing and refactoring. The code structure should facilitate external contributions, thus encouraging open and collaborative software development.
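To make the idea of route-level operations concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the LinChemIn API: the names `Step`, `make_route`, `starting_materials`, `longest_linear_sequence`, `jaccard_similarity`, and `merge_routes` are hypothetical, molecules are represented by placeholder strings rather than real chemical structures, and each route is assumed to be an acyclic tree with at most one step per product. It illustrates two simple route descriptors, one possible route comparison (shared-step overlap), and the merging of routes into a small reaction network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One reaction step: a set of reactants giving a single product."""
    reactants: frozenset
    product: str


def make_route(*steps):
    """Build a route (frozenset of Step) from (reactants, product) pairs."""
    return frozenset(Step(frozenset(r), p) for r, p in steps)


def starting_materials(route):
    """Descriptor: molecules consumed but never produced within the route."""
    produced = {s.product for s in route}
    consumed = set().union(*(s.reactants for s in route))
    return consumed - produced


def longest_linear_sequence(route, target):
    """Descriptor: number of steps on the longest path leading to target.

    Assumes the route is acyclic and has at most one step per product.
    """
    by_product = {s.product: s for s in route}

    def depth(molecule):
        step = by_product.get(molecule)
        if step is None:  # a starting material: no step produces it
            return 0
        return 1 + max((depth(r) for r in step.reactants), default=0)

    return depth(target)


def jaccard_similarity(route_a, route_b):
    """Comparison: fraction of reaction steps shared by the two routes."""
    union = route_a | route_b
    return len(route_a & route_b) / len(union) if union else 1.0


def merge_routes(*routes):
    """Combine routes into one set of steps, i.e. a small reaction network."""
    return frozenset().union(*routes)


# Two hypothetical routes to the same target "T", sharing their first step.
route_1 = make_route((["A", "B"], "C"), (["C", "D"], "T"))
route_2 = make_route((["A", "B"], "C"), (["C", "E"], "T"))

print(sorted(starting_materials(route_1)))
print(longest_linear_sequence(route_1, "T"))
print(jaccard_similarity(route_1, route_2))
print(len(merge_routes(route_1, route_2)))
```

Representing a step as a hashable, immutable object is what lets ordinary set operations serve as route comparison and merging here; a real toolkit would additionally need canonical molecule and reaction identifiers so that the same chemistry coming from different sources compares as equal.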
The current version of LinChemIn allows users to combine synthetic routes generated by various tools and analyze them, and constitutes an open and extensible framework capable of incorporating contributions from the community and fostering scientific discussion. Our roadmap envisages the development of sophisticated metrics for route evaluation, a multi-parameter scoring system, and the implementation of an entire "ecosystem" of functionalities operating on synthetic routes. LinChemIn is freely available at https://github.com/syngenta/linchemin.