

Data-driven approaches for identifying hyperparameters in multi-step retrosynthesis.

Author information

Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.

Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, UiT The Arctic University of Norway, N9037, Tromsø, Norway.

Publication information

Mol Inform. 2023 Nov;42(11):e202300128. doi: 10.1002/minf.202300128. Epub 2023 Sep 27.

Abstract

The multi-step retrosynthesis problem can be solved by a search algorithm, such as Monte Carlo tree search (MCTS). The performance of multi-step retrosynthesis, as measured by a trade-off between search time and route solvability, therefore depends on the hyperparameters of the search algorithm. In this paper, we demonstrated the effect of three MCTS hyperparameters (number of iterations, tree depth, and tree width) on metrics such as the linear integrated speed-accuracy score (LISAS) and the inverse efficiency score, which consider both route solvability and search time. This exploration was conducted using three data-driven approaches: a systematic grid search, Bayesian optimization over an ensemble of molecules to obtain static MCTS hyperparameters, and a machine learning approach to dynamically predict optimal MCTS hyperparameters for a given input target molecule. With the results obtained on the internal dataset, we demonstrated that it is possible to identify a hyperparameter set that outperforms the current AiZynthFinder default setting. This set appeared optimal across a variety of target input molecules, on both proprietary and public datasets. The settings identified with the in-house dataset reached a solvability of 93% and a median search time of 151 s on the in-house dataset, and a solvability of 74% and a median search time of 114 s on the ChEMBL dataset. By comparison, the current default settings solved 85% and 73% of targets with median search times of 110 s and 84 s for the in-house and ChEMBL datasets, respectively.
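As a rough illustration of how the two combined metrics could be used to rank hyperparameter settings, the sketch below uses the definitions common in the reaction-time literature: the inverse efficiency score is the mean time divided by the proportion of successes, and LISAS is the mean time plus the error proportion scaled by the ratio of the time and error standard deviations. The abstract does not state how the paper adapts these metrics to retrosynthesis, so the mapping here (search time as response time, an unsolved target as an error, averaging over all targets) is an assumption. The function names, the `(iterations, depth, width)` keys, and all numbers are hypothetical toy values, not results from the paper.

```python
from statistics import mean, pstdev


def inverse_efficiency_score(times, solved):
    """Mean search time divided by the proportion of solved targets (lower is better)."""
    proportion_solved = sum(solved) / len(solved)
    if proportion_solved == 0:
        return float("inf")  # nothing solved: treat the setting as maximally inefficient
    return mean(times) / proportion_solved


def lisas(times, solved):
    """Mean search time plus the error proportion, rescaled to time units (lower is better)."""
    errors = [0 if s else 1 for s in solved]
    sd_error = pstdev(errors)
    if sd_error == 0:
        # All targets solved or all unsolved: the error term cannot be rescaled
        # (assumption: fall back to the mean search time).
        return mean(times)
    return mean(times) + (pstdev(times) / sd_error) * mean(errors)


# Toy grid: (iterations, depth, width) -> (search times in seconds, solved flags).
# These values are invented for illustration only.
toy_runs = {
    (100, 6, 50): ([110.0, 90.0, 130.0, 120.0], [True, True, False, True]),
    (200, 7, 50): ([150.0, 140.0, 170.0, 160.0], [True, True, True, True]),
    (50, 4, 20): ([60.0, 55.0, 70.0, 65.0], [True, False, False, True]),
}

# Pick the setting with the lowest inverse efficiency score.
best_setting = min(toy_runs, key=lambda hp: inverse_efficiency_score(*toy_runs[hp]))
```

A grid search in this spirit evaluates every setting and keeps the one with the lowest score; Bayesian optimization would instead choose the next setting to evaluate from the scores seen so far.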

