Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds.

Authors

Morishita Toshiharu, Kaneko Hiromasa

Affiliation

Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.

Publication information

ACS Omega. 2022 Dec 30;8(2):2001-2009. doi: 10.1021/acsomega.2c05145. eCollection 2023 Jan 17.

Abstract

An efficient search for optimal solutions in Bayesian optimization (BO) requires appropriate initial samples for building the Gaussian process regression model. For general experimental designs without compounds or molecular descriptors among the explanatory variables, selecting initial samples with a larger D-optimality yields little correlation between those variables in the selected samples, which leads to effective regression model building. However, in experimental designs with compounds, a high correlation always exists between molecular descriptors calculated from chemical structures, and compounds with similar structures form clusters in the chemical space. Therefore, selecting the initial samples uniformly from each cluster is desirable for obtaining initial samples with maximum information on experimental conditions. As D-optimality does not work well with highly correlated molecular descriptors and does not consider cluster information in sample selection, we propose an initial sample selection method based on clustering and apply it to the optimization of coupling reaction conditions with BO. We confirm that the proposed method reaches the optimal solution with up to 5% fewer experiments than random sampling or sampling based on D-optimality. This study contributes to initial sample selection for BO, and we are convinced that the proposed method improves the search performance of BO in various fields of science and technology, provided that initial samples can be determined using cluster information appropriately formed with domain knowledge.
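
The two selection criteria contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual definition of D-optimality as the determinant of X^T X, assumes cluster labels are already available (for example from a clustering step or domain knowledge), and uses the hypothetical names `d_optimality` and `select_by_cluster`.

```python
import random
from collections import defaultdict


def d_optimality(X):
    """Return det(X^T X) for a design matrix X given as a list of rows.

    Assumption: D-optimality is taken as the determinant of the
    information matrix X^T X (the standard textbook definition).
    """
    p = len(X[0])
    # Build the p x p information matrix X^T X.
    M = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         for i in range(p)]
    det = 1.0
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            return 0.0  # singular: the columns are linearly dependent
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p):
                M[r][k] -= f * M[c][k]
    return det


def select_by_cluster(labels, n_samples, seed=0):
    """Pick n_samples indices, cycling over clusters for even coverage.

    Hypothetical helper: `labels[i]` is the cluster of candidate i.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    for idxs in members.values():
        rng.shuffle(idxs)  # random choice within each cluster
    clusters = sorted(members)
    chosen = []
    # Round-robin: take one candidate per cluster until the quota is met.
    while len(chosen) < n_samples and any(members[c] for c in clusters):
        for c in clusters:
            if members[c] and len(chosen) < n_samples:
                chosen.append(members[c].pop())
    return chosen
```

With perfectly correlated descriptor columns, `d_optimality` returns zero regardless of which samples are chosen, which is one way to see why the criterion loses discriminating power on highly correlated molecular descriptors. The round-robin helper instead guarantees that every cluster is represented before any cluster contributes a second sample.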

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/602a/9850731/d7be60a4f00c/ao2c05145_0002.jpg
