Enhancing the Search Performance of Bayesian Optimization by Creating Different Descriptor Datasets Using Density Functional Theory.

Authors

Morishita Toshiharu, Kaneko Hiromasa

Affiliation

Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.

Publication information

ACS Omega. 2023 Aug 28;8(36):33032-33038. doi: 10.1021/acsomega.3c04891. eCollection 2023 Sep 12.

Abstract

Descriptors calculated from molecular structure information can be used as explanatory variables in Bayesian optimization (BO). Even though structural and descriptor information can be obtained from various databases for general compounds, information on highly confidential compounds such as pharmaceutical intermediates and active pharmaceutical ingredients cannot be retrieved from these databases. In particular, determining the stable structure and electronic state of a compound via quantum chemical calculations from descriptor information requires considerable computational time. Although descriptor information can be obtained using density functional theory (DFT), which has a relatively light computational load, only conventional combinations of basis sets and functionals can be selected before experiments instead of the best ones. Few studies have discussed these effects on the search performance of BO, and good search performance is highly dependent on the application. Therefore, we developed a method to improve the search performance of BO by using descriptors computed from several combinations of basis sets and functionals. The dataset obtained from averaging multiple descriptor sets exhibited better BO search performance than that of a single descriptor dataset. In addition, the more descriptor sets used for averaging, the better the search performance. This method has a relatively small computational load and can be easily used by those who are unfamiliar with quantum chemical calculations.
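The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes each descriptor set is a table of the same compounds and the same descriptors, computed with a different functional/basis-set combination, and that averaging is a plain element-wise mean. The abstract does not state which combinations were used or whether any scaling is applied before averaging, so the labels, compound names, and numerical values below are hypothetical, and the function `average_descriptor_sets` is an illustrative helper, not the authors' code.

```python
from statistics import fmean


def average_descriptor_sets(descriptor_sets):
    """Return the element-wise mean of several descriptor tables.

    descriptor_sets maps a label (e.g. a functional/basis-set name) to a table
    of the form {compound: [descriptor values]}. All tables must contain the
    same compounds and list the descriptors in the same order.
    """
    tables = list(descriptor_sets.values())
    if not tables:
        raise ValueError("need at least one descriptor set")

    compounds = list(tables[0])
    for table in tables[1:]:
        if set(table) != set(compounds):
            raise ValueError("descriptor sets must cover the same compounds")

    averaged = {}
    for compound in compounds:
        rows = [table[compound] for table in tables]
        if len({len(row) for row in rows}) != 1:
            raise ValueError("descriptor vectors must have equal length")
        # zip(*rows) groups the values of one descriptor across all sets
        averaged[compound] = [fmean(column) for column in zip(*rows)]
    return averaged


# Hypothetical data: two descriptors per compound (for example, HOMO and
# LUMO energies in eV). The labels and values are illustrative assumptions.
descriptor_sets = {
    "B3LYP/6-31G(d)": {"cmpd_A": [-6.0, -1.0], "cmpd_B": [-5.5, -0.5]},
    "PBE0/def2-SVP": {"cmpd_A": [-6.5, -1.5], "cmpd_B": [-6.0, -1.0]},
}

averaged = average_descriptor_sets(descriptor_sets)
print(averaged)
```

The averaged table would then take the place of a single descriptor dataset as the explanatory variables of the BO surrogate model; adding further descriptor sets only means adding entries to `descriptor_sets`.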

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db15/10500684/a8ba6e84b333/ao3c04891_0002.jpg
