A data-driven generative strategy to avoid reward hacking in multi-objective molecular design.

Authors

Yoshizawa Tatsuya, Ishida Shoichi, Sato Tomohiro, Ohta Masateru, Honma Teruki, Terayama Kei

Affiliations

Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan.

RIKEN Center for Biosystems Dynamics Research, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan.

Publication information

Nat Commun. 2025 Mar 11;16(1):2409. doi: 10.1038/s41467-025-57582-3.

DOI:10.1038/s41467-025-57582-3

PMID:40069140

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11897179/

Abstract

Molecular design using data-driven generative models has emerged as a promising technology, impacting various fields such as drug discovery and the development of functional materials. However, this approach is often susceptible to optimization failure due to reward hacking, where prediction models fail to extrapolate, i.e., fail to accurately predict properties for designed molecules that considerably deviate from the training data. While methods for estimating prediction reliability, such as the applicability domain (AD), have been used to mitigate reward hacking, applying them in multi-objective optimization is challenging. The difficulty arises from the need to determine in advance whether the multiple ADs, each with its own reliability level, overlap in chemical space, and to appropriately adjust the reliability level for each property prediction. Herein, we propose a reliable design framework to perform multi-objective optimization using generative models while preventing reward hacking. To demonstrate the effectiveness of the proposed framework, we designed candidates for anticancer drugs as a typical example of multi-objective optimization. We successfully designed molecules with high predicted values and reliabilities, including an approved drug. In addition, the reliability levels can be automatically adjusted according to the property prioritization specified by the user without any detailed settings.
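The overlap problem described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the ADs are defined or how the objectives are combined, so everything here is an assumption made for illustration: each AD is taken to be a maximum Tanimoto similarity threshold against that property's training set (a common choice), fingerprints are plain sets of integer bit indices, the property names `activity` and `stability` are hypothetical, and the reward is a geometric mean of predicted scores that is set to zero as soon as the molecule falls outside any AD.

```python
from typing import Dict, FrozenSet, List

# Assumption: a fingerprint is the set of "on" bit indices of a molecule.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-index sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query: Fingerprint, training: List[Fingerprint], level: float) -> bool:
    """Assumed AD rule: the nearest training molecule must be at least
    `level` similar to the query (`level` plays the role of a reliability level)."""
    nearest = max((tanimoto(query, t) for t in training), default=0.0)
    return nearest >= level


def gated_reward(
    query: Fingerprint,
    predictions: Dict[str, float],
    training_sets: Dict[str, List[Fingerprint]],
    levels: Dict[str, float],
) -> float:
    """Hypothetical reward: zero outside any AD, otherwise the geometric
    mean of the predicted scores (each assumed to lie in [0, 1])."""
    for name in predictions:
        if not in_domain(query, training_sets[name], levels[name]):
            return 0.0  # prediction is not trusted, so give no reward
    product = 1.0
    for value in predictions.values():
        product *= value
    return product ** (1.0 / len(predictions))


# Toy data: one training fingerprint per (hypothetical) property.
training_sets = {
    "activity": [frozenset({1, 2, 3, 4})],
    "stability": [frozenset({3, 4, 5, 6})],
}
query = frozenset({1, 2, 3, 4, 5})
predictions = {"activity": 0.9, "stability": 0.4}

loose = gated_reward(query, predictions, training_sets,
                     {"activity": 0.7, "stability": 0.4})
strict = gated_reward(query, predictions, training_sets,
                      {"activity": 0.7, "stability": 0.6})
```

In this toy setup the query is similar enough to both training sets under the looser levels, so it receives a non-zero reward, while raising the `stability` level pushes it outside that AD and the reward drops to zero. This is the tension the abstract points to: levels set too high can leave no region where all ADs overlap, and levels set too low admit unreliable predictions.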
