Exploring Graph Traversal Algorithms in Graph-Based Molecular Generation.

Affiliations

Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden.

Department of Computer Science and Engineering, Chalmers University of Technology, Chalmersplatsen 4, 412 96 Gothenburg, Sweden.

Publication information

J Chem Inf Model. 2022 May 9;62(9):2093-2100. doi: 10.1021/acs.jcim.1c00777. Epub 2021 Nov 10.

DOI:10.1021/acs.jcim.1c00777

PMID:34757744

Abstract

Here, we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined via either a breadth- or depth-first search algorithm. What we observe is that using a breadth-first traversal leads to better coverage of training data features compared to a depth-first traversal. We have quantified these differences using a variety of metrics on a data set of natural products. These metrics include percent validity, molecular coverage, and molecular shape. We also observe that by using either a breadth- or depth-first traversal it is possible to overtrain the generative models, at which point the results with either graph traversal algorithm are identical.
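
To illustrate how the choice of traversal changes the node order a generative model would follow, the sketch below computes breadth-first and depth-first orderings of a small graph. It is a minimal illustration only, not the authors' implementation: the function names `bfs_order` and `dfs_order`, the toy six-atom chain, and the rule of visiting lower-indexed neighbors first are all assumptions made for this example.

```python
from collections import deque


def bfs_order(adj, start=0):
    """Return nodes in breadth-first order from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs_order(adj, start=0):
    """Return nodes in depth-first (pre-order) order from `start`."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so the lowest-indexed one is visited first.
        for nbr in reversed(adj[node]):
            if nbr not in seen:
                stack.append(nbr)
    return order


# Hypothetical toy graph: a six-atom chain 5-3-1-0-2-4,
# with the traversal started from a middle atom (node 0).
adj = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 4],
    3: [1, 5],
    4: [2],
    5: [3],
}

print("BFS:", bfs_order(adj))  # [0, 1, 2, 3, 4, 5]
print("DFS:", dfs_order(adj))  # [0, 1, 3, 5, 2, 4]
```

In this toy case the breadth-first order grows the structure outward from the starting atom on both sides, while the depth-first order completes one arm of the chain before returning to the other.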
