Kengkanna Apakorn, Ohue Masahito
Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan.
Commun Chem. 2024 Apr 5;7(1):74. doi: 10.1038/s42004-024-01155-w.
Graph Neural Networks (GNNs) excel at compound property and activity prediction, but the choice of molecular graph representation significantly influences model learning and interpretation. While atom-level molecular graphs resemble the natural topology of molecules, they overlook key substructures and functional groups, and their interpretations only partially align with chemical intuition. Recent research suggests alternative representations that use reduced molecular graphs to integrate higher-level chemical information, and that leverage both representations for model learning. However, studies on the applicability and impact of different molecular graphs on model learning and interpretation are lacking. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), which investigates the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, though the degree of improvement varies across datasets. Interpreting multiple graphs from different views provides more comprehensive features and reveals potential substructures consistent with background knowledge. These results help explain model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives applies broadly across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
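A reduced molecular graph replaces groups of atoms with single nodes. The following is a minimal sketch of that idea and not the MMGX implementation. It assumes each atom is assigned to exactly one group, whereas schemes such as junction trees may let substructures share atoms. The function name `reduce_graph` and the group labels are hypothetical. The function collapses an atom-level bond list into group nodes and connects two groups whenever at least one bond crosses between them.

```python
from collections import defaultdict


def reduce_graph(atom_edges, atom_to_group):
    """Collapse an atom-level graph into a reduced (group-level) graph.

    atom_edges: iterable of (i, j) bonds between atom indices.
    atom_to_group: dict mapping each atom index to one group label
        (hypothetical partition, e.g. from a functional-group matcher).
    Returns (members, edges): members maps each group to its sorted atom
    indices; edges is a sorted list of group pairs joined by a bond.
    """
    members = defaultdict(list)
    for atom, group in atom_to_group.items():
        members[group].append(atom)

    edges = set()
    for i, j in atom_edges:
        gi, gj = atom_to_group[i], atom_to_group[j]
        if gi != gj:
            # A bond that crosses two groups becomes one reduced-graph edge.
            edges.add(tuple(sorted((gi, gj))))

    return {g: sorted(a) for g, a in members.items()}, sorted(edges)


# Acetic acid heavy atoms: C0-C1, C1=O2, C1-O3 (bond order ignored here).
bonds = [(0, 1), (1, 2), (1, 3)]
groups = {0: "CH3", 1: "COOH", 2: "COOH", 3: "COOH"}
print(reduce_graph(bonds, groups))
# → ({'CH3': [0], 'COOH': [1, 2, 3]}, [('CH3', 'COOH')])
```

In this example, four atoms and three bonds collapse into two group nodes joined by a single edge. A GNN operating on the reduced graph would therefore pass messages between substructures rather than between individual atoms.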