Wang Huijia, Zhu Guangxian, Izu Leighton T, Chen-Izu Ye, Ono Naoaki, Altaf-Ul-Amin M D, Kanaya Shigehiko, Huang Ming
Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan.
Department of Pharmacology, University of California, Davis, CA, United States.
Front Physiol. 2023 May 9;14:1156286. doi: 10.3389/fphys.2023.1156286. eCollection 2023.
Given its direct association with malignant ventricular arrhythmias, cardiotoxicity is a major concern in drug design. Over the past decades, computational models based on the quantitative structure-activity relationship (QSAR) have been proposed to screen out cardiotoxic compounds and have shown promising results. The combination of molecular fingerprints and machine learning models shows stable performance across a wide spectrum of problems; however, soon after the advent of the graph neural network (GNN) deep learning model and its variants (e.g., the graph transformer), GNN-based modeling became the principal approach to QSAR modeling because of its high flexibility in feature extraction and decision rule generation. Despite this progress, the expressiveness of the GNN model (its ability to distinguish non-isomorphic graph structures) is bounded by the Weisfeiler-Lehman (WL) isomorphism test, and a suitable thresholding scheme, which relates directly to the sensitivity and credibility of a model, remains an open question. In this research, we improved the expressiveness of the GNN model by introducing a substructure-aware bias through the graph subgraph transformer network model. Moreover, to identify the most appropriate thresholding scheme, we conducted a comprehensive comparison of thresholding schemes. With these improvements, the best model attains 90.4% precision, 90.4% recall, and a 90.5% F1-score under a dual-threshold scheme (active: ; non-active: ). The improved pipeline (graph subgraph transformer network model and thresholding scheme) also shows advantages with respect to the activity cliff problem and model interpretability.
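The expressiveness bound mentioned in the abstract can be illustrated with a small sketch of 1-dimensional WL color refinement. This is not the authors' model or code; the function name `wl_indistinguishable` and the three toy graphs are assumptions chosen for illustration. The sketch shows a six-membered ring and two disjoint three-membered rings (with all nodes carrying the same initial label) receiving identical color histograms, which is the kind of case a substructure-aware bias is intended to help with.

```python
from collections import Counter


def wl_indistinguishable(g1, g2, rounds=3):
    """Return True if 1-WL color refinement cannot tell g1 and g2 apart.

    Graphs are adjacency dicts {node: [neighbors]}. Both graphs share one
    palette per round so that color ids are comparable across graphs.
    """
    graphs = (g1, g2)
    # Every node starts with the same color (no atom or bond features).
    colors = [{node: 0 for node in g} for g in graphs]
    for _ in range(rounds):
        # A node's signature is its own color plus the sorted multiset
        # of its neighbors' colors.
        signatures = [
            {n: (c[n], tuple(sorted(c[nb] for nb in g[n]))) for n in g}
            for g, c in zip(graphs, colors)
        ]
        # Compress signatures into small integer ids, shared by both graphs.
        palette = {}
        for sigs in signatures:
            for sig in sigs.values():
                palette.setdefault(sig, len(palette))
        colors = [{n: palette[sigs[n]] for n in sigs} for sigs in signatures]
        # Different color histograms prove the graphs are non-isomorphic.
        if Counter(colors[0].values()) != Counter(colors[1].values()):
            return False
    return True


# Toy graphs (hypothetical examples, not taken from the paper's dataset).
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

print(wl_indistinguishable(hexagon, two_triangles))  # True: WL is fooled
print(wl_indistinguishable(hexagon, path))           # False: degrees differ
```

Because every node in both the hexagon and the two triangles has exactly two neighbors, the refinement never produces a second color, so the test cannot separate them even though they are not isomorphic. Message-passing GNNs that aggregate neighbor features in the same way inherit this limit.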