Hasebe Tatsuya
Research & Development Group, Hitachi, Ltd., 832-2, Horiguchi, Hitachinaka, Ibaraki 312-0034, Japan.
ACS Omega. 2021 Oct 14;6(42):27955-27967. doi: 10.1021/acsomega.1c03839. eCollection 2021 Oct 26.
Graph neural networks (GNNs) have become a promising method for predicting molecular properties with end-to-end supervision, because they can learn molecular features directly from chemical graphs in a black-box manner. However, achieving high prediction accuracy requires training on a large amount of property data, which is often costly to obtain experimentally. Before the advent of deep learning, descriptor-based quantitative structure-property relationship (QSPR) studies drew on physical and chemical knowledge to manually design descriptors that predict properties effectively. In this study, we extend the message-passing neural network (MPNN) into a novel architecture, the knowledge-embedded MPNN (KEMPNN), which can be supervised jointly with nonquantitative knowledge annotations made by human experts on a chemical graph. These annotations indicate the important substructures of a molecule and their effect on the target property (e.g., positive or negative). We evaluated the KEMPNN in a small-training-data setting using the physical chemistry datasets in MoleculeNet (ESOL, FreeSolv, Lipophilicity) and a polymer property (glass-transition temperature) dataset with virtual knowledge annotations. The results demonstrate that knowledge supervision improves the prediction accuracy of the KEMPNN over that of the MPNN, and that the accuracy of the KEMPNN is better than or comparable to that of descriptor-based methods even with small training data.
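The idea of supervising a message-passing network on both a property label and per-atom expert annotations can be illustrated with a minimal NumPy sketch. This is not the KEMPNN implementation: the abstract does not give the layer definitions or the loss, so the toy update rule, the separate per-atom score head, the squared-error knowledge term, and the names `message_passing`, `forward`, `combined_loss`, and `knowledge_weight` are all assumptions made for illustration. The three-atom graph, the annotation vector, and the target value are placeholders, not data from the paper.

```python
import numpy as np

def message_passing(node_feats, adjacency, weight, steps=2):
    """Toy message passing (assumed form): each atom adds the sum of its
    neighbours' states to its own, then applies a linear map and tanh."""
    h = node_feats
    for _ in range(steps):
        messages = adjacency @ h              # sum over bonded neighbours
        h = np.tanh((h + messages) @ weight)  # update atom states
    return h

def forward(node_feats, adjacency, params):
    """Return a molecule-level prediction and a per-atom score in (-1, 1)."""
    h = message_passing(node_feats, adjacency, params["w_msg"])
    atom_scores = np.tanh(h @ params["w_att"])  # hypothetical knowledge head
    prediction = float(np.sum(h @ params["w_out"]))  # sum-pooled readout
    return prediction, atom_scores

def combined_loss(prediction, target, atom_scores, annotation, knowledge_weight=0.5):
    """Property error plus a weighted penalty for disagreeing with the
    expert annotation (+1 positive effect, -1 negative effect, 0 unmarked)."""
    property_loss = (prediction - target) ** 2
    knowledge_loss = float(np.mean((atom_scores - annotation) ** 2))
    return property_loss + knowledge_weight * knowledge_loss

# Placeholder molecule: heavy atoms of a C-C-O chain, one-hot features [C, O].
node_feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
annotation = np.array([0.0, 0.0, 1.0])  # hypothetical: oxygen marked "positive"
target = 0.0                            # placeholder property label

rng = np.random.default_rng(0)
params = {
    "w_msg": rng.normal(scale=0.5, size=(2, 2)),
    "w_att": rng.normal(scale=0.5, size=2),
    "w_out": rng.normal(scale=0.5, size=2),
}

prediction, atom_scores = forward(node_feats, adjacency, params)
loss = combined_loss(prediction, target, atom_scores, annotation)
```

Setting `knowledge_weight=0` in this sketch reduces the objective to plain property regression, which makes it easy to compare training with and without the annotation term on the same small dataset.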