Digital Annealer Unit, Fujitsu Laboratories Ltd., 10-1 Morinosato-Wakamiya, Atsugi, Kanagawa, 243-0197, Japan.
Artificial Intelligence Laboratory, Fujitsu Laboratories Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa, 211-8588, Japan.
Mol Inform. 2020 Jan;39(1-2):e1800155. doi: 10.1002/minf.201800155. Epub 2019 Oct 7.
Classifying the biological activities of chemical substances is important for developing new medicines efficiently. Machine learning methods are often employed to screen large compound libraries and to predict the activities of new substances by learning molecular structure-activity relationships. One such method is graph classification, in which a molecular structure is represented as a labeled graph whose nodes correspond to atoms and whose edges correspond to the bonds between them. In a conventional graph definition, atomic symbols and bond orders are used as node and edge labels, respectively. In this study, we developed new graph definitions that instead use the atom and bond types assigned in the force fields of molecular dynamics methods as node and edge labels, respectively. We found that these graph definitions improved the accuracy of activity classification for chemical substances, both with graph kernels combined with support vector machines and with deep neural networks. The higher accuracies obtained with our proposed definitions can advance materials informatics based on graph-based machine learning methods.
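The difference between the two graph definitions can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which force field or which graph kernel was used, so the GAFF-style atom type names (`c3`, `oh`, `ho`, `hc`, `h1`, `os`), the rule that a bond type is the pair of its atom types, and the simple label-histogram kernel are all assumptions chosen for illustration. The example compares ethanol and dimethyl ether, two isomers with the same atoms and the same bond orders.

```python
from collections import Counter
from math import sqrt

# Bonds as (atom index, atom index, bond order); hydrogens are explicit.
ETHANOL_BONDS = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 4, 1),
                 (0, 5, 1), (0, 6, 1), (1, 7, 1), (1, 8, 1)]
ETHER_BONDS = [(0, 1, 1), (1, 2, 1), (0, 3, 1), (0, 4, 1),
               (0, 5, 1), (2, 6, 1), (2, 7, 1), (2, 8, 1)]

# Conventional node labels: atomic symbols.
ETHANOL_ELEMENTS = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ETHER_ELEMENTS = ["C", "O", "C", "H", "H", "H", "H", "H", "H"]

# Assumed force-field node labels (GAFF-style names, illustrative only).
ETHANOL_FF = ["c3", "c3", "oh", "ho", "hc", "hc", "hc", "h1", "h1"]
ETHER_FF = ["c3", "os", "c3", "h1", "h1", "h1", "h1", "h1", "h1"]


def conventional_graph(elements, bonds):
    """Node label = atomic symbol, edge label = bond order."""
    edges = [(i, j, str(order)) for i, j, order in bonds]
    return elements, edges


def force_field_graph(atom_types, bonds):
    """Node label = atom type; edge label = sorted pair of atom types.

    Deriving the bond type from the atom-type pair is an assumption
    made for this sketch.
    """
    edges = [(i, j, "-".join(sorted((atom_types[i], atom_types[j]))))
             for i, j, _ in bonds]
    return atom_types, edges


def label_histogram(graph):
    """Count node and edge labels, keeping the two kinds separate."""
    nodes, edges = graph
    hist = Counter(("node", label) for label in nodes)
    hist.update(("edge", label) for _, _, label in edges)
    return hist


def normalized_kernel(g1, g2):
    """Cosine similarity of label histograms (a very simple graph kernel)."""
    h1, h2 = label_histogram(g1), label_histogram(g2)

    def dot(a, b):
        return sum(a[k] * b[k] for k in a.keys() & b.keys())

    return dot(h1, h2) / sqrt(dot(h1, h1) * dot(h2, h2))


k_conventional = normalized_kernel(
    conventional_graph(ETHANOL_ELEMENTS, ETHANOL_BONDS),
    conventional_graph(ETHER_ELEMENTS, ETHER_BONDS),
)
k_force_field = normalized_kernel(
    force_field_graph(ETHANOL_FF, ETHANOL_BONDS),
    force_field_graph(ETHER_FF, ETHER_BONDS),
)
print(f"conventional labels: {k_conventional:.3f}")
print(f"force-field labels:  {k_force_field:.3f}")
```

Under these assumptions the conventional labels give the two isomers identical label histograms, so this particular kernel cannot tell them apart, whereas the force-field labels encode each atom's chemical environment and yield a similarity below 1. A structure-aware kernel would separate the isomers even with conventional labels; the sketch only shows how finer labels add information at the label level.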