Department of Mathematics, Michigan State University, MI, 48824, USA.
Department of Mathematics, University of Kentucky, KY, 40506, USA.
Comput Biol Med. 2021 Jul;134:104460. doi: 10.1016/j.compbiomed.2021.104460. Epub 2021 May 12.
While automated feature extraction has had tremendous success in many deep learning algorithms for image analysis and natural language processing, it does not work well for data involving complex internal structures, such as molecules. Data representations based on advanced mathematics, including algebraic topology, differential geometry, and graph theory, have demonstrated superiority in a variety of biomolecular applications; however, their performance often depends on manual parametrization. This work introduces the auto-parametrized weighted element-specific graph neural network, dubbed AweGNN, to overcome this tedious parametrization process while also serving as a suitable technique for automated feature extraction on these internally complex biomolecular data sets. The AweGNN is a neural network model based on geometric-graph features of element-pair interactions, with its graph parameters updated throughout training, which results in what we call a network-enabled automatic representation (NEAR). To enhance predictions on small data sets, we construct multi-task (MT) AweGNN models in addition to single-task (ST) AweGNN models. The proposed methods are applied to various benchmark data sets, including four data sets for quantitative toxicity analysis and another data set for solvation prediction. Extensive numerical tests show that AweGNN models can achieve state-of-the-art performance in molecular property predictions.
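To make the idea of trainable graph parameters concrete, the following is a minimal sketch of one element-pair feature and its derivative with respect to a scale parameter. The abstract does not state the kernel or parameter names, so everything here is an assumption for illustration: the Lorentz-type kernel, the symbols `eta` (length scale) and `kappa` (power), and the function names `pair_feature` and `pair_feature_grad_eta` are hypothetical and are not taken from the AweGNN implementation.

```python
import numpy as np


def pairwise_distances(coords_a, coords_b):
    """Distances between every atom in coords_a and every atom in coords_b."""
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def pair_feature(coords_a, coords_b, eta, kappa):
    """Sum of an assumed Lorentz-type kernel over all atom pairs.

    coords_a, coords_b: arrays of shape (n, 3) and (m, 3) holding the
    coordinates of atoms of two element types (for example C and N).
    eta, kappa: kernel parameters that would be hand-tuned in a fixed
    representation; here they are treated as trainable.
    """
    d = pairwise_distances(coords_a, coords_b)
    return np.sum(1.0 / (1.0 + (d / eta) ** kappa))


def pair_feature_grad_eta(coords_a, coords_b, eta, kappa):
    """Analytic derivative of pair_feature with respect to eta."""
    d = pairwise_distances(coords_a, coords_b)
    u = (d / eta) ** kappa
    return np.sum(kappa * u / (eta * (1.0 + u) ** 2))
```

Because the feature is differentiable in `eta`, a gradient of the training loss can flow through it by the chain rule, which is one plausible way graph parameters could be updated alongside the network weights rather than chosen by a manual search.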