Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
J Chem Inf Model. 2024 Aug 12;64(15):5806-5816. doi: 10.1021/acs.jcim.4c00446. Epub 2024 Jul 16.
Predicting the mass spectrum of a molecular ion is typically accomplished via one of three general approaches: rules-based methods for bond breaking, deep learning, or quantum chemical (QC) modeling. Rules-based approaches are often limited by the conditions for different chemical subspaces and perform poorly in chemical regimes with few defined rules. QC modeling is theoretically robust but requires significant computational time to produce a spectrum for a given target. Among deep learning techniques, graph neural networks (GNNs) have outperformed earlier fingerprint-based neural networks in mass spectra prediction. To explore this technique further, we investigate whether including quantum chemically derived information as edge features in the GNN increases predictive accuracy. The models we investigated include categorical bond order, bond force constants derived from extended tight-binding (xTB) quantum chemistry, and acyclic bond dissociation energies. We evaluated these models against a control GNN with no edge features in the input graphs. Bond dissociation enthalpies yielded the largest improvement, with a cosine similarity score of 0.462 compared with 0.437 for the baseline model. In this work we also apply dynamic graph attention, which improves performance on benchmark problems and supports the inclusion of edge features. Across implementations, we investigate the nature of the molecular embedding for spectra prediction and discuss the recognition of fragment topographies in distinct chemistries for further development in tandem mass spectrometry prediction.
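The cosine similarity scores quoted above compare a predicted spectrum with a measured one. The abstract does not state how spectra were binned or normalized, so the following is only a minimal sketch of one common convention: peaks are summed into unit-width m/z bins and the two intensity vectors are compared by cosine similarity. The function names, the `max_mz` cutoff, and the example peak lists are all hypothetical and are not taken from the paper.

```python
import math


def bin_spectrum(peaks, max_mz=500):
    """Sum peak intensities into unit-width m/z bins (assumed binning scheme).

    `peaks` is a list of (m/z, intensity) pairs; the bin index is the
    m/z value rounded to the nearest integer. Peaks outside
    [0, max_mz] are ignored.
    """
    vec = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            vec[idx] += intensity
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as having no similarity
    return dot / (norm_a * norm_b)


# Hypothetical peak lists, for illustration only (not data from the paper).
predicted = [(43.0, 100.0), (58.0, 40.0), (71.1, 10.0)]
measured = [(43.0, 90.0), (58.0, 55.0), (86.0, 5.0)]

score = cosine_similarity(bin_spectrum(predicted), bin_spectrum(measured))
print(f"cosine similarity: {score:.3f}")
```

Because intensities are non-negative, a score under this convention lies between 0 (no shared peaks) and 1 (identical relative intensities). A dataset-level figure such as 0.462 versus 0.437 would typically be an average of per-molecule scores, though the abstract does not specify the aggregation.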