Jennifer N. Wei, David Belanger, Ryan P. Adams, D. Sculley
Google Brain, Cambridge, Massachusetts 02142, United States.
Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States.
ACS Cent Sci. 2019 Apr 24;5(4):700-708. doi: 10.1021/acscentsci.9b00085. Epub 2019 Mar 19.
When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously collected spectra to identify the molecule. While popular, this approach will fail to identify molecules that are not in the existing library. In response, we propose to improve the library's coverage by augmenting it with synthetic spectra that are predicted from candidate molecules using machine learning. We contribute a lightweight neural network model that quickly predicts mass spectra for small molecules, averaging 5 ms per molecule with a recall-at-10 accuracy of 91.8%. Achieving high-accuracy predictions requires a novel neural network architecture that is designed to capture typical fragmentation patterns from electron ionization. We analyze the effects of our modeling innovations on library matching performance and compare our models to prior machine-learning-based work on spectrum prediction.
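The library-matching workflow and the recall-at-10 metric mentioned above can be illustrated with a minimal sketch. It assumes spectra are stored as fixed-length intensity vectors (one entry per m/z bin) and are compared with plain cosine similarity. The abstract does not state the similarity measure or the binning used, so both are assumptions here, and the function names and toy data are hypothetical.

```python
import numpy as np


def cosine_similarity(query, library):
    """Cosine similarity between one query spectrum and every library spectrum.

    query:   array of shape (n_bins,)
    library: array of shape (n_spectra, n_bins)
    """
    eps = 1e-12  # guards against division by zero for empty spectra
    q = query / (np.linalg.norm(query) + eps)
    lib = library / (np.linalg.norm(library, axis=1, keepdims=True) + eps)
    return lib @ q


def rank_of_true_match(query, library, true_index):
    """1-based rank of the correct library entry when sorted by similarity."""
    sims = cosine_similarity(query, library)
    order = np.argsort(-sims, kind="stable")  # most similar first
    return int(np.where(order == true_index)[0][0]) + 1


def recall_at_k(queries, library, true_indices, k=10):
    """Fraction of queries whose correct entry appears in the top k matches."""
    hits = [
        rank_of_true_match(q, library, t) <= k
        for q, t in zip(queries, true_indices)
    ]
    return float(np.mean(hits))


# Toy library of three spectra over five m/z bins (made-up intensities).
# In the augmented setting, some rows would be model-predicted spectra.
library = np.array([
    [0.0, 10.0, 0.0, 5.0, 0.0],
    [3.0, 0.0, 8.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 9.0, 9.0],
])

# Two noisy "observed" spectra whose true identities are entries 0 and 2.
queries = np.array([
    [0.0, 9.0, 1.0, 5.0, 0.0],
    [0.0, 1.0, 0.0, 8.0, 9.0],
])
true_indices = [0, 2]

print(recall_at_k(queries, library, true_indices, k=1))
```

In this framing, augmenting the library amounts to appending rows of predicted spectra for candidate molecules, so a query for a molecule with no measured spectrum can still be ranked against a predicted one.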