Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, Zürich, Switzerland.
Eawag, Dübendorf, Switzerland.
Nat Methods. 2022 Jul;19(7):865-870. doi: 10.1038/s41592-022-01486-3. Epub 2022 May 30.
Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder-decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS) spectra. In an evaluation with 3,863 MS spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly at first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having seen any of these structures during training. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist to a bryophyte MS dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation for poorly represented analyte classes and novel compounds.
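The three evaluation figures quoted above (correct at first rank, retrieved overall, reproduced database annotations) are ratios over a set of spectra. The sketch below shows one plausible way to compute them, assuming each spectrum has a ranked list of candidate structures represented by comparable string identifiers (for example, InChIKey first blocks). The function names are hypothetical, and the definition of the "reproduced" rate (database search correct at rank 1, de novo also correct at rank 1) is an assumption that may differ from the paper's exact protocol.

```python
from typing import Sequence

# Hypothetical helpers: each spectrum i has a ranked candidate list ranked[i]
# (best first) and a true structure identifier truth[i].

def top1_rate(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Fraction of spectra whose first-ranked candidate is the true structure."""
    if not truth:
        return 0.0
    hits = sum(1 for cands, t in zip(ranked, truth) if cands and cands[0] == t)
    return hits / len(truth)

def retrieval_rate(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Fraction of spectra whose true structure appears anywhere among the candidates."""
    if not truth:
        return 0.0
    hits = sum(1 for cands, t in zip(ranked, truth) if t in cands)
    return hits / len(truth)

def reproduction_rate(denovo: Sequence[Sequence[str]],
                      database: Sequence[Sequence[str]],
                      truth: Sequence[str]) -> float:
    """Assumed definition: among spectra the database search ranks correctly
    at position 1, the fraction that the de novo method also ranks first."""
    db_correct = [i for i, t in enumerate(truth)
                  if database[i] and database[i][0] == t]
    if not db_correct:
        return 0.0
    hits = sum(1 for i in db_correct
               if denovo[i] and denovo[i][0] == truth[i])
    return hits / len(db_correct)

if __name__ == "__main__":
    # Made-up toy data for illustration only; not results from the paper.
    truth = ["A", "B", "C", "D"]
    denovo = [["A", "X"], ["X", "B"], ["Y"], ["D"]]
    database = [["A"], ["B"], ["C"], ["Z"]]
    print(top1_rate(denovo, truth))                    # 0.5
    print(retrieval_rate(denovo, truth))               # 0.75
    print(reproduction_rate(denovo, database, truth))  # 0.333...
```

Separating "first rank" from "retrieved" makes explicit that a de novo generator can produce the correct structure somewhere in its candidate list without ranking it highest, which is why the retrieval figures in the abstract are higher than the first-rank figures.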