Brungs Corinna, Schmid Robin, Heuckeroth Steffen, Mazumdar Aninda, Drexler Matúš, Šácha Pavel, Dorrestein Pieter C, Petras Daniel, Nothias Louis-Felix, Veverka Václav, Nencka Radim, Kameník Zdeněk, Pluskal Tomáš
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czechia.
Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Vienna, Austria.
Nat Methods. 2025 Sep 15. doi: 10.1038/s41592-025-02813-0.
Untargeted high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery and exposomics, with compound identification remaining the major bottleneck. Currently, the standard workflow applies spectral library matching against tandem mass spectrometry (MS2) fragmentation data. Multi-stage fragmentation (MSn) yields more profound insights into substructures, enabling validation of fragmentation pathways; however, the community lacks open MSn reference data of diverse natural products and other chemicals. Here we describe MSnLib, a machine learning-ready open resource of >2 million spectra in MSn trees of 30,008 unique small molecules, built with a high-throughput data acquisition and processing pipeline in the open-source software mzmine.