Nguyen Julia, Overstreet Richard, King Ethan, Ciesielski Danielle
Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
J Am Soc Mass Spectrom. 2024 Oct 2;35(10):2256-2266. doi: 10.1021/jasms.4c00154. Epub 2024 Sep 11.
Tandem mass spectrometry (MS/MS) is an important tool for the identification of small molecules and metabolites where resultant spectra are most commonly identified by matching them with spectra in MS/MS reference libraries. While popular, this strategy is limited by the contents of existing reference libraries. In response to this limitation, various methods are being developed for the generation of spectra to augment existing libraries. Recently, machine learning and deep learning techniques have been applied to predict spectra with greater speed and accuracy. Here, we investigate the challenges these algorithms face in achieving fast and accurate predictions on a wide range of small molecules. The challenges are often amplified by the use of generic machine learning benchmarking tactics, which lead to misleading accuracy scores. Curating data sets, only predicting spectra for sufficiently high collision energies, and working more closely with experimental mass spectrometrists are recommended strategies to improve overall prediction accuracy in this nuanced field.
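The abstract describes library matching, in which an experimental spectrum is identified by scoring it against reference spectra. A common way to score that match is cosine similarity. The sketch below illustrates the idea; it is not the method used in the paper. It assumes each spectrum is a plain list of (m/z, intensity) pairs, and it uses a simple fixed-width binning step whose 0.01 width is a hypothetical choice. Production tools typically use tolerance-based peak matching instead of fixed bins. The helper names (`bin_spectrum`, `cosine_similarity`, `rank_library`) are illustrative only.

```python
from collections import defaultdict
from math import sqrt


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities into fixed-width m/z bins.

    peaks: list of (mz, intensity) pairs. The default bin width is a
    hypothetical choice; peaks near a bin edge may land in different bins.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(a, b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = sqrt(sum(v * v for v in va.values()))
    norm_b = sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library, bin_width=0.01):
    """Score a query spectrum against every reference and sort best-first.

    library: dict mapping a compound name to its reference peak list.
    """
    scores = [
        (name, cosine_similarity(query, ref, bin_width))
        for name, ref in library.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For example, a query containing peaks at m/z 91.05 and 65.04 ranks a reference with the same two peaks above a reference that shares no peaks with it. This limitation is the one the abstract points to: a compound whose spectrum is absent from the library cannot be found by this lookup, however good the scoring function is.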