National Institute of Standards and Technology, Charleston, South Carolina 29412, United States.
Bioinformatics Research Group, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria.
J Proteome Res. 2023 Mar 3;22(3):681-696. doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.
In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following a sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In this perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and of inspiring future research.