Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium.
Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium.
J Proteome Res. 2023 Feb 3;22(2):585-593. doi: 10.1021/acs.jproteome.2c00616. Epub 2023 Jan 23.
A key analysis task in mass spectrometry proteomics is matching the acquired tandem mass spectra to their originating peptides by sequence database searching or spectral library searching. Machine learning is an increasingly popular postprocessing approach to maximize the number of confident spectrum identifications that can be obtained at a given false discovery rate threshold. Here, we have integrated semisupervised machine learning in the ANN-SoLo tool, an efficient spectral library search engine that is optimized for open modification searching to identify peptides with any type of post-translational modification. We show that machine learning rescoring boosts the number of spectra that can be identified for both standard searching and open searching, and we provide insights into relevant spectrum characteristics harnessed by the machine learning model. The semisupervised machine learning functionality has now been fully integrated into ANN-SoLo, which is available as open source under the permissive Apache 2.0 license on GitHub at https://github.com/bittremieux/ANN-SoLo.
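The following is a minimal sketch of how the number of confident identifications at a given false discovery rate (FDR) threshold can be counted. It also shows why a rescoring that better separates correct from incorrect matches raises that count. It assumes target-decoy competition, with the FDR estimated as the ratio of decoy to target matches above a score cutoff. This is a common convention and not necessarily ANN-SoLo's exact procedure. The function name `count_identifications` and the toy scores are hypothetical. The semisupervised model that produces the new scores is not shown.

```python
from typing import List, Tuple


def count_identifications(psms: List[Tuple[float, bool]], fdr: float = 0.01) -> int:
    """Count target spectrum identifications accepted at the given FDR.

    Each PSM is a (score, is_decoy) pair; higher scores are better.
    Assumes target-decoy competition with FDR estimated as #decoys / #targets.
    Score ties are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_target = n_decoy = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Estimated FDR when the score cutoff is placed at this PSM.
        fdrs.append(n_decoy / max(n_target, 1))
    # q-value: the lowest estimated FDR at which this PSM would still be accepted.
    q_values = []
    running_min = float("inf")
    for f in reversed(fdrs):
        running_min = min(running_min, f)
        q_values.append(running_min)
    q_values.reverse()
    return sum(
        1 for (_, is_decoy), q in zip(ranked, q_values) if not is_decoy and q <= fdr
    )


# Hypothetical toy scores before and after rescoring the same six PSMs.
original = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
rescored = [(10.0, False), (9.0, False), (8.0, False), (7.5, False), (7.0, True), (5.0, True)]
print(count_identifications(original, fdr=0.01))
print(count_identifications(rescored, fdr=0.01))
```

In this toy example, rescoring lifts one target match above the highest-scoring decoy, so more targets clear the 1% FDR threshold. This illustrates, in simplified form, the effect the abstract attributes to machine learning rescoring.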