Xu Xiaofang, Yang Chunde, He Qiang, Shu Kunxian, Xinpu Yuan, Chen Zhiguang, Zhu Yunping, Chen Tao
The School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Victoria 3122, Australia.
Bioinform Adv. 2023 Apr 25;3(1):vbad057. doi: 10.1093/bioadv/vbad057. eCollection 2023.
De novo peptide sequencing for tandem mass spectrometry data is not only a key technology for novel peptide identification, but also a prerequisite for many downstream tasks, such as vaccine and antibody studies. In recent years, neural network models for peptide sequencing have shown a remarkable ability to accommodate various data sources and have outperformed conventional peptide identification tools. However, the best-performing model is computationally expensive, taking up to 1 week to process about 400 000 spectra. This article presents PGPointNovo, a novel neural network-based tool for parallel peptide sequencing. PGPointNovo uses data parallelization to accelerate training and inference, and mitigates the training difficulties caused by large batch sizes. Extensive experiments on multiple datasets of different sizes demonstrate that, compared with PointNovo, a leading neural network-based peptide sequencing tool, PGPointNovo accelerates peptide sequencing by up to 7.35× without compromising precision or recall.
The source code and the parameter settings are available at https://github.com/shallFun4Learning/PGPointNovo.
Supplementary data are available at Bioinformatics Advances online.