Zhan Zhaohui, Wang Lusheng
Department of Engineering, Shenzhen MSU-BIT University, Shenzhen, 518172, China.
Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf007.
Proteoforms are the different forms of a protein that arise from the genome through sequence variations, splice isoforms, and post-translational modifications. Proteoforms regulate protein structure and function. A single protein can have multiple proteoforms because modifications can occur at different sites. Proteoform identification finds the proteoforms of a given protein that best fit the input spectrum. Proteoform quantification finds the abundance of each of those proteoforms for a specific protein.
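As a toy illustration of the identification problem stated above, the sketch below scores two candidate proteoforms of one short sequence by counting how many observed masses fall within a tolerance of their theoretical prefix masses. It is a simplified assumption-laden example, not the method of this paper: the sequence, the peak list, the candidate names, the 0.01 Da tolerance, and the function names are all hypothetical, and ion-type mass offsets are ignored. The residue masses and the phosphorylation shift are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
PHOSPHO = 79.96633  # monoisotopic mass shift of phosphorylation (Da)


def prefix_masses(sequence, mods):
    """Cumulative residue masses; `mods` maps 0-based position -> mass shift."""
    total, masses = 0.0, []
    for i, residue in enumerate(sequence):
        total += RESIDUE_MASS[residue] + mods.get(i, 0.0)
        masses.append(total)
    return masses


def count_matches(peaks, theoretical, tol):
    """Number of observed masses within `tol` Da of some theoretical mass."""
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


def best_proteoform(sequence, candidates, peaks, tol=0.01):
    """Return the candidate matching the most peaks, plus all scores."""
    scores = {
        name: count_matches(peaks, prefix_masses(sequence, mods), tol)
        for name, mods in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical example: one phosphorylation on either serine of "GSAKS".
sequence = "GSAKS"
candidates = {
    "phospho_S2": {1: PHOSPHO},  # modification on the second residue
    "phospho_S5": {4: PHOSPHO},  # modification on the fifth residue
}
peaks = [57.021, 224.020, 295.057, 510.184]  # made-up observed masses (Da)
best, scores = best_proteoform(sequence, candidates, peaks)
```

In this made-up case the two candidates have the same total mass and differ only in where the shift appears along the sequence, so the intermediate prefix masses are what separate them.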
We propose algorithms for proteoform identification and quantification based on top-down tandem mass spectra. When aligning the HomMTM spectrum with the reference protein, we correct the mass of each matched peak within a predefined error range. After the correction, we require that the mass between any two (not necessarily consecutive) matched nodes in the protein equals the mass between the corresponding two matched peaks in the HomMTM spectrum. We design a back-tracking graph to store this information and find in it a combinatorial path (k paths) with the minimum sum of peak intensity errors. The resulting alignment also gives the relative abundances of these proteoforms (paths). Our experimental results show that the algorithm can identify and quantify proteoform combinations that cover a greater number of peaks. This advance promises more accurate and more comprehensive proteoform quantification, a crucial need in top-down MS-based proteomics.
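The objective described above, choosing k paths whose abundances minimize the sum of peak intensity errors, can be illustrated with a small brute-force sketch. This is not the back-tracking graph algorithm of the paper: it assumes each candidate path is already given as the set of peak indices it explains, that a peak's predicted intensity is the total abundance of the selected paths using it, and that abundances come from a coarse grid. The function names, the candidate paths, and the intensities are hypothetical.

```python
from itertools import combinations, product


def intensity_error(intensities, paths, abundances):
    """Sum over peaks of |observed - predicted| intensity.

    Assumption: a peak's predicted intensity is the total abundance of
    the selected paths that use that peak.
    """
    error = 0.0
    for i, observed in enumerate(intensities):
        predicted = sum(a for path, a in zip(paths, abundances) if i in path)
        error += abs(observed - predicted)
    return error


def best_combination(intensities, candidate_paths, k, levels):
    """Exhaustively try every k-subset of paths and every abundance
    assignment from `levels`; return (error, path names, abundances)."""
    best = None
    for combo in combinations(sorted(candidate_paths), k):
        paths = [candidate_paths[name] for name in combo]
        for abundances in product(levels, repeat=k):
            err = intensity_error(intensities, paths, abundances)
            if best is None or err < best[0]:
                best = (err, combo, abundances)
    return best


# Hypothetical data: four peaks and three candidate paths (proteoforms),
# each path listed as the set of peak indices it explains.
intensities = [100, 70, 30, 100]
candidate_paths = {
    "A": {0, 1, 3},
    "B": {0, 2, 3},
    "C": {1, 2},
}
error, chosen, abundances = best_combination(
    intensities, candidate_paths, k=2, levels=range(0, 101, 10)
)
```

Here the search selects paths A and B with abundances 70 and 30, which reproduce every peak intensity exactly, so the relative abundances would be 70% and 30%. Exhaustive search grows exponentially with k and with the number of candidates, which is why a graph-based formulation is attractive for real spectra.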
The software package is available at https://github.com/Zeirdo/TopMGQuant.