College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China.
Kunming University of Science and Technology, Kunming, Yunnan, China.
Brief Bioinform. 2021 Mar 22;22(2):1729-1750. doi: 10.1093/bib/bbaa015.
Proteins are the dominant executors of living processes. Compared with genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in disease. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for advancing medicine. With the development of mass spectrometry (MS) technology, characterizing proteoforms by top-down MS has become possible. This type of method is relatively new and faces many challenges. Because proteoform identification is the most important step in characterizing proteoforms, in this study we comprehensively review existing proteoform identification methods. Before proteoforms are identified, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up identification. Therefore, we also summarize popular deconvolution algorithms, various filtering algorithms for improving proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, we evaluate and compare commonly used methods in this review. We believe our review could help researchers better understand the current state of development in this field and design new, efficient algorithms for proteoform characterization.