Institute of Biomedical Chemistry, Moscow, Russia.
Biomed Khim. 2024 Sep;70(5):364-373. doi: 10.18097/PBMC20247005364.
We retrospectively analyzed changes in the neXtProt database records on the number of human proteoforms, post-translational modification (PTM) events, alternative splicing (AS) variants, and single amino acid polymorphisms (SAPs) associated with protein-coding genes. In 2016, our group proposed three mathematical models for predicting the number of distinct proteins (proteoforms) in the human proteome. Eight years later, we compared the original data from these information resources and their contributions to the predictions, and related the differences to new approaches in the experimental and bioinformatic analysis of protein modifications. The aim of this work is to update the status of database records on identified proteoforms since 2016 and to identify trends in how the numbers of these records have changed. According to various information models, modern experimental methods may identify from 5 to 125 million distinct proteoforms: proteins arising from alternative splicing, from single nucleotide polymorphisms expressed at the proteome level, and from post-translational modifications in various combinations. This result reflects a 20-fold or greater increase in the estimated size of the human proteome over the past 8 years.
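Why combining AS, SAP and PTM events inflates proteoform counts so quickly can be illustrated with a toy upper bound. The sketch below is not one of the three models from the paper. It assumes, purely for illustration, that every SAP and PTM site occurs in every splice isoform of a gene and that each site is independently in one of two states (reference or variant, unmodified or modified). The `GeneRecord` type and its fields are hypothetical and do not correspond to neXtProt fields.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """Hypothetical per-gene counts (illustrative names, not neXtProt fields)."""
    isoforms: int   # alternative-splicing (AS) isoforms
    sap_sites: int  # positions carrying a single amino acid polymorphism (SAP)
    ptm_sites: int  # positions that can carry a post-translational modification (PTM)


def proteoform_upper_bound(gene: GeneRecord) -> int:
    """Upper bound on distinct proteoforms for one gene.

    Assumes each SAP and PTM site is present in every isoform and has two
    independent states, so the count is isoforms * 2**(sap_sites + ptm_sites).
    """
    return gene.isoforms * 2 ** (gene.sap_sites + gene.ptm_sites)


def proteome_upper_bound(genes: list[GeneRecord]) -> int:
    """Sum the per-gene bounds over a list of protein-coding genes."""
    return sum(proteoform_upper_bound(g) for g in genes)


# Two hypothetical genes: 3 isoforms, 1 SAP site, 2 PTM sites -> 3 * 2**3 = 24;
# 1 isoform with no variable sites -> 1. Total: 25.
example_genes = [GeneRecord(3, 1, 2), GeneRecord(1, 0, 0)]
print(proteome_upper_bound(example_genes))
```

Because each additional variable site doubles the bound, real estimates depend heavily on which combinations are assumed to co-occur; observed proteoform numbers are expected to be far below such a naive product.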